Aussie AI

Reasoning Tokens

  • Last Updated 7 March, 2025
  • by David Spuler, Ph.D.

Reasoning tokens are special non-word meta-tokens that aid reasoning. Simple examples of reasoning tokens include "pause tokens" (pause to think more) or "start thought" and "stop thought" tokens (which mark a segment of text as a "thought"). At the extreme, the interim steps of reasoning are performed entirely in non-language tokens, or the whole reasoning process is done using "concept tokens" in Large Concept Models (LCMs). The goals of using reasoning tokens can be two-fold (with a minimal code sketch after the list below):

  • Greater accuracy from using concepts in reasoning (avoiding ambiguity of language), and/or
  • Faster cost-effective reasoning by using fewer tokens (i.e., token reduction optimizations).
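
For illustration, here is a minimal sketch of registering reasoning meta-tokens with an off-the-shelf model, using the Hugging Face transformers library. The token names (<pause>, <bot>, <eot>) and the GPT-2 base model are illustrative assumptions; the new tokens only become useful after fine-tuning.

    # Minimal sketch: adding reasoning meta-tokens to a tokenizer and model.
    # Assumptions: Hugging Face transformers, GPT-2 as a stand-in model, and
    # illustrative token names; real systems train these tokens into the model.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Register the meta-tokens so each gets a single embedding vector.
    meta_tokens = ["<pause>", "<bot>", "<eot>"]  # pause, begin/end "thought"
    tokenizer.add_special_tokens({"additional_special_tokens": meta_tokens})
    model.resize_token_embeddings(len(tokenizer))  # new rows randomly initialized

    # Insert pause tokens after the prompt; without fine-tuning, these
    # embeddings are untrained and will not yet improve reasoning.
    prompt = "Q: What is 17 * 24? A:" + "<pause>" * 4
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0]))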

Research on Reasoning Tokens

Research papers on reasoning tokens include:

  • Ignacio de Gregorio Noblejas, September 15, 2024, OpenAI Launches o1. Here’s All You Need to Know, https://thetechoasis.beehiiv.com/p/openai-launches-o1-heres-need-know
  • Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian, 9 Dec 2024, Training Large Language Models to Reason in a Continuous Latent Space, https://arxiv.org/abs/2412.06769 (Performing reasoning in a model trained to operate in the embedding vector space, rather than more directly in the token space.)
  • Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, Jun Xie, Arthur Szlam, 23 Dec 2024, Deliberation in Latent Space via Differentiable Cache Augmentation, https://arxiv.org/abs/2412.17747 (Augmenting the KV cache with reasoning information so that decoding will mimic multi-step reasoning with fewer tokens required for intermediate steps.)
  • Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan, 21 Apr 2024 (v3), Think before you speak: Training Language Models With Pause Tokens, https://arxiv.org/abs/2310.02226 (Inserting extra "pause tokens" that trigger the LLM to perform extra reasoning during the decoding phase.)
  • Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758 (Use of dummy "filler tokens" similar to "pause tokens" or "reasoning tokens" to aid multi-step reasoning in decoding.)
  • Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman, 18 Mar 2024 (v2), Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, https://arxiv.org/abs/2403.09629 (Generates interim rationales between learned start-of-thought and end-of-thought meta-tokens to aid reasoning.)
  • Lance Eliot, Dec 18, 2024, Chain Of Continuous Thought Promises Mighty Boost For LLMs And Generative AI By Blowing Up The Fixation On Tokens, https://www.forbes.com/sites/lanceeliot/2024/12/18/chain-of-continuous-thought-promises-mighty-boost-for-llms-and-generative-ai-by-blowing-up-the-fixation-on-tokens/
  • Xuan Shen, Yizhou Wang, Xiangxi Shi, Yanzhi Wang, Pu Zhao, Jiuxiang Gu, 31 Jan 2025, Efficient Reasoning with Hidden Thinking, https://arxiv.org/abs/2501.19201
  • DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng, 5 Feb 2025. Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning, https://arxiv.org/abs/2502.03275
  • Jim the AI Whisperer, Feb 2025, I hacked Perplexity AI’s full system prompt when I shared my own cognitive vulnerabilities with it: How I used my own scrambled brain to outwit Perplexity AI, https://medium.com/the-generator/prompt-hacking-perplexity-ai-system-instructions-7aa6ee923060
  • Kongcheng Zhang, Qi Yao, Baisheng Lai, Jiaxing Huang, Wenkai Fang, Dacheng Tao, Mingli Song, Shunyu Liu, 19 Feb 2025, Reasoning with Reinforced Functional Token Tuning, https://arxiv.org/abs/2502.13389
  • Ziang Ye, Zhenru Zhang, Yang Zhang, Jianxin Ma, Junyang Lin, Fuli Feng, 19 Dec 2024, Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning, https://arxiv.org/abs/2412.14780
  • Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He, 28 Feb 2025, CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation, https://arxiv.org/abs/2502.21074

Pause Tokens

Pause tokens are a type of meta-token that helps an LLM perform better reasoning in a multi-step inference algorithm such as Chain-of-Thought. The idea is to train extra meta-tokens that instruct the LLM to "pause" and think some more at the current point. Research papers on pause tokens include the following (a small decoding-side sketch appears after the list):

  • Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, Jun Xie, Arthur Szlam, 23 Dec 2024, Deliberation in Latent Space via Differentiable Cache Augmentation, https://arxiv.org/abs/2412.17747 (Augmenting the KV cache with reasoning information so that decoding will mimic multi-step reasoning with fewer tokens required for intermediate steps.)
  • Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan, 21 Apr 2024 (v3), Think before you speak: Training Language Models With Pause Tokens, https://arxiv.org/abs/2310.02226 (Inserting extra "pause tokens" that trigger the LLM to perform extra reasoning during the decoding phase.)
  • Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758 (Use of dummy "filler tokens" similar to "pause tokens" or "reasoning tokens" to aid multi-step reasoning in decoding.)
  • Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman, 18 Mar 2024 (v2), Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, https://arxiv.org/abs/2403.09629 (Generates interim rationales between learned start-of-thought and end-of-thought meta-tokens to aid reasoning.)
  • Zeyu Tang, Zhenhao Chen, Loka Li, Xiangchen Song, Yunlong Deng, Yifan Shen, Guangyi Chen, Peter Spirtes, Kun Zhang, 5 Feb 2025, Reflection-Window Decoding: Text Generation with Selective Refinement, https://arxiv.org/abs/2502.03678 (Combination of sliding window attention with pausing.)
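
As a decoding-side illustration, this sketch appends a block of pause tokens to the prompt ids in the spirit of Goyal et al. (2023); the pause-token id and the example prompt ids are hypothetical placeholders.

    # Minimal sketch of pause-token inference: append M pause tokens to the
    # prompt, let the model compute over them, and extract the answer only
    # after the pause span. PAUSE_ID and the prompt ids are hypothetical.
    import torch

    PAUSE_ID = 50257  # hypothetical id of a "<pause>" meta-token

    def append_pauses(input_ids: torch.Tensor, num_pauses: int) -> torch.Tensor:
        """Append num_pauses copies of the pause token to each sequence."""
        pauses = torch.full((input_ids.size(0), num_pauses), PAUSE_ID,
                            dtype=input_ids.dtype)
        return torch.cat([input_ids, pauses], dim=1)

    prompt_ids = torch.tensor([[101, 2054, 2003, 102]])  # arbitrary example ids
    padded = append_pauses(prompt_ids, num_pauses=8)
    # The model then generates as usual from `padded`; the answer is read
    # starting at position prompt_ids.size(1) + 8, skipping the pause span.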

Concept Tokens

Concept tokens are meta-tokens that represent concepts rather than words during LLM inference. They can improve the accuracy of reasoning (by avoiding the ambiguity of language) and/or the efficiency of token processing (because fewer tokens are needed). Using concept tokens throughout the entire LLM yields a "concept model" or "Large Concept Model" (LCM); alternatively, concept tokens can be used only in the interim steps of Chain-of-Thought reasoning.
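
The sketch below illustrates the inference loop of continuous latent reasoning in the style of Hao et al. (2024): rather than decoding each interim reasoning step into a word token, the model's last hidden state is fed back in as the next input embedding. It is an untrained illustration using the Hugging Face transformers API with GPT-2 as a stand-in, not the paper's full training method.

    # Minimal sketch: "concept token" reasoning in continuous latent space.
    # Interim steps stay as hidden-state vectors instead of word tokens.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    prompt_ids = tokenizer("Q: 2+2*3=?", return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(prompt_ids)

    with torch.no_grad():
        for _ in range(4):  # four latent "concept token" reasoning steps
            out = model(inputs_embeds=embeds, output_hidden_states=True)
            last_hidden = out.hidden_states[-1][:, -1:, :]  # final position
            # Feed the hidden state back as a continuous "thought" token.
            embeds = torch.cat([embeds, last_hidden], dim=1)
    # Normal token-by-token decoding would then resume from `embeds`.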
