Aussie AI
Reasoning Tokens
Last Updated 7 March 2025
by David Spuler, Ph.D.
Reasoning tokens are special non-word meta-tokens that aid reasoning. Simple examples include "pause tokens" (pause to think more) or "start thought" and "stop thought" tokens (which mark a segment of text as a "thought"); a minimal code sketch of registering such tokens appears after the list below. At the other extreme, the interim steps of reasoning are performed in non-language tokens, or the entire reasoning process is done with "concept tokens", as in Large Concept Models (LCMs). The goals of using reasoning tokens can be two-fold:
- Greater accuracy from using concepts in reasoning (avoiding ambiguity of language), and/or
- Faster, more cost-effective reasoning by using fewer tokens (i.e., token reduction optimizations).
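As a rough illustration of the simpler kind of reasoning token, the sketch below registers a few hypothetical meta-tokens ("<|pause|>", "<|start_thought|>", "<|end_thought|>") with a Hugging Face-style tokenizer and model. The token names and the small gpt2 stand-in model are assumptions for illustration only, and the new embeddings would still need to be trained before they help reasoning.

```python
# Minimal sketch (not from any specific paper): registering illustrative
# reasoning meta-tokens with a Hugging Face-style tokenizer and model.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # small stand-in model, an assumption for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Register the meta-tokens so the tokenizer never splits them into sub-words.
reasoning_tokens = ["<|pause|>", "<|start_thought|>", "<|end_thought|>"]
tokenizer.add_special_tokens({"additional_special_tokens": reasoning_tokens})

# Grow the embedding matrix so the new token ids get (trainable) vectors.
model.resize_token_embeddings(len(tokenizer))

# A training example that wraps an interim reasoning step in thought markers.
text = "Q: What is 17 + 25? <|start_thought|> 17 + 25 = 42 <|end_thought|> A: 42"
input_ids = tokenizer(text, return_tensors="pt").input_ids
```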
See also other reasoning topics:
- Reasoning Algorithms
- Reasoning Inference Optimization
- Chain-of-Thought Optimization (e.g., token reduction, step skipping)
Research on Reasoning Tokens
Research papers on reasoning tokens include:
- Ignacio de Gregorio Noblejas, September 15, 2024, OpenAI Launches o1. Here’s All You Need to Know, https://thetechoasis.beehiiv.com/p/openai-launches-o1-heres-need-know
- Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian, 9 Dec 2024, Training Large Language Models to Reason in a Continuous Latent Space, https://arxiv.org/abs/2412.06769 (Performing reasoning in a model trained to operate in the embedding vector space, rather than more directly in the token space.)
- Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, Jun Xie, Arthur Szlam, 23 Dec 2024, Deliberation in Latent Space via Differentiable Cache Augmentation, https://arxiv.org/abs/2412.17747 (Augmenting the KV cache with reasoning information so that decoding will mimic multi-step reasoning with fewer tokens required for intermediate steps.)
- Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan, 21 Apr 2024 (v3), Think before you speak: Training Language Models With Pause Tokens, https://arxiv.org/abs/2310.02226 (Inserting extra "pause tokens" that trigger the LLM to perform extra reasoning during the decoding phase.)
- Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758 (Use of dummy "filler tokens" similar to "pause tokens" or "reasoning tokens" to aid multi-step reasoning in decoding.)
- Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman, 18 Mar 2024 (v2), Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, https://arxiv.org/abs/2403.09629 (Introduces answers between a start-of-thought and end-of-thought meta-token for reasoning.)
- Lance Eliot, Dec 18, 2024, Chain Of Continuous Thought Promises Mighty Boost For LLMs And Generative AI By Blowing Up The Fixation On Tokens, https://www.forbes.com/sites/lanceeliot/2024/12/18/chain-of-continuous-thought-promises-mighty-boost-for-llms-and-generative-ai-by-blowing-up-the-fixation-on-tokens/
- Xuan Shen, Yizhou Wang, Xiangxi Shi, Yanzhi Wang, Pu Zhao, Jiuxiang Gu, 31 Jan 2025, Efficient Reasoning with Hidden Thinking, https://arxiv.org/abs/2501.19201
- DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng, 5 Feb 2025, Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning, https://arxiv.org/abs/2502.03275
- Jim the AI Whisperer, Feb 2025, I hacked Perplexity AI’s full system prompt when I shared my own cognitive vulnerabilities with it. How I used my own scrambled brain to outwit Perplexity AI. https://medium.com/the-generator/prompt-hacking-perplexity-ai-system-instructions-7aa6ee923060
- Kongcheng Zhang, Qi Yao, Baisheng Lai, Jiaxing Huang, Wenkai Fang, Dacheng Tao, Mingli Song, Shunyu Liu, 19 Feb 2025, Reasoning with Reinforced Functional Token Tuning, https://arxiv.org/abs/2502.13389
- Ziang Ye, Zhenru Zhang, Yang Zhang, Jianxin Ma, Junyang Lin, Fuli Feng, 19 Dec 2024, Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning, https://arxiv.org/abs/2412.14780
- Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He, 28 Feb 2025, CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation, https://arxiv.org/abs/2502.21074
Pause Tokens
Pause tokens are meta-tokens that help an LLM perform better reasoning in a multi-step inference algorithm such as Chain-of-Thought. The idea is to train extra meta-tokens that instruct the LLM to "pause" and think some more at the current point; a minimal code sketch follows the paper list below. Research papers on pause tokens include:
- Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, Jun Xie, Arthur Szlam, 23 Dec 2024, Deliberation in Latent Space via Differentiable Cache Augmentation, https://arxiv.org/abs/2412.17747 (Augmenting the KV cache with reasoning information so that decoding will mimic multi-step reasoning with fewer tokens required for intermediate steps.)
- Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan, 21 Apr 2024 (v3), Think before you speak: Training Language Models With Pause Tokens, https://arxiv.org/abs/2310.02226 (Inserting extra "pause tokens" that trigger the LLM to perform extra reasoning during the decoding phase.)
- Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758 (Use of dummy "filler tokens" similar to "pause tokens" or "reasoning tokens" to aid multi-step reasoning in decoding.)
- Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman, 18 Mar 2024 (v2), Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, https://arxiv.org/abs/2403.09629 (Introduces answers between a start-of-thought and end-of-thought meta-token for reasoning.)
- Zeyu Tang, Zhenhao Chen, Loka Li, Xiangchen Song, Yunlong Deng, Yifan Shen, Guangyi Chen, Peter Spirtes, Kun Zhang, 5 Feb 2025, Reflection-Window Decoding: Text Generation with Selective Refinement, https://arxiv.org/abs/2502.03678 (Combination of sliding window attention with pausing.)
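As a rough sketch of the mechanics (loosely in the spirit of the pause-token and filler-token papers above, not a reproduction of any of them), the code below pads the prompt with copies of a hypothetical "<|pause|>" token before generation. The token name and the gpt2 stand-in model are assumptions, and in practice the model must be trained or fine-tuned with pause tokens for the extra positions to actually help.

```python
# Minimal sketch of inference-time pause tokens: extra "<|pause|>" positions
# give the model more forward-pass computation before it must emit the answer.
# Assumes a model already trained to make use of the pause token.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in model (assumption)
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.add_special_tokens({"additional_special_tokens": ["<|pause|>"]})
model.resize_token_embeddings(len(tokenizer))

def generate_with_pauses(prompt: str, num_pauses: int = 10) -> str:
    # Append pause tokens so attention has extra "thinking" positions.
    padded_prompt = prompt + " " + "<|pause|>" * num_pauses
    inputs = tokenizer(padded_prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)
    # Strip the prompt (including its pause tokens) from the decoded output.
    answer_ids = output_ids[0, inputs.input_ids.shape[1]:]
    return tokenizer.decode(answer_ids, skip_special_tokens=True)

print(generate_with_pauses("Q: What is 17 + 25? A:"))
```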
Concept Tokens
Concept tokens are meta-tokens used in LLM inference to represent concepts rather than words. They can improve the accuracy of reasoning (because there is less language ambiguity) and/or the efficiency of token processing (because there are fewer tokens). Using concept tokens throughout the entire LLM gives a "concept model" or "Large Concept Model" (LCM); it is also possible to use concept tokens only in the interim steps of Chain-of-Thought reasoning.
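For the mechanics only, the sketch below loops continuous "thought" vectors back into the model via inputs_embeds, loosely in the spirit of continuous latent reasoning such as COCONUT (referenced in the list below). The gpt2 stand-in model and the number of latent steps are assumptions, and an off-the-shelf model will not produce useful latent reasoning without being trained for it.

```python
# Minimal sketch of continuous "concept token" reasoning: interim steps are
# hidden-state vectors fed back as input embeddings, not decoded into words.
# This only demonstrates the inputs_embeds pathway; an untrained model will
# not produce useful latent reasoning this way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # stand-in model (assumption)
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt_ids = tokenizer("Q: What is 17 + 25?", return_tensors="pt").input_ids

with torch.no_grad():
    embeds = model.get_input_embeddings()(prompt_ids)     # (1, seq, hidden)
    for _ in range(4):                                     # assumed latent steps
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]     # final position only
        # Append the continuous thought vector as the next pseudo-token input.
        embeds = torch.cat([embeds, last_hidden], dim=1)

# After the latent steps, decoding could resume in ordinary language tokens.
```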
Research papers on concept tokens and Large Concept Models:
- LCM team, Loïc Barrault, Paul-Ambroise Duquenne, Maha Elbayad, Artyom Kozhevnikov, Belen Alastruey, Pierre Andrews, Mariano Coria, Guillaume Couairon, Marta R. Costa-jussà, David Dale, Hady Elsahar, Kevin Heffernan, João Maria Janeiro, Tuan Tran, Christophe Ropers, Eduardo Sánchez, Robin San Roman, Alexandre Mourachko, Safiyyah Saleem, Holger Schwenk, 15 Dec 2024 (v2), Large Concept Models: Language Modeling in a Sentence Representation Space, https://arxiv.org/abs/2412.08821 https://github.com/facebookresearch/large_concept_model (Model operates at the sentence concept level, using SONAR sentence embeddings.)
- Dr. Ashish Bamania, Dec 2024, Meta’s Large Concept Models (LCMs) Are Here To Challenge And Redefine LLMs: A deep dive into ‘Large Concept Model’, a novel language processing architecture and evaluating its performance against state-of-the-art LLMs, https://levelup.gitconnected.com/metas-large-concept-models-lcms-are-here-to-challenge-and-redefine-llms-7f9778f88a87
- Sachin Kumar, Sep 17, 2024, Hidden Chain-of-Thought decoding: faster and efficient CoT decoding to improve reasoning of LLMs, https://medium.com/@techsachin/hidden-chain-of-thought-decoding-faster-and-efficient-cot-decoding-to-improve-reasoning-of-llms-d95584bc9346 (Token reduction in CoT by compressing language tokens into an internal "hidden" concise token representation.)
- Tianqiao Liu, Zui Chen, Zitao Liu, Mi Tian, Weiqi Luo, 13 Sep 2024, Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding, https://arxiv.org/abs/2409.08561
- Lance Eliot, Dec 18, 2024, Chain Of Continuous Thought Promises Mighty Boost For LLMs And Generative AI By Blowing Up The Fixation On Tokens, https://www.forbes.com/sites/lanceeliot/2024/12/18/chain-of-continuous-thought-promises-mighty-boost-for-llms-and-generative-ai-by-blowing-up-the-fixation-on-tokens/
- Kyle Orland, 13 Dec 2024, Are LLMs capable of non-verbal reasoning? Processing in the "latent space" could help AI with tricky logical questions, https://arstechnica.com/ai/2024/12/are-llms-capable-of-non-verbal-reasoning/
- Alex McFarland, December 16, 2024, Meta’s COCONUT: The AI Method That Thinks Without Language, https://www.unite.ai/metas-coconut-the-ai-method-that-thinks-without-language/
- Maxime Peyrard, Martin Josifoski, Robert West, 21 Mar 2024, The Era of Semantic Decoding, https://arxiv.org/abs/2403.14562
- Hussain Ahmad, Diksha Goel, 8 Jan 2025, The Future of AI: Exploring the Potential of Large Concept Models, https://arxiv.org/abs/2501.05487
- Giuliano Liguori, Jan 2025, Large Concept Models (LCM): A New Frontier in AI Beyond Token-Level Language Models, https://www.linkedin.com/pulse/large-concept-models-lcm-new-frontier-ai-beyond-giuliano-liguori--dnj3f/
- Hanyu Zhang, Xiting Wang, Chengao Li, Xiang Ao, Qing He, 10 Jan 2025, Controlling Large Language Models Through Concept Activation Vectors, https://arxiv.org/abs/2501.05764 (Training a vector used to control the model on certain attributes.)
- Deqian Kong, Minglu Zhao, Dehong Xu, Bo Pang, Shu Wang, Edouardo Honig, Zhangzhang Si, Chuan Li, Jianwen Xie, Sirui Xie, Ying Nian Wu, 3 Feb 2025, Scalable Language Models with Posterior Inference of Latent Thought Vectors, https://arxiv.org/abs/2502.01567
- Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein, 7 Feb 2025, Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach, https://arxiv.org/abs/2502.05171
- DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng, 5 Feb 2025, Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning, https://arxiv.org/abs/2502.03275
- Jihoon Tack, Jack Lanchantin, Jane Yu, Andrew Cohen, Ilia Kulikov, Janice Lan, Shibo Hao, Yuandong Tian, Jason Weston, Xian Li, 12 Feb 2025, LLM Pretraining with Continuous Concepts, https://arxiv.org/abs/2502.08524
- Vishal Rajput, Feb 2025, Forget LLMs, It’s Time For Large Concept Models (LCMs), https://medium.com/aiguys/forget-llms-its-time-for-large-concept-models-lcms-05b75fe43185
- Vivek K. Tiwari, 2025, Towards Practical Concept-Based Language Models: An Efficiency-Focused Implementation, https://www.researchgate.net/profile/Vivek-Tiwari-41/publication/388753941_Towards_Practical_Concept-Based_Language_Models_An_Efficiency-Focused_Implementation/links/67a4bf86461fb56424cc6b62/Towards-Practical-Concept-Based-Language-Models-An-Efficiency-Focused-Implementation.pdf
- Datacamp, Feb 21, 2025, Large Concept Models: A Guide With Examples: Learn what large concept models are, how they differ from LLMs, and how their architecture leads to improvements in language processing, https://www.datacamp.com/blog/large-concept-models
- Mehul Gupta, Jan 5, 2025, Meta Large Concept Models (LCM): End of LLMs? What are LCMs and how is LCM different from LLMs, https://medium.com/data-science-in-your-pocket/meta-large-concept-models-lcm-end-of-llms-68cb0c5cd5cf
- AI Papers Academy, 3 January 2025, Large Concept Models (LCMs) by Meta: The Era of AI After LLMs? https://aipapersacademy.com/large-concept-models/
- Andrea Viliotti, 20 Dec 2024, Large Concept Model (LCM): a new paradigm for large-scale semantic reasoning in AI, https://www.andreaviliotti.it/post/large-concept-model-lcm-a-new-paradigm-for-large-scale-semantic-reasoning-in-ai
- Leadership in AI, January 2025, Meta’s stunning LCM large concept models for artificial intelligence — they are thinking now! https://www.youtube.com/watch?v=uZ3HCw8ApQ
- Lance Eliot, Jan 06, 2025, AI Is Breaking Free Of Token-Based LLMs By Upping The Ante To Large Concept Models That Devour Sentences And Adore Concepts, https://www.forbes.com/sites/lanceeliot/2025/01/06/ai-is-breaking-free-of-token-based-llms-by-upping-the-ante-to-large-concept-models-that-devour-sentences-and-adore-concepts/
- Zen the innovator, Jan 5, 2025, Large Concept Models (LCMs), https://medium.com/@ThisIsMeIn360VR/large-concept-models-lcms-d59b86531ef6
- Debabrata Pruseth, Jan 2025, LCMs: Large Concept Models – The Path to AGI (Artificial General Intelligence) & The Future of AI Thinking, https://debabratapruseth.com/lcms-large-concept-models-the-path-to-agi-the-future-of-ai-thinking/
- Asif Razzaq, December 15, 2024, Meta AI Proposes Large Concept Models (LCMs): A Semantic Leap Beyond Token-based Language Modeling, https://www.marktechpost.com/2024/12/15/meta-ai-proposes-large-concept-models-lcms-a-semantic-leap-beyond-token-based-language-modeling/
- Aniket Hingane, Dec 27, 2024, Practical Advancements in AI: How Large Concept Models Are Redefining the Landscape of LLMs, https://medium.com/@learn-simplified/practical-advancements-in-ai-how-large-concept-models-are-redefining-the-landscape-of-llms-b0220296458b
- Siddhant Rai and Vizuara AI, Dec 30, 2024, Large Concept Models: Language Modeling in a Sentence Representation Space: Re-imagining the core principles behind representation generation in foundation models, https://vizuara.substack.com/p/large-concept-models-language-modeling
Reasoning and CoT Efficiency Topics
Blog articles and further research information on general efficiency optimization techniques for reasoning models:
- Reasoning inference optimization (RIO)
- Chain-of-Thought (CoT) optimization
- Small Reasoning Models (SRMs)
- Adaptive Inference Time Compute
- Hybrid Reasoning Models
- Reasoning Tokens
Efficiency optimizations to Chain-of-Thought include:
- Hidden Token Chain-of-Thought (HCoT)
- Continuous Chain-of-Thought (Coconut)
- Chain of Draft (CoD)
- CoT Reasoning Decoding
- Concise Chain-of-Thought
- CoT Token Reduction
- CoT Step Skipping
- CoT Early Stopping
- CoT Path Reduction
- Constrained Chain-of-Thought