Aussie AI

LLM Reasoning Research

  • Last Updated 30 August, 2025
  • by David Spuler, Ph.D.

Reasoning is a key part of intelligence, and much work is ongoing to improve higher-level reasoning of AI models. Examples include solving mathematical problems or performing multi-step planning such as booking a holiday.

There are two main categories of methods to improve reasoning ability:

  • Training methods ("white box reasoning")
  • Multi-step inference methods ("black box reasoning")

You may also be interested in our recent research and blog articles.

Training-Based Reasoning

White Box Reasoning is the training of the weights internal to an LLM so that it performs better on reasoning tasks. Historically, the first approach to creating smarter models was to train the LLM with better data and better techniques, which steadily improved raw results on "reasoning" and "generalization" benchmarks.

Lately, this has given rise to Large Reasoning Model (LRM) architectures, of two main types: trained reasoning models that still produce an answer in a single step, and multi-step inference models that use multiple steps and "test-time compute" to give better answers to complex questions.

The single-shot reasoning models rely on prompt engineering to get the LLM to perform its reasoning steps. Many of the basic prompt engineering ideas are applicable here:

  • Basic step prompting ("Let's think step by step")
  • Emotional prompting
  • Roles/personas
  • CoT prompting
  • Zero-shot CoT prompting
  • Echo prompting ("Let's repeat the question")
  • Self-consistency
  • Self-ask (followup questions)
  • Exemplars (In-Context Learning)
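Several of these techniques reduce to simple prompt construction. Here is a minimal sketch of the first and last items (step prompting and in-context exemplars); the function names are my own, and the resulting strings would be passed to whatever LLM completion API is in use:

```python
# Sketch of two basic prompting techniques: zero-shot CoT step prompting
# and few-shot exemplars (in-context learning). These only build prompt
# strings; an actual LLM call would consume them.

def zero_shot_cot(question: str) -> str:
    """Append the classic zero-shot CoT trigger phrase to the question."""
    return f"{question}\nLet's think step by step."

def few_shot_prompt(question: str, exemplars: list[tuple[str, str]]) -> str:
    """In-context learning: prepend worked question/answer exemplars."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in exemplars)
    return f"{shots}\n\nQ: {question}\nA:"

prompt = zero_shot_cot("If 3 pens cost $6, how much do 5 pens cost?")
```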

The major LRMs use more advanced meta-prompts for reasoning, for either single-step or multi-step reasoning, but these prompts are commercially sensitive and not usually available. Interestingly, the meta-prompt for the single-step DeepSeek R1 reasoning model was disclosed in their paper (https://arxiv.org/abs/2501.12948):

    A conversation between User and Assistant. The user asks a question, and the Assistant solves it.
    The assistant first thinks about the reasoning process in the mind and then provides the user
    with the answer. The reasoning process and answer are enclosed within <think> </think> and
    <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think>
    <answer> answer here </answer>. User: PROMPT. Assistant:
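Using this template programmatically amounts to substituting the user's question and then parsing the tagged output. A sketch under the assumption that completions follow the tag format above (the parsing regexes are my own, not from the paper):

```python
import re

# Sketch of using an R1-style meta-prompt: fill in the user question,
# then split a tagged completion into its reasoning and answer parts.

TEMPLATE = (
    "A conversation between User and Assistant. The user asks a question, "
    "and the Assistant solves it. The assistant first thinks about the "
    "reasoning process in the mind and then provides the user with the "
    "answer. The reasoning process and answer are enclosed within "
    "<think> </think> and <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think> <answer> answer here </answer>. "
    "User: {prompt}. Assistant:"
)

def build_prompt(user_question: str) -> str:
    return TEMPLATE.format(prompt=user_question)

def parse_response(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a tagged completion."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (think.group(1).strip() if think else "",
            answer.group(1).strip() if answer else text.strip())
```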

Fine-tuning on a more specialized subset of relevant data is a particular submethod of training-based reasoning. There has been much improvement here, both in the capabilities of high-end SOTA models and, at the other end of the spectrum, in Small Language Models (SLMs). See more about training methods, but note that there hasn't yet been much research on fine-tuning specifically for reasoning capabilities.

Inference-Based Reasoning

Black Box Reasoning is the use of multiple steps of inference, wrapped around an LLM. The idea is to treat the LLM as a "black box" and use additional LLM calls to improve its reasoning abilities. These are called "few-shot," "many-shot," or "multi-step" reasoning methods.

Chain-of-thought is the best known of these methods, having been adopted by OpenAI for the "o1" models released in September 2024. However, multi-step reasoning is a longstanding area of research, with much overlap with prompt engineering techniques. The literature describes numerous methods of making this type of multiple LLM call:

  • Chain-of-thought (CoT)
  • Self-reflection
  • Skeleton-of-thought
  • Best-of-N (BoN) method
  • Majority voting
  • Self-consistency decoding
  • Programmatic prompting
  • Tree-of-Thoughts (ToT) prompting
  • Chain-of-Symbols (CoS) prompting
  • Graph-of-Thoughts (GoT)
  • Algorithm-of-Thoughts (AoT)
  • Buffer of Thoughts
  • Least-to-Most prompting
  • Chain-of-Table prompting
  • Thread-of-Thought (ThoT) prompting
  • System 2 Attention (S2A) prompting
  • Chain-of-Verification (CoVe) prompting
  • ReAct prompting (reason-and-act)
  • Rephrase-and-Respond (RaR) prompting
  • Chain-of-Knowledge (CoK) prompting
  • Contrastive Chain-of-Thought (CCoT) prompting
  • Program of Thoughts (PoT) prompting
  • Structured Chain-of-Thought (SCoT) prompting
  • Chain-of-Code (CoC) prompting
  • Take a Step Back prompting
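Several of the methods above (Best-of-N, majority voting, self-consistency) share the same skeleton: sample multiple completions and aggregate their answers. A minimal sketch, using a toy stand-in for a sampled LLM call:

```python
from collections import Counter

# Sketch of self-consistency / majority voting: sample N completions at
# nonzero temperature, extract each final answer, and take the majority
# vote. sample_answer stands in for a real sampled-and-parsed LLM call.

def self_consistency(question: str, sample_answer, n: int = 5) -> str:
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in: a "model" whose five samples mostly agree.
samples = iter(["12", "12", "11", "12", "13"])
result = self_consistency("3 * 4 = ?", lambda q: next(samples))
# result is "12", the most common of the five sampled answers
```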

Also related to these areas are the various other ways to have the LLM give a "better" answer, even without genuinely improved reasoning. The simplest ideas include prompt engineering techniques to give the LLM a better query, RAG architectures and Retrieval-Augmented Language Models (RALM) to give the LLM more relevant source data, and dynamic tool usage integrations that extend the LLM's capabilities to answers requiring computation. Also relevant is the research on improving answers by fixing specific LLM limitations such as hallucinations, difficulties with mathematical problem solving, and weakness at language wordplay.

Long Answers versus Multiple Inference Steps

One of the nuances in the distinction between zero-shot reasoner models and multiple steps of inference is the simplest of ideas: output longer answers. Large Reasoner Models with a single-step architecture, such as DeepSeek R1, mimic the steps of reasoning by repeatedly extending the answers with re-phrased reasoning steps about the problem. This is analogous to multi-step inference reasoning, but the model is "talking to itself" about how to reason through the problem, all in one step of inference.

In effect, the sequence of multiple outputs in chained multi-step reasoning is merged into a single output stream of text. The model decides whether another step is required as part of the normal decoding phase. The output from these types of single-step reasoner models is a readable sequence showing how the model thought through a problem. Hence, the output to achieve a final answer can be a very long token sequence, which can be costly, and it's important not to restrict the "max tokens" settings in these cases.

Inference costs are obviously higher for producing an extended answer with many of the intermediate thoughts written into it. However, the number of tokens in multi-step inference is also high. Whether a single-inference model's long answer uses more or fewer tokens than a multi-step implementation of Chain-of-Thought is not really clear (need some papers!), but reasoning ability is high with either approach.
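A back-of-envelope calculation shows why the comparison is not obvious: a multi-step pipeline re-processes the growing context on every call, even though both approaches generate a similar number of reasoning tokens. All numbers below are illustrative assumptions, not measurements:

```python
# Back-of-envelope token accounting for single-stream vs. multi-step
# reasoning. Assumed sizes: a 200-token prompt and 400 reasoning tokens
# per step, over 5 reasoning steps.

prompt_tokens = 200
step_tokens = 400
steps = 5

# Single-stream LRM: one call, one long output.
single_generated = steps * step_tokens                 # 2000 generated
single_processed = prompt_tokens + single_generated    # 2200 total

# Multi-step pipeline: each call re-reads the prompt plus all prior steps.
multi_generated = steps * step_tokens                  # 2000 generated
context_rereads = sum(prompt_tokens + i * step_tokens  # call i's input
                      for i in range(steps))           # 5000 re-processed
multi_processed = context_rereads + multi_generated    # 7000 total
```

Under these toy assumptions, the multi-step pipeline processes roughly three times as many tokens, though prompt caching and per-step pricing differences can change the picture considerably.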

Survey Papers on LLM Reasoning

Survey and review papers on reasoning:

  • Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
  • Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back, 16 Jul 2024, Reasoning with Large Language Models, a Survey, https://arxiv.org/abs/2407.11511
  • Alhassan Mumuni, Fuseini Mumuni, 6 Jan 2025, Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches, https://arxiv.org/abs/2501.03151
  • Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
  • Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
  • Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li, 17 Jan 2025 (v2), Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities, https://arxiv.org/abs/2501.09686
  • Jie Huang and Kevin Chen-Chuan Chang. July 2023. Towards Reasoning in Large Language Models: A Survey. In Findings of the Association for Computational Linguistics: ACL 2023, pages 1049–1065, Toronto, Canada. Association for Computational Linguistics. https://aclanthology.org/2023.findings-acl.67/
  • Seungpil Lee, Woochang Sim, Donghyeon Shin, Wongyu Seo, Jiwon Park, Seokki Lee, Sanha Hwang, Sejin Kim, and Sundong Kim. Jan 2025. Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus. ACM Trans. Intell. Syst. Technol. https://doi.org/10.1145/3712701 https://dl.acm.org/doi/10.1145/3712701 https://dl.acm.org/doi/pdf/10.1145/3712701
  • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
  • Mohit Sewak, Ph.D., January 29, 2025, Achieving General Intelligence (AGI) and Super Intelligence (ASI): Pathways, Uncertainties, and Ethical Concerns, https://towardsai.net/p/l/achieving-general-intelligence-agi-and-super-intelligence-asi-pathways-uncertainties-and-ethical-concerns
  • Avinash Patil, 5 Feb 2025, Advancing Reasoning in Large Language Models: Promising Methods and Approaches, https://arxiv.org/abs/2502.03671
  • Hieu Minh "Jord" Nguyen, 10 Feb 2025, A Survey of Theory of Mind in Large Language Models: Evaluations, Representations, and Safety Risks, https://arxiv.org/abs/2502.06470
  • Hanmeng Liu, Zhizhang Fu, Mengru Ding, Ruoxi Ning, Chaoli Zhang, Xiaozhang Liu, Yue Zhang, 13 Feb 2025, Logical Reasoning in Large Language Models: A Survey, https://arxiv.org/abs/2502.09100
  • Fengxiang Cheng, Haoxuan Li, Fenrong Liu, Robert van Rooij, Kun Zhang, Zhouchen Lin, 24 Feb 2025 (v2), Empowering LLMs with Logical Reasoning: A Comprehensive Survey, https://arxiv.org/abs/2502.15652
  • Cameron R. Wolfe, Feb 18, 2025, Demystifying Reasoning Models: Understanding reasoning models and their relation to standard LLMs... https://cameronrwolfe.substack.com/p/demystifying-reasoning-models
  • Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu, 25 Feb 2025 (v2), From System 1 to System 2: A Survey of Reasoning Large Language Models, https://arxiv.org/abs/2502.17419
  • Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
  • Guiyao Tie, Zeli Zhao, Dingjie Song, Fuyang Wei, Rong Zhou, Yurou Dai, Wen Yin, Zhejian Yang, Jiangyue Yan, Yao Su, Zhenhan Dai, Yifeng Xie, Yihan Cao, Lichao Sun, Pan Zhou, Lifang He, Hechang Chen, Yu Zhang, Qingsong Wen, Tianming Liu, Neil Zhenqiang Gong, Jiliang Tang, Caiming Xiong, Heng Ji, Philip S. Yu, Jianfeng Gao, 8 Mar 2025, A Survey on Post-training of Large Language Models, https://arxiv.org/abs/2503.06072
  • Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wanxiang Che, 13 Mar 2025 (v2), Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models, https://arxiv.org/abs/2503.09567 (Massive and broad survey of all types of reasoning.)
  • Yaoting Wang, Shengqiong Wu, Yuecheng Zhang, William Wang, Ziwei Liu, Jiebo Luo, Hao Fei, 16 Mar 2025, Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey, https://arxiv.org/abs/2503.12605
  • Dibyanayan Bandyopadhyay, Soham Bhattacharjee, Asif Ekbal, 13 Mar 2025, Thinking Machines: A Survey of LLM based Reasoning Strategies, https://arxiv.org/abs/2503.10814
  • Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng, 27 Mar 2025, A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond, https://arxiv.org/abs/2503.21614
  • Ali Forootani, 22 Mar 2025, A Survey on Mathematical Reasoning and Optimization with Large Language Models, https://arxiv.org/abs/2503.17726
  • Qianjun Pan, Wenkai Ji, Yuyang Ding, Junsong Li, Shilian Chen, Junyi Wang, Jie Zhou, Qin Chen, Min Zhang, Yulan Wu, Liang He, 8 May 2025 (v2), A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law, https://arxiv.org/abs/2505.02665
  • Zixuan Ke, Fangkai Jiao, Yifei Ming, Xuan-Phi Nguyen, Austin Xu, Do Xuan Long, Minzhi Li, Chengwei Qin, Peifeng Wang, Silvio Savarese, Caiming Xiong, Shafiq Joty, 5 Aug 2025 (v3), A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems, https://arxiv.org/abs/2504.09037

Reasoning Theory

Papers about the deeper theory of what "reasoning" means:

  • Eghbal Hosseini, Colton Casto, Noga Zaslavsky, Colin Conwell, Mark Richardson, Evelina Fedorenko, Dec 2024, Universality of representation in biological and artificial neural networks, bioRxiv 2024.12.26.629294; doi: https://doi.org/10.1101/2024.12.26.629294 https://www.biorxiv.org/content/10.1101/2024.12.26.629294
  • Kuang-Huei Lee, Ian Fischer, Yueh-Hua Wu, Dave Marwood, Shumeet Baluja, Dale Schuurmans, Xinyun Chen, 17 Jan 2025, Evolving Deeper LLM Thinking, https://arxiv.org/abs/2501.09891 (An alternative search strategy broad/deep, compared to CoT and reflection.)
  • G Bao, H Zhang, C Wang, L Yang, Y Zhang, Jan 2025, How Likely Do LLMs with CoT Mimic Human Reasoning? Proceedings of the 31st International Conference on Computational Linguistics, pages 7831–7850, January 19–24, 2025, https://aclanthology.org/2025.coling-main.524.pdf
  • Santosh Kumar Radha, Oktay Goktas, 23 Jan 2025, On the Reasoning Capacity of AI Models and How to Quantify It, https://arxiv.org/abs/2501.13833
  • Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn, 4 Feb 2025, Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers, https://arxiv.org/abs/2502.02393
  • Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, Wenda Zhou, 3 Feb 2025, Competitive Programming with Large Reasoning Models, https://arxiv.org/abs/2502.06807 (OpenAI's paper on o3 that has similar conclusions to what DeepSeek showed about Reinforcement Learning for reasoning models, namely that "scaling general-purpose reinforcement learning" still works.)
  • Xinhao Yao, Ruifeng Ren, Yun Liao, Yong Liu, 7 Feb 2025, Unveiling the Mechanisms of Explicit CoT Training: How Chain-of-Thought Enhances Reasoning Generalization, https://arxiv.org/abs/2502.04667
  • Hanmeng Liu, Zhizhang Fu, Mengru Ding, Ruoxi Ning, Chaoli Zhang, Xiaozhang Liu, Yue Zhang, 13 Feb 2025, Logical Reasoning in Large Language Models: A Survey, https://arxiv.org/abs/2502.09100
  • Kechen Li, Wenqi Zhu, Coralia Cartis, Tianbo Ji, Shiwei Liu, 27 Feb 2025, SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers, https://arxiv.org/abs/2502.20545
  • Yijiong Yu, 16 Jan 2025 (v4), Do LLMs Really Think Step-by-step In Implicit Reasoning? https://arxiv.org/abs/2411.15862 https://github.com/yuyijiong/if_step_by_step_implicit_CoT
  • Marius Jahrens, Thomas Martinetz, 12 Mar 2025, Why LLMs Cannot Think and How to Fix It, https://arxiv.org/abs/2503.09211
  • Pengcheng Wen, Jiaming Ji, Chi-Min Chan, Juntao Dai, Donghai Hong, Yaodong Yang, Sirui Han, Yike Guo, 17 Mar 2025, ThinkPatterns-21k: A Systematic Study on the Impact of Thinking Patterns in LLMs, https://arxiv.org/abs/2503.12918
  • Dibyanayan Bandyopadhyay, Soham Bhattacharjee, Asif Ekbal, 13 Mar 2025, Thinking Machines: A Survey of LLM based Reasoning Strategies, https://arxiv.org/abs/2503.10814
  • Jiaran Ye, Zijun Yao, Zhidian Huang, Liangming Pan, Jinxin Liu, Yushi Bai, Amy Xin, Liu Weichuan, Xiaoyin Che, Lei Hou, Juanzi Li, 29 May 2025, How does Transformer Learn Implicit Reasoning? https://arxiv.org/abs/2505.23653
  • Róbert Csordás, Christopher D. Manning, Christopher Potts, 30 May 2025 (v2), Do Language Models Use Their Depth Efficiently? https://arxiv.org/abs/2505.13898

Reasoning Model Evaluation

Papers about testing LLMs (and overall systems) for their reasoning abilities:

Large Reasoning Models (LRMs)

Large Reasoning Models (LRMs) are large-scale LLMs that have been trained on advanced reasoning capabilities. Their architecture may be training-only, but increasingly the architectures include multi-step inference or "test time compute" reasoning capabilities such as Chain-of-Thought.

Papers on large reasoning models:

  • Ignacio de Gregorio, Dec 2024, Uncovering OpenAI’s Frontier AI Strategy, https://medium.com/@ignacio.de.gregorio.noblejas/uncovering-openais-frontier-ai-strategy-a02e0aa5320e
  • Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Peitian Zhang, Zhicheng Dou, 9 Jan 2025, Search-o1: Agentic Search-Enhanced Large Reasoning Models, https://arxiv.org/abs/2501.05366 https://github.com/sunnynexus/Search-o1 (RAG retrieval and agentic methods applied to Large Reasoning Models.)
  • Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li, 17 Jan 2025 (v2), Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities, https://arxiv.org/abs/2501.09686
  • OpenAI, September 12, 2024, Learning to Reason with LLMs, https://openai.com/index/learning-to-reason-with-llms/ (Introduces OpenAI o1, a large language model trained with reinforcement learning to perform complex reasoning; o1 thinks before it answers, producing a long internal chain of thought before responding to the user.)
  • Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back, 16 Jul 2024, Reasoning with Large Language Models, a Survey, https://arxiv.org/abs/2407.11511
  • Jie Huang and Kevin Chen-Chuan Chang. July 2023. Towards Reasoning in Large Language Models: A Survey. In Findings of the Association for Computational Linguistics: ACL 2023, pages 1049–1065, Toronto, Canada. Association for Computational Linguistics. https://aclanthology.org/2023.findings-acl.67/
  • Seungpil Lee, Woochang Sim, Donghyeon Shin, Wongyu Seo, Jiwon Park, Seokki Lee, Sanha Hwang, Sejin Kim, and Sundong Kim. Jan 2025. Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus. ACM Trans. Intell. Syst. Technol. https://doi.org/10.1145/3712701 https://dl.acm.org/doi/10.1145/3712701 https://dl.acm.org/doi/pdf/10.1145/3712701
  • Demis Hassabis, Jan 2025, X post: Announcing Gemini 2.0 Flash https://x.com/demishassabis/status/1881844417746632910 (Gemini 2.0 Flash from Google is a Large Reasoning Model with a 1M ultra-long context.)
  • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
  • Alberto Romero, Jan 2025, DeepSeek, a little-known Chinese startup, released R1 yesterday, https://substack.com/@thealgorithmicbridge/note/c-87664591-
  • DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z.F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, et al. (100+ additional authors not shown), 22 Jan 2025, DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, https://arxiv.org/abs/2501.12948 (The DeepSeek R1 large reasoning model.)
  • G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
  • Ben Dickson, January 31, 2025, Beyond benchmarks: How DeepSeek-R1 and o1 perform on real-world tasks, https://venturebeat.com/ai/beyond-benchmarks-how-deepseek-r1-and-o1-perform-on-real-world-tasks/
  • Deqian Kong, Minglu Zhao, Dehong Xu, Bo Pang, Shu Wang, Edouardo Honig, Zhangzhang Si, Chuan Li, Jianwen Xie, Sirui Xie, Ying Nian Wu, 3 Feb 2025, Scalable Language Models with Posterior Inference of Latent Thought Vectors, https://arxiv.org/abs/2502.01567
  • Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, Wenda Zhou, 3 Feb 2025, Competitive Programming with Large Reasoning Models, https://arxiv.org/abs/2502.06807 (OpenAI's paper on o3 that has similar conclusions to what DeepSeek showed about Reinforcement Learning for reasoning models, namely that "scaling general-purpose reinforcement learning" still works.)
  • DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng, 5 Feb 2025. Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning, https://arxiv.org/abs/2502.03275
  • Cameron R. Wolfe, Feb 18, 2025, Demystifying Reasoning Models: Understanding reasoning models and their relation to standard LLMs... https://cameronrwolfe.substack.com/p/demystifying-reasoning-models
  • Jeremy Kahn, February 28, 2025, OpenAI launches long-awaited GPT-4.5 — but ‘Orion’s’ capabilities already lag competitors, https://fortune.com/2025/02/27/openai-gpt-4-5-orion-launch-sam-altman-benchmarks/
  • Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
  • Asif Razzaq, March 5, 2025, Qwen Releases QwQ-32B: A 32B Reasoning Model that Achieves Significantly Enhanced Performance in Downstream Task, https://www.marktechpost.com/2025/03/05/qwen-releases-qwq-32b-a-32b-reasoning-model-that-achieves-significantly-enhanced-performance-in-downstream-task/ (Features 32B parameters, 32K context length, 64 layers, RoPE, SwiGLU, RMSNorm, and attention enhancements.)
  • Parshin Shojaee, Maxwell Horton, Iman Mirzadeh, Samy Bengio, Keivan Alizadeh, June 2025, The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, Apple, https://machinelearning.apple.com/research/illusion-of-thinking https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf
  • Dr. Ashish Bamania, June 2025, Apple’s New Research Shows That LLM Reasoning Is Completely Broken: A deep dive into Apple research that exposes the flawed thinking process in state-of-the-art Reasoning LLMs, https://ai.gopubby.com/apples-new-research-shows-that-llm-reasoning-is-completely-broken-47b5be71a06a
  • Ryan Browne, Jun 10 2025, Microsoft-backed AI lab Mistral is launching its first reasoning model in challenge to OpenAI, https://www.cnbc.com/2025/06/10/microsoft-backed-ai-lab-mistral-debuts-reasoning-model-to-rival-openai.html (Mistral's new LRM has multilingual reasoning.)
  • Bowen Ding, Yuhan Chen, Futing Wang, Lingfeng Ming, Tao Lin, 30 Jun 2025, Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model, https://arxiv.org/abs/2506.23840
  • Bin Hong, Jiayu Liu, Zhenya Huang, Kai Zhang, Mengdi Zhang, 13 Aug 2025, Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization, https://arxiv.org/abs/2508.10164
  • Zhipeng Chen, Xiaobo Qin, Youbin Wu, Yue Ling, Qinghao Ye, Wayne Xin Zhao, Guang Shi, 14 Aug 2025, Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models, https://arxiv.org/abs/2508.10751
  • Datta Nimmaturi, Vaishnavi Bhargava, Rajat Ghosh, Johnu George, Debojyoti Dutta, 24 Jul 2025, Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models, https://arxiv.org/abs/2507.18014
  • Kaiwen Chen, Xin Tan, Minchen Yu, Hong Xu, 29 Jul 2025, MemShare: Memory Efficient Inference for Large Reasoning Models through KV Cache Reuse, https://arxiv.org/abs/2507.21433
  • Tao He, Rongchuan Mu, Lizi Liao, Yixin Cao, Ming Liu, and Bing Qin, 31 Jul 2025, Good Learners Think Their Thinking: Generative PRM Makes Large Reasoning Model More Efficient Math Learner, https://arxiv.org/abs/2507.23317
  • Dadi Guo, Jiayu Liu, Zhiyuan Fan, Zhitao He, Haoran Li, Yumeng Wang, Yi R. Fung, 31 Jul 2025, Mathematical Proof as a Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models, https://arxiv.org/abs/2506.17114
  • Linan Yue, Yichao Du, Yizhi Wang, Weibo Gao, Fangzhou Yao, Li Wang, Ye Liu, Ziyu Xu, Qi Liu, Shimin Di, Min-Ling Zhang, 4 Aug 2025, Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models, https://arxiv.org/abs/2508.02120
  • Yule Liu, Jingyi Zheng, Zhen Sun, Zifan Peng, Wenhan Dong, Zeyang Sha, Shiwen Cui, Weiqiang Wang, Xinlei He, 4 Aug 2025, Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models, https://arxiv.org/abs/2504.13626
  • Junhong Wu, Jinliang Lu, Zixuan Ren, Ganqiang Hu, Zhi Wu, Dai Dai, Hua Wu, 5 Aug 2025, LLMs Have a Heart of Stone: Demystifying the Soft Thinking Ability of Large Reasoning Models, https://arxiv.org/abs/2508.03440
  • Yuan Xun, Xiaojun Jia, Xinwei Liu, Hua Zhang, 6 Aug 2025, The Emotional Baby Is Truly Deadly: Does your Multimodal Large Reasoning Model Have Emotional Flattery towards Humans?, https://arxiv.org/abs/2508.03986
  • Rui Ha, Chaozhuo Li, Rui Pu, Sen Su, 6 Aug 2025, From "Aha Moments" to Controllable Thinking: Toward Meta-Cognitive Reasoning in Large Reasoning Models via Decoupled Reasoning and Control, https://arxiv.org/abs/2508.04460
  • Thilo Hagendorff, Erik Derner, Nuria Oliver, 4 Aug 2025, Large Reasoning Models Are Autonomous Jailbreak Agents, https://arxiv.org/abs/2508.04039
  • Yuquan Wang, Mi Zhang, Yining Wang, Geng Hong, Xiaoyu You, Min Yang, 6 Aug 2025, ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments, https://arxiv.org/abs/2508.04204
  • Yongjiang Liu, Haoxi Li, Xiaosong Ma, Jie Zhang, Song Guo, 6 Aug 2025, Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models, https://arxiv.org/abs/2507.02663
  • Youcheng Huang, Bowen Qin, Chen Huang, Duanyu Feng, Xi Yang, Wenqiang Lei, 15 Aug 2025, Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information, https://arxiv.org/abs/2508.11252
  • Nuo Chen, Zhiyuan Hu, Qingyun Zou, Jiaying Wu, Qian Wang, Bryan Hooi, Bingsheng He, 20 Aug 2025, JudgeLRM: Large Reasoning Models as a Judge, https://arxiv.org/abs/2504.00050
  • Haonan Dong, Haoran Ye, Wenhao Zhu, Kehan Jiang, Guojie Song, 24 Aug 2025, Meta-R1: Empowering Large Reasoning Models with Metacognition, https://arxiv.org/abs/2508.17291

Open Source Reasoning

Open source reasoning projects are those that either: (a) use open-source code to implement multi-step inference-based reasoning algorithms such as Chain-of-Thought (on any underlying model), or (b) are Large Reasoning Models whose model weights and architectural details have been open-sourced, such as DeepSeek R1.

General Research on Intelligence

What does it mean to be smart? There are various answers to this, and it's a very nuanced question.

Research on intelligence or "smartness" of AI systems:

Chain-of-Thought (CoT) Reasoning

Research papers on chain-of-thought (CoT) for reasoning:

  • Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler, 5 Apr 2024, Demystifying Chains, Trees, and Graphs of Thoughts, https://arxiv.org/abs/2401.14295 http://htor.ethz.ch/publications/img/besta-topologies.pdf
  • Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758
  • Hongxuan Zhang, Zhining Liu, Jiaqi Zheng, Chenyi Zhuang, Jinjie Gu, Guihai Chen, Nov 2023, Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster, https://arxiv.org/abs/2311.08263
  • Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe, May 2023, Let's Verify Step by Step, https://arxiv.org/abs/2305.20050
  • Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin, 13 Jun 2024, Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs, https://arxiv.org/abs/2406.09136 Code: https://github.com/sail-sg/CPO
  • kipply's blog, 2023-03-30, Transformer Taxonomy (the last lit review), https://kipp.ly/transformer-taxonomy/ (Papers for all the Transformer architectures and milestone papers for the major optimization improvements on them.)
  • Daniel Lopes, June 21, 2024, A Comprehensive Guide to Text Prompt Engineering Techniques, https://journal.daniellopes.dev/p/practical-prompt-engineering-notes
  • Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He, 15 Feb 2024, Model Compression and Efficient Inference for Large Language Models: A Survey, https://arxiv.org/abs/2402.09748
  • Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu, 17 May 2024, Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities, https://arxiv.org/abs/2405.10825
  • Yu Wang, Shiwan Zhao, Zhihu Wang, Heyuan Huang, Ming Fan, Yubo Zhang, Zhixing Wang, Haijun Wang, Ting Liu, 5 Sep 2024, Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation, https://arxiv.org/abs/2409.03271
  • Asankhaya Sharma (codelion), Sep 2024, Optillm: Optimizing inference proxy for LLMs, https://github.com/codelion/optillm
  • Ziqi Jin, Wei Lu, 6 Sep 2024, Self-Harmonized Chain of Thought, https://arxiv.org/abs/2409.04057
  • Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, Aman Chadha, 5 Feb 2024, A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications, https://arxiv.org/abs/2402.07927
  • Shizhe Diao, Pengcheng Wang, Yong Lin, Rui Pan, Xiang Liu, Tong Zhang, 21 Jul 2024 (v5), Active Prompting with Chain-of-Thought for Large Language Models, https://arxiv.org/abs/2302.12246 https://github.com/shizhediao/active-prompt
  • Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola, 7 Oct 2022, Automatic Chain of Thought Prompting in Large Language Models, https://arxiv.org/abs/2210.03493 https://github.com/amazon-research/auto-cot
  • Louis Bouchard, Sep 12, 2024, OpenAI's o1 Model: The Future of Reasoning AI? What Sets It Apart, How OpenAI's o1 Model Thinks Through Problems (And Why It's Slower), https://www.louisbouchard.ai/openai-o1/
  • OpenAI, September 12, 2024, Learning to Reason with LLMs, https://openai.com/index/learning-to-reason-with-llms/
  • Emilia David, September 12, 2024, How to prompt on OpenAI’s new o1 models, https://venturebeat.com/ai/how-to-prompt-on-openai-o1/ (Prompt engineering is different for o1, such as "don't use chain of thought.")
  • Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous, 28 Nov 2023, Training Chain-of-Thought via Latent-Variable Inference, https://arxiv.org/abs/2312.02179
  • Trung Quoc Luong, Xinbo Zhang, Zhanming Jie, Peng Sun, Xiaoran Jin, Hang Li, 27 Jun 2024 (v2), ReFT: Reasoning with Reinforced Fine-Tuning, https://arxiv.org/abs/2401.08967
  • Tianqiao Liu, Zui Chen, Zitao Liu, Mi Tian, Weiqi Luo, 13 Sep 2024, Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding, https://arxiv.org/abs/2409.08561
  • Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett, 18 Sep 2024, To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning, https://arxiv.org/abs/2409.12183
  • Santosh Kumar Radha, Yasamin Nouri Jelyani, Ara Ghukasyan, Oktay Goktas, 19 Sep 2024, Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning, https://arxiv.org/abs/2409.12618
  • Artem Shelamanov, Sep 2024, Why OpenAI’s o1 Model Is A Scam, https://pub.towardsai.net/why-openais-o1-model-is-a-scam-eb3356c3d70e
  • Chung-Yu Wang, Alireza DaghighFarsoodeh, Hung Viet Pham, 24 Sep 2024, Task-oriented Prompt Enhancement via Script Generation, https://arxiv.org/abs/2409.16418
  • Cassandra A. Cohen, William W. Cohen, 17 Sep 2024, Watch Your Steps: Observable and Modular Chains of Thought, https://arxiv.org/abs/2409.15359
  • Tongxuan Liu, Wenjiang Xu, Weizhe Huang, Xingyu Wang, Jiaxing Wang, Hailong Yang, Jing Li, 26 Sep 2024, Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models, https://arxiv.org/abs/2409.17539
  • Zhenwen Liang, Ye Liu, Tong Niu, Xiangliang Zhang, Yingbo Zhou, Semih Yavuz, 5 Oct 2024, Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification, https://arxiv.org/abs/2410.05318
  • Qiguang Chen, Libo Qin, Jiaqi Wang, Jinxuan Zhou, Wanxiang Che, 8 Oct 2024, Unlocking the Boundaries of Thought: A Reasoning Granularity Framework to Quantify and Optimize Chain-of-Thought, https://arxiv.org/abs/2410.05695 https://github.com/LightChen233/reasoning-granularity
  • Yingqian Cui, Pengfei He, Xianfeng Tang, Qi He, Chen Luo, Jiliang Tang, Yue Xing, 21 Oct 2024, A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration, https://arxiv.org/abs/2410.16540
  • Banghao Chen, Zhaofeng Zhang, Nicolas Langrené, Shengxin Zhu, 5 Sep 2024 (v5), Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review, https://arxiv.org/abs/2310.14735
  • Data Camp, Jul 10, 2024, Chain-of-Thought Prompting: Step-by-Step Reasoning with LLMs, https://www.datacamp.com/tutorial/chain-of-thought-prompting
  • Pankaj, Dec 21, 2023, Chain of Thought Prompting: Guiding LLMs Step-by-Step, https://medium.com/@pankaj_pandey/chain-of-thought-prompting-guiding-llms-step-by-step-e6eac32d02d8
  • Jason Wei and Denny Zhou, May 11, 2022, Language Models Perform Reasoning via Chain of Thought, https://research.google/blog/language-models-perform-reasoning-via-chain-of-thought/
  • Cameron R. Wolfe, Jul 24, 2023, Chain of Thought Prompting for LLMs: A practical and simple approach for “reasoning” with LLMs, https://towardsdatascience.com/chain-of-thought-prompting-for-llms-33c963eead38
  • Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
  • Tanay Jaipuria, Oct 29, 2024, OpenAI's o-1 and inference-time scaling laws, https://www.tanayj.com/p/openais-o-1-and-inference-time-scaling
  • Junda Wu, Xintong Li, Ruoyu Wang, Yu Xia, Yuxin Xiong, Jianing Wang, Tong Yu, Xiang Chen, Branislav Kveton, Lina Yao, Jingbo Shang, Julian McAuley, 31 Oct 2024, OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models, https://arxiv.org/abs/2410.23703
  • Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
  • Guowei Xu, Peng Jin, Li Hao, Yibing Song, Lichao Sun, Li Yuan, 15 Nov 2024, LLaVA-o1: Let Vision Language Models Reason Step-by-Step, https://arxiv.org/abs/2411.10440
  • Carl Franzen, November 20, 2024, DeepSeek’s first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance, https://venturebeat.com/ai/deepseeks-first-reasoning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/
  • Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang, 21 Nov 2024, Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions, https://arxiv.org/abs/2411.14405
  • Jun Gao, Yongqi Li, Ziqiang Cao, Wenjie Li, 29 Nov 2024, Interleaved-Modal Chain-of-Thought, https://arxiv.org/abs/2411.19488 (Using CoT on a multimodal/vision model.)
  • Hieu Tran, Zonghai Yao, Junda Wang, Yifan Zhang, Zhichao Yang, Hong Yu, 5 Dec 2024 (v2), RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2412.02830
  • Tiernan Ray, Dec. 10, 2024, How Cerebras boosted Meta's Llama to 'frontier model' performance. The company also demonstrates initial training of a one-trillion-parameter AI model on a single machine using conventional DDR5 memory chips. https://www.zdnet.com/article/how-cerebras-boosted-metas-llama-to-frontier-model-performance/
  • Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian, 9 Dec 2024, Training Large Language Models to Reason in a Continuous Latent Space, https://arxiv.org/abs/2412.06769
  • Ben Dickson, December 10, 2024, OpenAI’s o1 model doesn’t show its thinking, giving open source an advantage, https://venturebeat.com/ai/heres-how-openai-o1-might-lose-ground-to-open-source-models/
  • Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang, 6 Dec 2024, Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling, https://arxiv.org/abs/2412.05271
  • Jiaqi Zhang, Chen Gao, Liyuan Zhang, Yong Li, Hongzhi Yin, 10 Dec 2024, SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World, https://arxiv.org/abs/2412.07472 https://github.com/tsinghua-fib-lab/SmartAgent
  • Kyle Wiggers, December 14, 2024, ‘Reasoning’ AI models have become a trend, for better or worse, https://techcrunch.com/2024/12/14/reasoning-ai-models-have-become-a-trend-for-better-or-worse/
  • Alberto Romero, Dec 21, 2024, OpenAI o3 Model Is a Message From the Future: Update All You Think You Know About AI. Incredible, a miracle, more than just a better state-of-the-art AI model. https://www.thealgorithmicbridge.com/p/openai-o3-model-is-a-message-from
  • Sabrina Ortiz, Dec. 20, 2024, OpenAI unveils its most advanced o3 reasoning model on its last day of 'shipmas', https://www.zdnet.com/article/openai-unveils-its-most-advanced-o3-reasoning-model-on-its-last-day-of-shipmas/
  • Tyler McDonald, Anthony Colosimo, Yifeng Li, Ali Emami, 2 Dec 2024, Can We Afford The Perfect Prompt? Balancing Cost and Accuracy with the Economical Prompting Index, https://arxiv.org/abs/2412.01690
  • Jiaxiang Liu, Yuan Wang, Jiawei Du, Joey Tianyi Zhou, Zuozhu Liu, 18 Dec 2024, MedCoT: Medical Chain of Thought via Hierarchical Expert, https://arxiv.org/abs/2412.13736
  • Changyue Wang, Weihang Su, Qingyao Ai, Yiqun Liu, 23 Dec 2024, Knowledge Editing through Chain-of-Thought, https://arxiv.org/abs/2412.17727 https://github.com/bebr2/EditCoT
  • Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan, 3 Dec 2023 (v2), Tree of Thoughts: Deliberate Problem Solving with Large Language Models, https://arxiv.org/abs/2305.10601 Code: https://github.com/princeton-nlp/tree-of-thought-llm
  • Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou, 10 Jan 2023 (v6), Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. https://arxiv.org/abs/2201.11903
  • Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa, 29 Jan 2023 (v4), Large Language Models are Zero-Shot Reasoners, https://arxiv.org/abs/2205.11916 https://github.com/kojima-takeshi188/zero_shot_cot ("Let's think step by step" prepended to every prompt for a type of zero-shot CoT.)
  • Xuezhi Wang, Denny Zhou, 23 May 2024 (v2), Chain-of-Thought Reasoning Without Prompting, https://arxiv.org/abs/2402.10200 ("CoT decoding" is examining the alternative paths in the decoding algorithm, which is somewhat similar to Chain-of-Thought reasoning.)
  • xjdr-alt, Dec 2024, entropix: Entropy Based Sampling and Parallel CoT Decoding, https://github.com/xjdr-alt/entropix (Parallel decoding attempts to get something similar to CoT.)
  • Huanjin Yao, Jiaxing Huang, Wenhao Wu, Jingyi Zhang, Yibo Wang, Shunyu Liu, Yingjie Wang, Yuxin Song, Haocheng Feng, Li Shen, Dacheng Tao, 24 Dec 2024, Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search, https://arxiv.org/abs/2412.18319 https://github.com/HJYao00/Mulberry (Multimodal multi-step reasoning like CoT.)
  • Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
  • Jiaan Wang, Fandong Meng, Yunlong Liang, Jie Zhou, 23 Dec 2024, DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought, https://arxiv.org/abs/2412.17498 https://github.com/krystalan/DRT-o1 (Examines similes and metaphors in literature using long CoT.)
  • Jiacheng Ye, Shansan Gong, Liheng Chen, Lin Zheng, Jiahui Gao, Han Shi, Chuan Wu, Xin Jiang, Zhenguo Li, Wei Bi, Lingpeng Kong, 5 Dec 2024 (v3), Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models, https://arxiv.org/abs/2402.07754
  • Shiv Sakhuja, 25 Sep 2024, Chain-of-Thought (CoT) Prompting Explained: 7 Techniques for Optimizing AI Performance, https://hub.athina.ai/athina-originals/guides-chain-of-thought-cot-prompting-explained-7-techniques-for-optimizing-ai-performance/
  • Aryasomayajula Ram Bharadwaj, 5 Dec 2024, Understanding Hidden Computations in Chain-of-Thought Reasoning, https://arxiv.org/abs/2412.04537
  • Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back, 16 Jul 2024, Reasoning with Large Language Models, a Survey, https://arxiv.org/abs/2407.11511
  • Cheng Yang, Chufan Shi, Siheng Li, Bo Shui, Yujiu Yang, Wai Lam, 29 Dec 2024, LLM2: Let Large Language Models Harness System 2 Reasoning, https://arxiv.org/abs/2412.20372
  • Mayi Xu, Yunfeng Ning, Yongqi Li, Jianhao Chen, Jintao Wen, Yao Xiao, Shen Zhou, Birong Pan, Zepeng Bao, Xin Miao, Hankun Kang, Ke Sun, Tieyun Qian, 2 Jan 2025, Reasoning based on symbolic and parametric knowledge bases: a survey, https://arxiv.org/abs/2501.01030 (Extensive survey of reasoning from CoT to knowledge graphs to table-based reasoning.)
  • Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
  • Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
  • Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
  • Ziyang Ma, Zhuo Chen, Yuping Wang, Eng Siong Chng, Xie Chen, 13 Jan 2025, Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model, https://arxiv.org/abs/2501.07246
  • Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
  • G Bao, H Zhang, C Wang, L Yang, Y Zhang, Jan 2025, How Likely Do LLMs with CoT Mimic Human Reasoning? Proceedings of the 31st International Conference on Computational Linguistics, pages 7831–7850, January 19–24, 2025, https://aclanthology.org/2025.coling-main.524.pdf
  • Son, M., Won, Y.-J., & Lee, S. (2025). Optimizing Large Language Models: A Deep Dive into Effective Prompt Engineering Techniques. Applied Sciences, 15(3), 1430. https://doi.org/10.3390/app15031430 https://www.mdpi.com/2076-3417/15/3/1430
  • Manish Sanwal, 3 Feb 2025 (v2), Layered Chain-of-Thought Prompting for Multi-Agent LLM Systems: A Comprehensive Approach to Explainable Large Language Models, https://arxiv.org/abs/2501.18645
  • Jianfeng Pan, Senyou Deng, Shaomang Huang, 4 Feb 2025, CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning, https://arxiv.org/abs/2502.02390 (Integrating results from an "associative memory" in CoT reasoning paths at inference time.)
  • Avinash Patil, 5 Feb 2025, Advancing Reasoning in Large Language Models: Promising Methods and Approaches, https://arxiv.org/abs/2502.03671
  • Daniel Fleischer, Moshe Berchansky, Gad Markovits, Moshe Wasserblat, 13 Feb 2025, SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models, https://arxiv.org/abs/2502.09390 https://github.com/IntelLabs/RAG-FiT/tree/square
  • Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
  • Bin Hong, Jiayu Liu, Zhenya Huang, Kai Zhang, Mengdi Zhang, 13 Aug 2025, Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization, https://arxiv.org/abs/2508.10164
  • Ke Niu, Haiyang Yu, Zhuofan Chen, Mengyang Zhao, Teng Fu, Bin Li, Xiangyang Xue, 13 Aug 2025, From Intent to Execution: Multimodal Chain-of-Thought Reinforcement Learning for Precise CAD Code Generation, https://arxiv.org/abs/2508.10118
  • Ziyu Guo, Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Rui Huang, Haoquan Zhang, Manyuan Zhang, Jiaming Liu, Shanghang Zhang, Peng Gao, Hongsheng Li, Pheng-Ann Heng, 23 Jul 2025, Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step, https://arxiv.org/abs/2501.13926
  • Ang Li, Charles Wang, Kaiyu Yue, Zikui Cai, Ollie Liu, Deqing Fu, Peng Guo, Wang Bill Zhu, Vatsal Sharan, Robin Jia, Willie Neiswanger, Furong Huang, Tom Goldstein, Micah Goldblum, 22 Jul 2025, Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning, https://arxiv.org/abs/2507.16746
  • Hulayyil Alshammari, Praveen Rao, 23 Jul 2025, Evaluating the Performance of AI Text Detectors, Few-Shot and Chain-of-Thought Prompting Using DeepSeek Generated Text, https://arxiv.org/abs/2507.17944
  • Binbin Ji, Siddharth Agrawal, Qiance Tang, and Yvonne Wu, 6 Jul 2025, Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning, https://arxiv.org/abs/2507.13362
  • Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wanxiang Che, 18 Jul 2025, Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models, https://arxiv.org/abs/2503.09567
  • Lei Chen, Xuanle Zhao, Zhixiong Zeng, Jing Huang, Yufeng Zhong, Lin Ma, 21 Jul 2025, Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner, https://arxiv.org/abs/2507.15509
  • Luyi Ma, Wanjia Zhang, Kai Zhao, Abhishek Kulkarni, Lalitesh Morishetti, Anjana Ganesh, Ashish Ranjan, Aashika Padmanabhan, Jianpeng Xu, Jason Cho, Praveen Kanumala, Kaushiki Nag, Sumit Dutta, Kamiya Motwani, Malay Patel, Evren Korpeoglu, Sushant Kumar, Kannan Achan, 19 Jul 2025, GRACE: Generative Recommendation via Journey-Aware Sparse Attention on Chain-of-Thought Tokenization, https://arxiv.org/abs/2507.14758
  • Hao Yang, Qinghua Zhao, Lei Li, 28 Jul 2025, How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation, https://arxiv.org/abs/2507.20758
  • Eunkyu Park, Wesley Hanwen Deng, Gunhee Kim, Motahhare Eslami, Maarten Sap, 27 Jul 2025, Cognitive Chain-of-Thought: Structured Multimodal Reasoning about Social Situations, https://arxiv.org/abs/2507.20409
  • Xiangning Yu, Zhuohan Wang, Linyi Yang, Haoxuan Li, Anjie Liu, Xiao Xue, Jun Wang, Mengyue Yang, 26 Jul 2025, Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning, https://arxiv.org/abs/2506.09853
  • Ping Yu, Jack Lanchantin, Tianlu Wang, Weizhe Yuan, Olga Golovneva, Ilia Kulikov, Sainbayar Sukhbaatar, Jason Weston, Jing Xu, 31 Jul 2025, CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks, https://arxiv.org/abs/2507.23751
  • Xi Chen, Aske Plaat, Niki van Stein, 24 Jul 2025, How does Chain of Thought Think? Mechanistic Interpretability of Chain-of-Thought Reasoning with Sparse Autoencoding, https://arxiv.org/abs/2507.22928
  • Shixin Yi, Lin Shang, 1 Aug 2025, CoRGI: Verified Chain-of-Thought Reasoning with Visual Grounding, https://arxiv.org/abs/2508.00378
  • Jianwei Wang, Ziming Wu, Fuming Lai, Shaobing Lian, Ziqian Zeng, 1 Aug 2025, SynAdapt: Learning Adaptive Reasoning in Large Language Models via Synthetic Continuous Chain-of-Thought, https://arxiv.org/abs/2508.00574
  • Chengshuai Zhao, Zhen Tan, Pingchuan Ma, Dawei Li, Bohan Jiang, Yancheng Wang, Yingzhen Yang, Huan Liu, 2 Aug 2025, Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens, https://arxiv.org/abs/2508.01191
  • Jialiang Hong, Taihang Zhen, Kai Chen, Jiaheng Liu, Wenpeng Zhu, Jing Huo, Yang Gao, Depeng Wang, Haitao Wan, Xi Yang, Boyan Wang, Fanyu Meng, 4 Aug 2025, Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning, https://arxiv.org/abs/2508.02178
  • Chloe Li, Mary Phuong, Noah Y. Siegel, 31 Jul 2025, LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring, https://arxiv.org/abs/2508.00943
  • Weibo Zhou, Lingbo Li, Shangsong Liang, 2 Aug 2025, D-SCoRE: Document-Centric Segmentation and CoT Reasoning with Structured Export for QA-CoT Data Generation, https://arxiv.org/abs/2508.01309
  • Fan Gao, Cheng Huang, Nyima Tashi, Yutong Liu, Xiangxiang Wang, Thupten Tsering, Ban Ma-bao, Renzeg Duojie, Gadeng Luosang, Rinchen Dongrub, Dorje Tashi, Xiao Feng, Hao Wang, Yongbin Yu, 4 Aug 2025, TIBSTC-CoT: A Multi-Domain Instruction Dataset for Chain-of-Thought Reasoning in Language Models, https://arxiv.org/abs/2508.01977
  • Huihan Li, You Chen, Siyuan Wang, Yixin He, Ninareh Mehrabi, Rahul Gupta, Xiang Ren, 4 Aug 2025, Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time, https://arxiv.org/abs/2508.02037
  • Hongbo Jin, Ruyang Liu, Wenhao Zhang, Guibo Luo, Ge Li, 3 Aug 2025, CoT-Vid: Dynamic Chain-of-Thought Routing with Self Verification for Training-Free Video Reasoning, https://arxiv.org/abs/2505.11830
  • Zeju Li, Jianyuan Zhong, Ziyang Zheng, Xiangyu Wen, Zhijian Xu, Yingying Cheng, Fan Zhang, Qiang Xu, 5 Aug 2025, Compressing Chain-of-Thought in LLMs via Step Entropy, https://arxiv.org/abs/2508.03346
  • Jueon Park, Yein Park, Minju Song, Soyon Park, Donghyeon Lee, Seungheun Baek and Jaewoo Kang, 5 Aug 2025, CoTox: Chain-of-Thought-Based Molecular Toxicity Reasoning and Prediction, https://arxiv.org/abs/2508.03159
  • Junyao Yang, Jianwei Wang, Huiping Zhuang, Cen Chen, Ziqian Zeng, 5 Aug 2025, RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior, https://arxiv.org/abs/2508.03140
  • Weihua Zheng, Xin Huang, Zhengyuan Liu, Tarun Kumar Vangani, Bowei Zou, Xiyan Tao, Yuhao Wu, Ai Ti Aw, Nancy F. Chen, Roy Ka-Wei Lee, 5 Aug 2025, AdaMCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Multilingual Chain-of-Thought, https://arxiv.org/abs/2501.16154
  • Xingyu Chen, Junxiu An, Jun Guo, Li Wang, Jingcai Guo, 6 Aug 2025, KG-Augmented Executable CoT for Mathematical Coding, https://arxiv.org/abs/2508.04072
  • Xiao Wang, Liye Jin, Xufeng Lou, Shiao Wang, Lan Chen, Bo Jiang, Zhipeng Zhang, 7 Aug 2025, ReasoningTrack: Chain-of-Thought Reasoning for Long-term Vision-Language Tracking, https://arxiv.org/abs/2508.05221
  • Haonan Shangguan, Xiaocui Yang, Shi Feng, Daling Wang, Yifei Zhang, and Ge Yu, 7 Aug 2025, Resource-Limited Joint Multimodal Sentiment Reasoning and Classification via Chain-of-Thought Enhancement and Distillation, https://arxiv.org/abs/2508.05234
  • Tianyun Yang, Yunwen Li, Ziniu Li, Zhihang Lin, Ruoyu Sun, Tian Ding, 12 Aug 2025, Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving, https://arxiv.org/abs/2508.09099
  • Haiyun Guo, ZhiYan Hou, Yu Chen, Jinghan He, Yandu Sun, Yuzhe Zhou, Shujing Guo, Kuan Zhu, Jinqiao Wang, 31 Jul 2025, MLLM-CBench:A Comprehensive Benchmark for Continual Instruction Tuning of Multimodal LLMs with Chain-of-Thought Reasoning Analysis, https://arxiv.org/abs/2508.08275
  • Axel Delaval, Shujian Yang, Haicheng Wang, Han Qiu, Jialiang Lu, 15 Aug 2025, ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection, https://arxiv.org/abs/2508.11281
  • Phuong Minh Nguyen, Tien Huu Dang, Naoya Inoue, 17 Aug 2025, Non-Iterative Symbolic-Aided Chain-of-Thought for Logical Reasoning, https://arxiv.org/abs/2508.12425
  • Zhifeng Kong, Arushi Goel, Joao Felipe Santos, Sreyan Ghosh, Rafael Valle, Wei Ping, Bryan Catanzaro, 15 Aug 2025, Audio Flamingo Sound-CoT Technical Report: Improving Chain-of-Thought Reasoning in Sound Understanding, https://arxiv.org/abs/2508.11818
  • Ruheng Wang, Hang Zhang, Trieu Nguyen, Shasha Feng, Hao-Wei Pang, Xiang Yu, Li Xiao, Peter Zhiping Zhang, 20 Aug 2025, PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning, https://arxiv.org/abs/2508.14765
  • Josh Barua, Seun Eisape, Kayo Yin, Alane Suhr, 20 Aug 2025, Long Chain-of-Thought Reasoning Across Languages, https://arxiv.org/abs/2508.14828
  • Wenqiao Zhu, Ji Liu, Rongjuncheng Zhang, Haipang Wu, Yulun Zhang, 21 Aug 2025, CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning, https://arxiv.org/abs/2508.15868

Advanced Chain-of-Thought

More research on advanced improvements to multi-step Chain-of-Thought appears below. See also CoT efficiency optimizations.

  • Jiaan Wang, Fandong Meng, Yunlong Liang, Jie Zhou, 23 Dec 2024, DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought, https://arxiv.org/abs/2412.17498 https://github.com/krystalan/DRT-o1 (Examines similes and metaphors in literature using long CoT.)
  • Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
  • Haotian Xu, Xing Wu, Weinong Wang, Zhongzhi Li, Da Zheng, Boyuan Chen, Yi Hu, Shijia Kang, Jiaming Ji, Yingying Zhang, Zhijiang Guo, Yaodong Yang, Muhan Zhang, Debing Zhang, 20 Jan 2025, RedStar: Does Scaling Long-CoT Data Unlock Better Slow-Reasoning Systems? https://arxiv.org/abs/2501.11284 https://huggingface.co/RedStar-Reasoning
  • Yiyao Yu, Yuxiang Zhang, Dongdong Zhang, Xiao Liang, Hengyuan Zhang, Xingxing Zhang, Ziyi Yang, Mahmoud Khademi, Hany Awadalla, Junjie Wang, Yujiu Yang, Furu Wei, 19 Jan 2025, Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective, https://arxiv.org/abs/2501.11110
  • Yuanheng Fang, Guoqing Chao, Wenqiang Lei, Shaobo Li, Dianhui Chu, 21 Jan 2025, CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning, https://arxiv.org/abs/2501.12226 (CoT with integration of clustering and prompt optimization techniques.)
  • Jishnu Ray Chowdhury, Cornelia Caragea, 21 Jan 2025, Zero-Shot Verification-guided Chain of Thoughts, https://arxiv.org/abs/2501.13122
  • Ziyu Guo, Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Peng Gao, Hongsheng Li, Pheng-Ann Heng, 23 Jan 2025, Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step, https://arxiv.org/abs/2501.13926 https://github.com/ZiyuGuo99/Image-Generation-CoT
  • Liang Wang, Haonan Chen, Nan Yang, Xiaolong Huang, Zhicheng Dou, Furu Wei, 24 Jan 2025, Chain-of-Retrieval Augmented Generation, https://arxiv.org/abs/2501.14342 (Combines RAG with multi-step reasoning such as Chain-of-Thought, with a method to control token cost.)
  • Zhenrui Yue, Honglei Zhuang, Aijun Bai, Kai Hui, Rolf Jagerman, Hansi Zeng, Zhen Qin, Dong Wang, Xuanhui Wang, Michael Bendersky, 6 Oct 2024, Inference Scaling for Long-Context Retrieval Augmented Generation, https://arxiv.org/abs/2410.04343 (Combine RAG and multi-step inference, controlling token cost via budgeting allocations.)
  • Jianfeng Pan, Senyou Deng, Shaomang Huang, 4 Feb 2025, CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning, https://arxiv.org/abs/2502.02390 (Integrating results from an "associative memory" in CoT reasoning paths at inference time.)
  • Chen, H., Zhu, J., Wang, W. et al. Triplet-based contrastive method enhances the reasoning ability of large language models. J Supercomput 81, 555 (2025). https://doi.org/10.1007/s11227-025-07056-6 https://link.springer.com/article/10.1007/s11227-025-07056-6 (Providing prompt examples that contrast correct and incorrect results to improve CoT reasoning.)
  • Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan, 16 May 2025, Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory, https://arxiv.org/abs/2505.10981
  • Halil Alperen Gozeten, M. Emrullah Ildiz, Xuechen Zhang, Hrayr Harutyunyan, Ankit Singh Rawat, Samet Oymak, 29 May 2025, Continuous Chain of Thought Enables Parallel Exploration and Reasoning, https://arxiv.org/abs/2505.23648
  • Chengshuai Zhao, Zhen Tan, Pingchuan Ma, Dawei Li, Bohan Jiang, Yancheng Wang, Yingzhen Yang, Huan Liu, 13 Aug 2025 (v3), Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens, https://arxiv.org/abs/2508.01191

Tree-of-Thought (ToT)

Tree-of-Thought is a tree-structured variant of multi-step Chain-of-Thought; other tree-based versions of CoT are also examined below. Note that a "tree" structure also arises in "CoT decoding" algorithms, which are single-step CoT-like inference optimizations based on the inherent tree hierarchy of beam search decoding.
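The core ToT idea can be sketched as a breadth-first search with beam pruning over partial reasoning states. In a real system, the two helper functions below would be LLM calls: one prompting the model to propose candidate next thoughts, and one prompting it to self-evaluate a partial reasoning path. Here they are hypothetical placeholders so the search skeleton is runnable; `propose_thoughts`, `score_thought`, and the string "stepN.i" labels are illustrative names, not from any particular implementation.

```python
# Minimal Tree-of-Thought search sketch: breadth-first expansion with beam pruning.
# A reasoning "state" is a list of thought strings accumulated so far.

def propose_thoughts(state, k=3):
    # Hypothetical stand-in: an LLM would propose k candidate next reasoning steps.
    return [state + [f"step{len(state)}.{i}"] for i in range(k)]

def score_thought(state):
    # Hypothetical stand-in: an LLM self-evaluation would score the partial path.
    return -len(state)  # placeholder heuristic

def tree_of_thought(root, depth=2, beam=2):
    """Expand the thought tree to `depth` levels, keeping the best `beam` states."""
    frontier = [root]
    for _ in range(depth):
        candidates = []
        for state in frontier:
            candidates.extend(propose_thoughts(state))
        # Prune to the top-scoring candidates (the "beam").
        candidates.sort(key=score_thought, reverse=True)
        frontier = candidates[:beam]
    return frontier[0]
```

Setting `beam=1` reduces this to greedy single-path CoT, which is one way to see ToT as a generalization of Chain-of-Thought.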

Research papers on Tree-of-Thought include:

  • Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan, 3 Dec 2023 (v2), Tree of Thoughts: Deliberate Problem Solving with Large Language Models, https://arxiv.org/abs/2305.10601 Code: https://github.com/princeton-nlp/tree-of-thought-llm

Other Tree-Structured CoT Variants

Research papers on other tree-based CoT variants include:

  • Changcheng Li, Xiangyu Wang, Qiuju Chen, Xiren Zhou, Huanhuan Chen, 5 Dec 2024, MTMT: Consolidating Multiple Thinking Modes to Form a Thought Tree for Strengthening LLM, https://arxiv.org/abs/2412.03987
  • Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
  • Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li, 17 Jan 2025 (v2), Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities, https://arxiv.org/abs/2501.09686
  • Tiesunlong Shen, Jin Wang, Xuejie Zhang, Erik Cambria, Jan 2025, Reasoning with Trees: Faithful Question Answering over Knowledge Graph, Proceedings of the 31st International Conference on Computational Linguistics, pages 3138–3157, January 19–24, 2025, Association for Computational Linguistics, https://aclanthology.org/2025.coling-main.211.pdf
  • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
  • Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen, 2 Jan 2025, Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking, https://arxiv.org/abs/2501.01306
  • Kun-Peng Ning, Jia-Yu Yao, Yu-Yang Liu, Mu-Nan Ning, Li Yuan, 13 Jan 2025, GPT as a Monte Carlo Language Tree: A Probabilistic Perspective, https://arxiv.org/abs/2501.07641
  • G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
  • Yang Li, 4 Feb 2025, Policy Guided Tree Search for Enhanced LLM Reasoning, https://arxiv.org/abs/2502.06813
  • Yifu Ding, Wentao Jiang, Shunyu Liu, Yongcheng Jing, Jinyang Guo, Yingjie Wang, Jing Zhang, Zengmao Wang, Ziwei Liu, Bo Du, Xianglong Liu, Dacheng Tao, 27 Feb 2025 (v2), Dynamic Parallel Tree Search for Efficient LLM Reasoning, https://arxiv.org/abs/2502.16235

Graph Reasoning

Graph reasoning is the use of a graph structure, such as a Knowledge Graph, as part of the reasoning algorithm. There is also a variant of Chain-of-Thought called "Graph-of-Thought" or GOT (dragons, anyone?), which further generalizes tree-based reasoning hierarchies to arbitrary graphs.
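One simple form of graph reasoning is multi-hop traversal over a knowledge graph: to answer a question, the system finds a chain of relations connecting a question entity to a candidate answer entity, and that chain serves as the explicit reasoning path. The sketch below is a minimal, runnable illustration under assumed data; the tiny `kg` dictionary and the `multi_hop_answer` function are hypothetical examples, not a specific system's API.

```python
# Minimal sketch of multi-hop graph reasoning: breadth-first search over a tiny
# knowledge graph, returning the entity/relation chain as the reasoning path.
from collections import deque

def multi_hop_answer(graph, start, goal):
    """Return the alternating [entity, relation, entity, ...] path, or None."""
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [relation, neighbor]))
    return None

# Illustrative toy knowledge graph: entity -> list of (relation, entity) edges.
kg = {
    "Paris": [("capital_of", "France")],
    "France": [("member_of", "EU")],
}
```

A Graph-of-Thought prompt would typically verbalize such a path (e.g., "Paris is the capital of France; France is a member of the EU") as intermediate reasoning steps for the LLM.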

Research papers on graph-based reasoning:

  • Cameron R. Wolfe, Jan 3, 2024, Graph-Based Prompting and Reasoning with Language Models. Understanding graph of thoughts prompting and several variants… https://towardsdatascience.com/graph-based-prompting-and-reasoning-with-language-models-d6acbcd6b3d8
  • Jiarui Ji, Runlin Lei, Jialing Bi, Zhewei Wei, Yankai Lin, Xuchen Pan, Yaliang Li, Bolin Ding, 13 Oct 2024, Dynamic and Textual Graph Generation Via Large-Scale LLM-based Agent Simulation, https://arxiv.org/abs/2410.09824
  • Yuwei Hu, Runlin Lei, Xinyi Huang, Zhewei Wei, Yongchao Liu, 7 Oct 2024, Scalable and Accurate Graph Reasoning with LLM-based Multi-Agents, https://arxiv.org/abs/2410.05130
  • Sambhav Khurana, Xiner Li, Shurui Gui, Shuiwang Ji, 29 Oct 2024, A Hierarchical Language Model For Interpretable Graph Reasoning, https://arxiv.org/abs/2410.22372
  • Haoyu Han, Yaochen Xie, Hui Liu, Xianfeng Tang, Sreyashi Nag, William Headden, Hui Liu, Yang Li, Chen Luo, Shuiwang Ji, Qi He, Jiliang Tang, 14 Jan 2025, Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning, https://arxiv.org/abs/2501.07845
  • F. Alotaibi, A. Kulkarni and D. Zhou, "Graph of Logic: Enhancing LLM Reasoning with Graphs and Symbolic Logic," 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 2024, pp. 5926-5935, doi: 10.1109/BigData62323.2024.10825450. https://ieeexplore.ieee.org/abstract/document/10825450
  • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
  • Xingtong Yu, Chang Zhou, Zhongwei Kuai, Xinming Zhang, Yuan Fang, 12 Feb 2025, GCoT: Chain-of-Thought Prompt Learning for Graphs, https://arxiv.org/abs/2502.08092
  • Han Zhang, Langshi Zhou, Hanfang Yang, 20 Feb 2025, Learning to Retrieve and Reason on Knowledge Graph through Active Self-Reflection, https://arxiv.org/abs/2502.14932
  • Anastasios Nentidis, Charilaos Akasiadis, Angelos Charalambidis, Alexander Artikis, 26 Feb 2025, Dealing with Inconsistency for Reasoning over Knowledge Graphs: A Survey, https://arxiv.org/abs/2502.19023
  • Avinash Patil, 5 Feb 2025, Advancing Reasoning in Large Language Models: Promising Methods and Approaches, https://arxiv.org/abs/2502.03671
  • Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
  • Wenjie Wu, Yongcheng Jing, Yingjie Wang, Wenbin Hu, Dacheng Tao, 3 Mar 2025, Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning, https://arxiv.org/abs/2503.01642

Skeleton-of-Thought

Skeleton-of-thought is a technique with the dual aims of smarter reasoning and faster inference. The idea is to first generate an outline as a list of points, and then have the LLM expand each point in parallel. This yields both a more focused answer to each sub-issue and faster inference, because several shorter completions run in parallel rather than one long sequential generation.
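The outline-then-parallel-expand pattern can be sketched as below. The `llm` function here is a hypothetical stand-in with canned replies, not a real API; in practice it would be a network call, which is why the parallel expansion step saves wall-clock time:

```python
from concurrent.futures import ThreadPoolExecutor

def llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call, with canned replies."""
    if "Outline" in prompt:
        return "1. Causes\n2. Effects\n3. Remedies"
    return "Expanded: " + prompt.split("point: ")[-1]

def skeleton_of_thought(question: str) -> str:
    # Step 1: ask for a short skeleton (a numbered list of points).
    skeleton = llm(f"Outline 3 brief points answering: {question}")
    points = [line.split(". ", 1)[1] for line in skeleton.splitlines()]
    # Step 2: expand every point in parallel (short, focused completions).
    with ThreadPoolExecutor() as pool:
        expansions = list(pool.map(
            lambda p: llm(f"In 2-3 sentences, expand this point: {p}"),
            points))
    # Step 3: stitch the expanded points back into one answer.
    return "\n".join(expansions)
```

With a real model, step 2 issues the per-point requests concurrently, so total latency is roughly that of the longest single expansion rather than the sum of all of them.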

Reflection

Reflection, or self-reflection, is a type of reasoning where the LLM takes an extra step to "reflect" on its own answers. It is a multi-step reasoning method in which the LLM is prompted to critique and then improve its own output. Variants of self-reflection exist for both training-time and inference-time improvement.
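The inference-time answer-critique-revise loop can be sketched as follows. The `llm` function is a hypothetical stub with canned replies so the control flow is visible; a real system would route each prompt to an actual model:

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call, with canned replies."""
    if prompt.startswith("Critique"):
        return "OK" if "revised" in prompt else "The answer omits a definition."
    if prompt.startswith("Revise"):
        return "revised draft with the definition added"
    return "first draft"

def reflect(question: str, max_rounds: int = 3) -> str:
    """Inference-time self-reflection: answer, critique, revise, repeat."""
    answer = llm(f"Answer the question: {question}")
    for _ in range(max_rounds):
        critique = llm(f"Critique this answer: {answer}")
        if critique == "OK":   # the critic is satisfied; stop early
            break
        answer = llm(f"Revise the answer using this critique: {critique}\n{answer}")
    return answer
```

The `max_rounds` cap matters in practice: without it, a model that never declares itself satisfied would loop indefinitely, burning inference compute.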

Research papers on reflection:

  • Cogni Down Under, Sep 2024, Reflection 70B: The AI That Thinks Before It Speaks, https://medium.com/@cognidownunder/reflection-70b-the-ai-that-thinks-before-it-speaks-8a70d3a0e38a
  • Asankhaya Sharma (codelion), Sep 2024, Optillm: Optimizing inference proxy for LLMs, https://github.com/codelion/optillm
  • Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, Aman Chadha, 5 Feb 2024, A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications, https://arxiv.org/abs/2402.07927
  • Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou, 4 Jun 2024 (v2), Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems, https://arxiv.org/abs/2403.02419
  • Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
  • Arun Shankar, Oct 2024, Designing Cognitive Architectures: Agentic Workflow Patterns from Scratch, https://medium.com/google-cloud/designing-cognitive-architectures-agentic-workflow-patterns-from-scratch-63baa74c54bc
  • Anita Kirkovska, David Vargas, Jul 11, 2024, Agentic Workflows in 2024: The ultimate guide, https://www.vellum.ai/blog/agentic-workflows-emerging-architectures-and-design-patterns
  • A. Singh, A. Ehtesham, S. Kumar and T. T. Khoei, "Enhancing AI Systems with Agentic Workflows Patterns in Large Language Model," 2024 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 2024, pp. 527-532, doi: 10.1109/AIIoT61789.2024.10578990. https://ieeexplore.ieee.org/abstract/document/10578990
  • Chawla, Chhavi; Chatterjee, Siddharth; Gadadinni, Sanketh Siddanna; Verma, Pulkit; Banerjee, Sourav, 2024, Agentic AI: The building blocks of sophisticated AI business applications, Journal of AI, Robotics & Workplace Automation, Volume 3 / Number 3 / Summer 2024, pp. 1-15(15), Henry Stewart Publications, DOI: https://doi.org/10.69554/XEHZ1946 https://www.ingentaconnect.com/content/hsp/airwa/2024/00000003/00000003/art00001
  • Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang, 21 Nov 2024, Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions, https://arxiv.org/abs/2411.14405
  • mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
  • Yaoke Wang, Yun Zhu, Xintong Bao, Wenqiao Zhang, Suyang Dai, Kehan Chen, Wenqiang Li, Gang Huang, Siliang Tang, Yueting Zhuang, 18 Dec 2024, Meta-Reflection: A Feedback-Free Reflection Learning Framework, https://arxiv.org/abs/2412.13781 (One-shot reflection by using a cache of prior reflection results.)
  • Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back, 16 Jul 2024, Reasoning with Large Language Models, a Survey, https://arxiv.org/abs/2407.11511
  • Thomas Palmeira Ferraz, Kartik Mehta, Yu-Hsiang Lin, Haw-Shiuan Chang, Shereen Oraby, Sijia Liu, Vivek Subramanian, Tagyoung Chung, Mohit Bansal, Nanyun Peng, 9 Oct 2024, LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints, https://arxiv.org/abs/2410.06458
  • Yuhang Liu, Pengxiang Li, Zishu Wei, Congkai Xie, Xueyu Hu, Xinchen Xu, Shengyu Zhang, Xiaotian Han, Hongxia Yang, Fei Wu, 8 Jan 2025, InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection, https://arxiv.org/abs/2501.04575
  • Ruwei Pan, Hongyu Zhang, Chao Liu, 14 Jan 2025, CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation, https://arxiv.org/abs/2501.07811
  • Zekun Xi, Wenbiao Yin, Jizhan Fang, Jialong Wu, Runnan Fang, Ningyu Zhang, Jiang Yong, Pengjun Xie, Fei Huang, Huajun Chen, 16 Jan 2025, OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking, https://arxiv.org/abs/2501.09751 (Iteratively going deeper into a topic while generating.)
  • Siyu Yuan, Zehui Chen, Zhiheng Xi, Junjie Ye, Zhengyin Du, Jiecao Chen, 20 Jan 2025, Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training, https://arxiv.org/abs/2501.11425 (Iterative self-training using reflection.)
  • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
  • Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement
  • M. Renze and E. Guven, "Self-Reflection in Large Language Model Agents: Effects on Problem-Solving Performance," 2024 2nd International Conference on Foundation and Large Language Models (FLLM), Dubai, United Arab Emirates, 2024, pp. 516-525, doi: 10.1109/FLLM63129.2024.10852426. https://ieeexplore.ieee.org/abstract/document/10852426/ https://github.com/matthewrenze/self-reflection
  • G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
  • Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
  • Yichi Zhou, Jianqiu Zhao, Yongxin Zhang, Bohan Wang, Siran Wang, Luoxin Chen, Jiahui Wang, Haowei Chen, Allan Jie, Xinbo Zhang, Haocheng Wang, Luong Trung, Rong Ye, Phan Nhat Hoang, Huishuai Zhang, Peng Sun, Hang Li, 21 Jul 2025, Solving Formal Math Problems by Decomposition and Iterative Reflection, https://arxiv.org/abs/2507.15225
  • Yufan Song, Jiatao Zhang, Zeng Gu, Qingmiao Liang, Tuocheng Hu, Wei Song, Shiqiang Zhu, 20 Jul 2025, FCRF: Flexible Constructivism Reflection for Long-Horizon Robotic Task Planning with Large Language Models, https://arxiv.org/abs/2507.14975
  • Rui Lu and Jinhe Bi and Yunpu Ma and Feng Xiao and Yuntao Du and Yijun Tian, 10 Aug 2025, MV-Debate: Multi-view Agent Debate with Dynamic Reflection Gating for Multimodal Harmful Content Detection in Social Media, https://arxiv.org/abs/2508.05557
  • Shijie Cao, Yuan Yuan, 3 Aug 2025, ReflecSched: Solving Dynamic Flexible Job-Shop Scheduling via LLM-Powered Hierarchical Reflection, https://arxiv.org/abs/2508.01724
  • Abi Aryan, Zac Liu, 6 Aug 2025, Causal Reflection with Language Models, https://arxiv.org/abs/2508.04495
  • Vishnu Menon, Andy Cherney, Elizabeth B. Cloude, Li Zhang, Tiffany D. Do, 6 Aug 2025, Evaluating the Impact of LLM-guided Reflection on Learning Outcomes with Interactive AI-Generated Educational Podcasts, https://arxiv.org/abs/2508.04787
  • Jiameng Huang, Baijiong Lin, Guhao Feng, Jierun Chen, Di He, and Lu Hou, 7 Aug 2025, Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression, https://arxiv.org/abs/2508.05337
  • Lingyuan Liu, Mengxiang Zhang, 8 Aug 2025, Less is More: Selective Reflection for Compatible and Efficient Knowledge Distillation in Large Language Models, https://arxiv.org/abs/2508.06135
  • Zeyu Tang, Alex John London, Atoosa Kasirzadeh, Sanmi Koyejo, Peter Spirtes, Kun Zhang, 10 Aug 2025, Algorithmic Fairness amid Social Determinants: Reflection, Characterization, and Approach, https://arxiv.org/abs/2508.08337
  • Jiawei Zhou, Amy Z. Chen, Darshi Shah, Laura M. Schwab Reese, and Munmun De Choudhury, 11 Aug 2025, A Risk Taxonomy and Reflection Tool for Large Language Model Adoption in Public Health, https://arxiv.org/abs/2411.02594
  • Katharina Stein, Nils Hodel, Daniel Fišer, Jörg Hoffmann, Michael Katz and Alexander Koller, 19 Aug 2025, Improved Generalized Planning with LLMs through Strategy Refinement and Reflection, https://arxiv.org/abs/2508.13876
  • Feng Tian, Flora D. Salim, Hao Xue, 25 Aug 2025, TradingGroup: A Multi-Agent Trading System with Self-Reflection and Data-Synthesis, https://arxiv.org/abs/2508.17565
  • Fu-Chieh Chang, Yu-Ting Lee, Pei-Yuan Wu, 23 Aug 2025, Unveiling the Latent Directions of Reflection in Large Language Models, https://arxiv.org/abs/2508.16989
  • Melissa Kazemi Rad, Alberto Purpura, Himanshu Kumar, Emily Chen, Mohammad Shahed Sorower, 23 Aug 2025, GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection, https://arxiv.org/abs/2508.17057
  • Aswin RRV, Jacob Dineen, Divij Handa, Md Nayem Uddin, Mihir Parmar, Chitta Baral, Ben Zhou, 11 Aug 2025, ThinkTuning: Instilling Cognitive Reflections without Distillation, https://arxiv.org/abs/2508.07616

LLM as Judge

LLM as Judge is a method of improving outputs by having one LLM "judge" the correctness of another LLM's output, either to evaluate it or to suggest improvements. When the LLM judges its own output, this is known as "self-reflection." When an LLM judges a group of outputs generated by other LLMs for the same query, and chooses the best, this is called "Best-of-N."
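A minimal sketch of the judging step: prompt a judge model to score each candidate and keep the highest-scoring one. The `toy_judge` stub is purely illustrative (a real judge would be a model call with a carefully engineered scoring rubric):

```python
def judge_best(question, candidates, judge_llm):
    """LLM-as-Judge: score each candidate answer, return the top-scoring one."""
    scored = []
    for answer in candidates:
        reply = judge_llm(
            f"Rate this answer from 1 to 10 (reply with digits only).\n"
            f"Question: {question}\nAnswer: {answer}")
        scored.append((int(reply.strip()), answer))
    return max(scored)[1]

def toy_judge(prompt: str) -> str:
    """Hypothetical judge model: favors answers that mention 'four'."""
    return "9" if "four" in prompt else "3"

best = judge_best("What is 2 + 2?", ["five", "four", "seven"], toy_judge)
# best == "four"
```

One known pitfall, discussed in several of the papers below, is self-bias: a judge model tends to prefer outputs written in its own style, so evaluation pipelines often use a different model family for the judge than for the generator.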

Research papers on LLM-as-Judge areas:

  • Cameron R. Wolfe, Ph.D., Dec 02, 2024, Finetuning LLM Judges for Evaluation: The Prometheus suite, JudgeLM, PandaLM, AutoJ, and more..., https://cameronrwolfe.substack.com/p/finetuned-judge
  • Tom Schaul, 25 Nov 2024, Boundless Socratic Learning with Language Games, https://arxiv.org/abs/2411.16905
  • Mingchen Zhuge, Changsheng Zhao, Dylan Ashley, Wenyi Wang, Dmitrii Khizbullin, Yunyang Xiong, Zechun Liu, Ernie Chang, Raghuraman Krishnamoorthi, Yuandong Tian, Yangyang Shi, Vikas Chandra, Jürgen Schmidhuber, 16 Oct 2024 (v2), Agent-as-a-Judge: Evaluate Agents with Agents, https://arxiv.org/abs/2410.10934
  • Haitao Li, Qian Dong, Junjie Chen, Huixue Su, Yujia Zhou, Qingyao Ai, Ziyi Ye, Yiqun Liu, 10 Dec 2024 (v2), LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods, https://arxiv.org/abs/2412.05579 https://github.com/CSHaitao/Awesome-LLMs-as-Judges
  • Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
  • Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back, 16 Jul 2024, Reasoning with Large Language Models, a Survey, https://arxiv.org/abs/2407.11511
  • Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
  • Zhenting Wang, Shuming Hu, Shiyu Zhao, Xiaowen Lin, Felix Juefei-Xu, Zhuowei Li, Ligong Han, Harihar Subramanyam, Li Chen, Jianfa Chen, Nan Jiang, Lingjuan Lyu, Shiqing Ma, Dimitris N. Metaxas, Ankit Jain, 31 Dec 2024, MLLM-as-a-Judge for Image Safety without Human Labeling, https://arxiv.org/abs/2501.00192
  • Zheqi Lv, Wenkai Wang, Jiawei Wang, Shengyu Zhang, Fei Wu, 10 Jan 2025, Cascaded Self-Evaluation Augmented Training for Efficient Multimodal Large Language Models, https://arxiv.org/abs/2501.05662 (Optimize multimodal CoT by breaking down prompts into smaller sub-goals.)
  • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
  • Yafu Li, Zhilin Wang, Tingchen Fu, Ganqu Cui, Sen Yang, Yu Cheng, 21 Jan 2025, From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning, https://arxiv.org/abs/2501.11877 (Fine-tune an LLM to accept multiple candidate answers and output a final one.)
  • Swarnadeep Saha, Xian Li, Marjan Ghazvininejad, Jason Weston, Tianlu Wang, 30 Jan 2025, Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge, https://arxiv.org/abs/2501.18099
  • Yubo Wang, Xiang Yue, Wenhu Chen, 30 Jan 2025 (v2), Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate, https://arxiv.org/abs/2501.17703
  • Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Edgar Schönfeld, Ali Thabet, Jonas Kohler, 31 Jan 2025, Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment, https://arxiv.org/abs/2501.19309 (Using "LLM as Judge" methods to speed up speculative decoding via higher acceptance rates.)
  • Joshua Ong Jun Leang, Giwon Hong, Wenda Li, Shay B. Cohen, 18 Feb 2025, Theorem Prover as a Judge for Synthetic Data Generation, https://arxiv.org/abs/2502.13137
  • Avinash Patil, 5 Feb 2025, Advancing Reasoning in Large Language Models: Promising Methods and Approaches, https://arxiv.org/abs/2502.03671
  • Evangelia Spiliopoulou, Riccardo Fogliato, Hanna Burnsky, Tamer Soliman, Jie Ma, Graham Horwood, Miguel Ballesteros, 8 Aug 2025, Play Favorites: A Statistical Method to Measure Self-Bias in LLM-as-a-Judge, https://arxiv.org/abs/2508.06709
  • Zailong Tian, Zhuoheng Han, Yanzhe Chen, Haozhe Xu, Xi Yang, Richeng Xuan, Houfeng Wang, Lizi Liao, 11 Aug 2025, Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution, https://arxiv.org/abs/2508.06225
  • Asaf Yehudai, Lilach Eden, Yotam Perlitz, Roy Bar-Haim, Michal Shmueli-Scheuer, 24 Jul 2025, CLEAR: Error Analysis via LLM-as-a-Judge Made Easy, https://arxiv.org/abs/2507.18392
  • Nitay Calderon, Roi Reichart, Rotem Dror, 8 Aug 2025, The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs, https://arxiv.org/abs/2501.10970
  • Francesco Fabbri, Gustavo Penha, Edoardo D'Amico, Alice Wang, Marco De Nadai, Jackie Doremus, Paul Gigioli, Andreas Damianou, Oskar Stal, and Mounia Lalmas, 12 Aug 2025, Evaluating Podcast Recommendations with Profile-Aware LLM-as-a-Judge, https://arxiv.org/abs/2508.08777
  • Yang Zhang, Cunxiang Wang, Lindong Wu, Wenbo Yu, Yidong Wang, Guangsheng Bao, Jie Tang, 13 Aug 2025, UDA: Unsupervised Debiasing Alignment for Pair-wise LLM-as-a-Judge, https://arxiv.org/abs/2508.09724
  • Hongchao Jiang, Yiming Chen, Yushi Cao, Hung-yi Lee, Robby T. Tan, 14 Aug 2025, CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks, https://arxiv.org/abs/2507.10535
  • Arduin Findeis, Floris Weers, Guoli Yin, Ke Ye, Ruoming Pang, Tom Gunter, 22 Jul 2025, Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge?, https://arxiv.org/abs/2507.17015
  • Luke Guerdan, Solon Barocas, Kenneth Holstein, Hanna Wallach, Zhiwei Steven Wu, Alexandra Chouldechova, 21 Aug 2025, Validating LLM-as-a-Judge Systems under Rating Indeterminacy, https://arxiv.org/abs/2503.05965
  • Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong, 24 Aug 2025, Optimization-based Prompt Injection Attack to LLM-as-a-Judge, https://arxiv.org/abs/2403.17710

System 2

System 2 is the slower, deliberate reasoning mode of the human brain, which multi-step reasoning algorithms such as Chain-of-Thought aim to emulate. It is the conscious brain's capacity for rational reasoning, usually slow and step-by-step. By comparison, System 1 covers sensory processing and intuitive brain functions, including the "subconscious" brain: massively parallel and innate, but lacking in rationality and explainability, much like a raw neural network.

Research papers on LLMs and System 2 thinking modes:

  • Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, Aman Chadha, 5 Feb 2024, A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications, https://arxiv.org/abs/2402.07927
  • Akash Bajwa, Oct 07, 2024, Inference Time Scaling Laws: AI Megacycle Of System 1 And System 2 Applications, https://akashbajwa.substack.com/p/inference-time-scaling-laws
  • Latent Space, Nov 05, 2024, Inference, Fast and Slow. When System 1/System 2 analogies are not enough: The 6 types of LLM inference https://www.latent.space/p/inference-fast-and-slow
  • Ping Yu, Jing Xu, Jason Weston, Ilia Kulikov, 24 Jul 2024 (v3), Distilling System 2 into System 1, https://arxiv.org/abs/2407.06023
  • DiJia Su, Sainbayar Sukhbaatar, Michael Rabbat, Yuandong Tian, Qinqing Zheng, 13 Oct 2024, Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces, https://arxiv.org/abs/2410.09918
  • Cheng Yang, Chufan Shi, Siheng Li, Bo Shui, Yujiu Yang, Wai Lam, 29 Dec 2024, LLM2: Let Large Language Models Harness System 2 Reasoning, https://arxiv.org/abs/2412.20372
  • Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen, 2 Jan 2025, Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking, https://arxiv.org/abs/2501.01306
  • Scott C. Lowe, 29 Oct 2024 (v2), System 2 Reasoning Capabilities Are Nigh, https://arxiv.org/abs/2410.03662
  • Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
  • Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
  • Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li, 17 Jan 2025 (v2), Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities, https://arxiv.org/abs/2501.09686
  • Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
  • Bilgehan Sel, Ruoxi Jia, Ming Jin, 23 Jan 2025, LLMs Can Plan Only If We Tell Them, https://arxiv.org/abs/2501.13545
  • Kounianhua Du, Hanjing Wang, Jianxing Liu, Jizheng Chen, Xinyi Dai, Yasheng Wang, Ruiming Tang, Yong Yu, Jun Wang, Weinan Zhang, 18 Feb 2025, Boost, Disentangle, and Customize: A Robust System2-to-System1 Pipeline for Code Generation, https://arxiv.org/abs/2502.12492
  • Alireza S. Ziabari, Nona Ghazizadeh, Zhivar Sourati, Farzan Karimi-Malekabadi, Payam Piray, Morteza Dehghani, 18 Feb 2025, Reasoning on a Spectrum: Aligning LLMs to System 1 and System 2 Thinking, https://arxiv.org/abs/2502.12470
  • Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu, 25 Feb 2025 (v2), From System 1 to System 2: A Survey of Reasoning Large Language Models, https://arxiv.org/abs/2502.17419
  • Pengcheng Wen, Jiaming Ji, Chi-Min Chan, Juntao Dai, Donghai Hong, Yaodong Yang, Sirui Han, Yike Guo, 17 Mar 2025, ThinkPatterns-21k: A Systematic Study on the Impact of Thinking Patterns in LLMs, https://arxiv.org/abs/2503.12918
  • Sapient Intelligence, 22/07/2025, Sapient Intelligence Open-Sources Hierarchical Reasoning Model, a Brain-Inspired Architecture That Solves Complex Reasoning Tasks With 27 Million Parameters, https://www.sapient.inc/blog/5
  • Sejin Kim, Sundong Kim, 13 Aug 2025, System 2 Reasoning for Human-AI Alignment: Generality and Adaptivity via ARC-AGI, https://arxiv.org/abs/2410.07866
  • Runqi Qiao and Qiuna Tan and Peiqing Yang and Yanzi Wang and Xiaowan Wang and Enhui Wan and Sitong Zhou and Guanting Dong and Yuchen Zeng and Yida Xu and Jie Wang and Chong Sun and Chen Li and Honggang Zhang, 14 Aug 2025, We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning, https://arxiv.org/abs/2508.10433

Best of N Reasoning

Best of N (BoN) is an LLM reasoning method where multiple candidate answers are generated and the best one is chosen. The candidates can come from a single LLM sampled multiple times, or from several different LLMs in an ensemble inference architecture. Usually the final step is another LLM inference performing "LLM as Judge" evaluation to choose the best answer, but non-LLM ranking algorithms can also be used.
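As a minimal sketch of the non-LLM ranking variant, the snippet below draws N samples and picks the winner by majority vote (the self-consistency heuristic). The `toy_sampler` stub replays a fixed list of answers to stand in for a stochastic model sampled at temperature > 0:

```python
from collections import Counter
from itertools import cycle

def best_of_n(question, sample_llm, n=5):
    """Best-of-N with a non-LLM ranker: draw n samples, pick by majority vote."""
    answers = [sample_llm(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical stochastic model: replays a fixed list of sampled answers.
_samples = cycle(["42", "42", "41", "42", "43"])
def toy_sampler(prompt: str) -> str:
    return next(_samples)

winner = best_of_n("What is 6 x 7?", toy_sampler, n=5)
# winner == "42"
```

Majority voting works when answers can be compared for exact equality (e.g., numeric results); for free-form text, an "LLM as Judge" final step replaces the `Counter`.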

Research papers on Best-of-N reasoning:

  • Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
  • Hanshi Sun, Momin Haider, Ruiqi Zhang, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, Andrea Zanette, 26 Oct 2024, Fast Best-of-N Decoding via Speculative Rejection, https://arxiv.org/abs/2410.20290
  • Do Xuan Long, Duong Ngoc Yen, Anh Tuan Luu, Kenji Kawaguchi, Min-Yen Kan, Nancy F. Chen, 1 Nov 2024, Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models, https://arxiv.org/abs/2411.00492
  • Yinlam Chow, Guy Tennenholtz, Izzeddin Gur, Vincent Zhuang, Bo Dai, Sridhar Thiagarajan, Craig Boutilier, Rishabh Agarwal, Aviral Kumar, Aleksandra Faust, 18 Dec 2024, Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models, https://arxiv.org/abs/2412.15287
  • Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
  • Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
  • Kuang-Huei Lee, Ian Fischer, Yueh-Hua Wu, Dave Marwood, Shumeet Baluja, Dale Schuurmans, Xinyun Chen, 17 Jan 2025, Evolving Deeper LLM Thinking, https://arxiv.org/abs/2501.09891 (An alternative search strategy broad/deep, compared to CoT and reflection.)
  • Edward Beeching, Lewis Tunstall, Sasha Rush Dec 16, 2024, Scaling Test Time Compute with Open Source Models, https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
  • Yafu Li, Zhilin Wang, Tingchen Fu, Ganqu Cui, Sen Yang, Yu Cheng, 21 Jan 2025, From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning, https://arxiv.org/abs/2501.11877 (Fine-tune an LLM to accept multiple candidate answers and output a final one.)
  • Weihua Du, Yiming Yang, Sean Welleck, 7 Feb 2025, Optimizing Temperature for Language Models with Multi-Sample Inference, https://arxiv.org/abs/2502.05234 https://github.com/StigLidu/TURN
  • Juntai Cao, Xiang Zhang, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini, 27 Feb 2025, Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing, https://arxiv.org/abs/2502.20592 (Test time computed applied to the multi-document summarization use case.)
  • Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
  • Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang, 25 Feb 2025, Efficient Test-Time Scaling via Self-Calibration, https://arxiv.org/abs/2503.00031
  • Yiming Wang, Pei Zhang, Siyuan Huang, Baosong Yang, Zhuosheng Zhang, Fei Huang, Rui Wang, 3 Mar 2025, Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding, https://arxiv.org/abs/2503.01422
  • Yiwei Li, Jiayi Shi, Shaoxiong Feng, Peiwen Yuan, Xinglin Wang, Yueqi Zhang, Ji Zhang, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li, 7 Mar 2025, Speculative Decoding for Multi-Sample Inference, https://arxiv.org/abs/2503.05330 (Optimizing speculative decoding when generating multiple answers for a single query, such as for Best-of-N reasoning.)
  • Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi, 20 Feb 2025 (v2), Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification https://arxiv.org/abs/2502.01839 (Wrapping a single model with a Best-of-N approach that self-selects the best answer can significantly improve reasoning rates.)
  • Ningning Wang, Xavier Hu, Pai Liu, He Zhu, Yue Hou, Heyuan Huang, Shengyu Zhang, Jian Yang, Jiaheng Liu, Ge Zhang, Changwang Zhang, Jun Wang, Yuchen Eleanor Jiang, Wangchunshu Zhou, 24 Jul 2025, Efficient Agents: Building Effective Agents While Reducing Cost, https://arxiv.org/pdf/2508.02694 https://github.com/OPPO-PersonalAI/OAgents
  • Shubham Toshniwal, Ivan Sorokin, Aleksander Ficek, Ivan Moshkov, Igor Gitman, 23 Jul 2025, GenSelect: A Generative Approach to Best-of-N, https://arxiv.org/abs/2507.17797
  • Jizhou Guo, Zhaomin Wu, Hanchen Yang, Philip S. Yu, 29 Jul 2025, Mining Intrinsic Rewards from LLM Hidden States for Efficient Best-of-N Sampling, https://arxiv.org/abs/2505.12225

Program Synthesis

Program synthesis is a reasoning method in which the LLM writes program code that is then executed to solve a problem. Using a Python interpreter with an LLM is common, but any language can potentially be used, including more abstract mathematical symbolic languages. The virtually unlimited flexibility of programming languages, combined with the LLM's pattern-matching power to create code, offers a fertile area for reasoning advancement.
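The generate-then-execute loop can be sketched as below. The `toy_codegen` stub returns one canned program; note that a production system must sandbox the execution step, since bare `exec()` on model-written code is unsafe and is used here for illustration only:

```python
def solve_with_code(problem, codegen_llm):
    """Program synthesis: ask the model for Python code, then execute it.
    Real systems must sandbox this step; bare exec() is for illustration only."""
    code = codegen_llm(
        f"Write Python code that solves: {problem}\n"
        "Store the final answer in a variable named result.")
    namespace = {}
    exec(code, namespace)
    return namespace["result"]

# Hypothetical code-writing model with one canned program.
def toy_codegen(prompt: str) -> str:
    return "result = sum(i * i for i in range(1, 11))"

answer = solve_with_code("the sum of the squares of 1..10", toy_codegen)
# answer == 385
```

The appeal of this method is that the hard arithmetic is delegated to the interpreter, which is exact, while the LLM only has to get the program right; this is the approach taken by Program-of-Thoughts and PAL in the papers below.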

Research papers related to program synthesis and similar symbolic reasoning approaches:

  • Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan, 6 May 2024, AlphaMath Almost Zero: process Supervision without process, https://arxiv.org/abs/2405.03553 https://github.com/MARIO-Math-Reasoning/Super_MARIO
  • Wenhu Chen, Xueguang Ma, Xinyi Wang, and William W Cohen. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. arXiv preprint arXiv:2211.12588, 2022. https://arxiv.org/abs/2211.12588 (Integrate a Python interpreter to execute the code generated by the LLM to answer the query.)
  • Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. Pal: Program-aided language models. In International Conference on Machine Learning, pages 10764–10799. PMLR, 2023. https://arxiv.org/abs/2211.10435 Code: http://reasonwithpal.com/ (Python interpreter integrated as a tool for LLMs.)
  • Long Hei Matthew Lam, Ehsan Shareghi, 1 Jun 2024, A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters, https://arxiv.org/abs/2406.00284 (Using symbolic solvers with LLMs.)
  • M Keber, I Grubišic, A Barešic, A Jovic, 2024, A Review on Neuro-symbolic AI Improvements to Natural Language Processing, https://www.researchgate.net/profile/Alan-Jovic/publication/380911364_A_Review_on_Neuro-symbolic_AI_Improvements_to_Natural_Language_Processing/links/6655c0ec22a7f16b4f51fb2f/A-Review-on-Neuro-symbolic-AI-Improvements-to-Natural-Language-Processing.pdf
  • Joy He-Yueya, Gabriel Poesia, Rose E. Wang, and Noah D. Goodman. Solving math word problems by combining language models with symbolic solvers. ArXiv, abs/2304.09102, 2023. https://arxiv.org/abs/2304.09102
  • Owen Dugan, Donato Manuel Jimenez Beneto, Charlotte Loh, Zhuo Chen, Rumen Dangovski, Marin Soljačić, 4 Jun 2024, OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step, https://arxiv.org/abs/2406.06576
  • Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett, 18 Sep 2024, To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning, https://arxiv.org/abs/2409.12183
  • Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma, Chuchu Fan, Chi Wang, 4 Oct 2024, Steering Large Language Models between Code Execution and Textual Reasoning, https://arxiv.org/abs/2410.03524 https://yongchao98.github.io/CodeSteer/
  • Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar, 7 Oct 2024, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, https://arxiv.org/abs/2410.05229
  • Jiajun Chen, Yik-Cheung Tam, 5 Dec 2024, Enhancing Mathematical Reasoning in LLMs with Background Operators, https://arxiv.org/abs/2412.04110
  • Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Back, 16 Jul 2024, Reasoning with Large Language Models, a Survey, https://arxiv.org/abs/2407.11511
  • Mayi Xu, Yunfeng Ning, Yongqi Li, Jianhao Chen, Jintao Wen, Yao Xiao, Shen Zhou, Birong Pan, Zepeng Bao, Xin Miao, Hankun Kang, Ke Sun, Tieyun Qian, 2 Jan 2025, Reasoning based on symbolic and parametric knowledge bases: a survey, https://arxiv.org/abs/2501.01030 (Extensive survey of reasoning from CoT to knowledge graphs to table-based reasoning.)
  • Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
  • Ndea, Jan 16, 2025, Ndea is building frontier AI systems that blend intuitive pattern recognition and formal reasoning into a unified architecture, https://ndea.com/
  • François Chollet, 25 Nov 2019 (v2), On the Measure of Intelligence, https://arxiv.org/abs/1911.01547
  • Sumit Gulwani, Alex Polozov, Rishabh Singh, 2017, Program Synthesis, NOW, August 2017, Vol 4, https://www.microsoft.com/en-us/research/publication/program-synthesis/ https://www.microsoft.com/en-us/research/wp-content/uploads/2017/10/program_synthesis_now.pdf
  • Shraddha Barke, Emmanuel Anaya Gonzalez, Saketh Ram Kasibatla, Taylor Berg-Kirkpatrick, Nadia Polikarpova, 1 Nov 2024 (v2), HYSYNTH: Context-Free LLM Approximation for Guiding Program Synthesis, https://arxiv.org/abs/2405.15880
  • Stephen Mell, Steve Zdancewic, and Osbert Bastani. 2024. Optimal Program Synthesis via Abstract Interpretation. Proc. ACM Program. Lang. 8, POPL, Article 16 (January 2024), 25 pages. https://doi.org/10.1145/3632858 https://dl.acm.org/doi/abs/10.1145/3632858
  • Yixuan Li, Lewis Frampton, Federico Mora, Elizabeth Polgreen, 9 Jan 2025, Online Prompt and Solver Selection for Program Synthesis, https://arxiv.org/abs/2501.05247
  • Qikang Liu, Yang He, Yanwen Cai, Byeongguk Kwak, Yuepeng Wang, 8 Dec 2024, Synthesizing Document Database Queries using Collection Abstractions, https://arxiv.org/abs/2412.06102
  • F. Alotaibi, A. Kulkarni and D. Zhou, "Graph of Logic: Enhancing LLM Reasoning with Graphs and Symbolic Logic," 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 2024, pp. 5926-5935, doi: 10.1109/BigData62323.2024.10825450. https://ieeexplore.ieee.org/abstract/document/10825450
  • Yiyao Yu, Yuxiang Zhang, Dongdong Zhang, Xiao Liang, Hengyuan Zhang, Xingxing Zhang, Ziyi Yang, Mahmoud Khademi, Hany Awadalla, Junjie Wang, Yujiu Yang, Furu Wei, 19 Jan 2025, Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective, https://arxiv.org/abs/2501.11110
  • Benjamin Callewaert, Simon Vandevelde, Joost Vennekens, 24 Jan 2025, VERUS-LM: a Versatile Framework for Combining LLMs with Symbolic Reasoning, https://arxiv.org/abs/2501.14540
  • G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
  • Mohit Sewak, Ph.D., January 29, 2025, Achieving General Intelligence (AGI) and Super Intelligence (ASI): Pathways, Uncertainties, and Ethical Concerns, https://towardsai.net/p/l/achieving-general-intelligence-agi-and-super-intelligence-asi-pathways-uncertainties-and-ethical-concerns
  • Yubin Ge, Salvatore Romeo, Jason Cai, Raphael Shu, Monica Sunkara, Yassine Benajiba, Yi Zhang, 3 Feb 2025, TReMu: Towards Neuro-Symbolic Temporal Reasoning for LLM-Agents with Memory in Multi-Session Dialogues, https://arxiv.org/abs/2502.01630
  • Avinash Patil, 5 Feb 2025, Advancing Reasoning in Large Language Models: Promising Methods and Approaches, https://arxiv.org/abs/2502.03671
  • Cheryl Li, Tianyuan Xu, Yiwen Guo, 5 Feb 2025, Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment, https://arxiv.org/abs/2502.07803
  • Hanmeng Liu, Zhizhang Fu, Mengru Ding, Ruoxi Ning, Chaoli Zhang, Xiaozhang Liu, Yue Zhang, 13 Feb 2025, Logical Reasoning in Large Language Models: A Survey, https://arxiv.org/abs/2502.09100
  • Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu, 25 Feb 2025 (v2), From System 1 to System 2: A Survey of Reasoning Large Language Models, https://arxiv.org/abs/2502.17419
  • Ali Forootani, 22 Mar 2025, A Survey on Mathematical Reasoning and Optimization with Large Language Models, https://arxiv.org/abs/2503.17726
  • Siheng Xiong, Jieyu Zhou, Zhangding Liu, Yusen Su, 2 May 2025, SymPlanner: Deliberate Planning in Language Models with Symbolic Representation, https://arxiv.org/abs/2505.01479
  • Adam Stein, Aaditya Naik, Neelay Velingker, Mayur Naik, Eric Wong, 30 May 2025, The Road to Generalizable Neuro-Symbolic Learning Should be Paved with Foundation Models, https://arxiv.org/abs/2505.24874
  • Martin Berger, Nathanaël Fijalkow, Mojtaba Valizadeh, 26 Apr 2025, GPU accelerated program synthesis: Enumerate semantics, not syntax! https://arxiv.org/abs/2504.18943
  • Simon Ouellette, 17 Jul 2025, Out-of-Distribution Generalization in the ARC-AGI Domain: Comparing Execution-Guided Neural Program Synthesis and Test-Time Fine-Tuning, https://arxiv.org/abs/2507.15877
  • Noah van der Vleuten, 20 Jul 2025, Dr. Boot: Bootstrapping Program Synthesis Language Models to Perform Repairing, https://arxiv.org/abs/2507.15889
  • Busra Icoz, Goksel Biricik, 24 Jul 2025, Automated Code Review Using Large Language Models with Symbolic Reasoning, https://arxiv.org/abs/2507.18476
  • Julien Pourcel, Cédric Colas, Pierre-Yves Oudeyer, 10 Jul 2025, Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI, https://arxiv.org/abs/2507.14172
  • Luca Salvatore Lorello, Nikolaos Manginas, Marco Lippi, Stefano Melacci, 23 Jul 2025, LTLZinc: a Benchmarking Framework for Continual Learning and Neuro-Symbolic Temporal Reasoning, https://arxiv.org/abs/2507.17482
  • Gary Marcus, Jul 14, 2025, How o3 and Grok 4 Accidentally Vindicated Neurosymbolic AI, https://garymarcus.substack.com/p/how-o3-and-grok-4-accidentally-vindicated
  • Lin-Han Jia, Si-Yu Han, Wen-Chao Hu, Jie-Jing Shao, Wen-Da Wei, Zhi Zhou, Lan-Zhe Guo, Yu-Feng Li, 10 Aug 2025, When Is Prior Knowledge Helpful? Exploring the Evaluation and Selection of Unsupervised Pretext Tasks from a Neuro-Symbolic Perspective, https://arxiv.org/abs/2508.07299
  • Raffaele Pojer, Andrea Passerini, Kim G. Larsen, Manfred Jaeger, 29 Jul 2025, A Neuro-Symbolic Approach for Probabilistic Reasoning on Graph Data, https://arxiv.org/abs/2507.21873
  • Andrew Kiruluta, Andreas Lemos, and Priscilla Burity, 27 Jul 2025, Operator-Based Machine Intelligence: A Hilbert Space Framework for Spectral Learning and Symbolic Reasoning, https://arxiv.org/abs/2507.21189
  • Andrew Kiruluta, Andreas Lemos, and Priscilla Burity, 27 Jul 2025, Beyond Neural Networks: Symbolic Reasoning over Wavelet Logic Graph Signals, https://arxiv.org/abs/2507.21190
  • Wenkai Tan, Alvaro Velasquez, Houbing Song, 28 Jul 2025, DEM-NeRF: A Neuro-Symbolic Method for Scientific Discovery through Physics-Informed Simulation, https://arxiv.org/abs/2507.21350
  • Vasileios Manginas, Nikolaos Manginas, Edward Stevinson, Sherwin Varghese, Nikos Katzouris, Georgios Paliouras, Alessio Lomuscio, 29 Jul 2025, A Scalable Approach to Probabilistic Neuro-Symbolic Robustness Verification, https://arxiv.org/abs/2502.03274
  • Oren Sultan, Eitan Stern, Dafna Shahaf, 29 Jul 2025, Towards Reliable Proof Generation with LLMs: A Neuro-Symbolic Approach, https://arxiv.org/abs/2505.14479
  • Tilman Hinnerichs, Bart Swinkels, Jaap de Jong, Reuben Gardos Reid, Tudor Magirescu, Neil Yorke-Smith, Sebastijan Dumancic, 10 Jul 2025, Modelling Program Spaces in Program Synthesis with Constraints, https://arxiv.org/abs/2508.00005
  • Xinkai Zou, Xuan Jiang, Ruikai Huang, Haoze He, Parv Kapoor, Jiahua Zhao, 3 Aug 2025, CloudAnoAgent: Anomaly Detection for Cloud Sites via LLM Agent with Neuro-Symbolic Mechanism, https://arxiv.org/abs/2508.01844
  • Long S. T. Nguyen, Khang H. N. Vo, Thu H. A. Nguyen, Tuan C. Bui, Duc Q. Nguyen, Thanh-Tung Tran, Anh D. Nguyen, Minh L. Nguyen, Fabien Baldacci, Thang H. Bui, Emanuel Di Nardo, Angelo Ciaramella, Son H. Le, Ihsan Ullah, Lorenzo Di Rocco, and Tho T. Quan, 2 Aug 2025, Bridging LLMs and Symbolic Reasoning in Educational QA Systems: Insights from the XAI Challenge at IJCNN 2025, https://arxiv.org/abs/2508.01263
  • Zewen Liu, Juntong Ni, Xianfeng Tang, Max S.Y. Lau, Wei Jin, 5 Aug 2025, Can Large Language Models Adequately Perform Symbolic Reasoning Over Time Series?, https://arxiv.org/abs/2508.03963
  • Andrew Kiruluta, 7 Aug 2025, A Novel Architecture for Symbolic Reasoning with Decision Trees and LLM Agents, https://arxiv.org/abs/2508.05311
  • Anjiang Wei, Tarun Suresh, Jiannan Cao, Naveen Kannan, Yuheng Wu, Kai Yan, Thiago S. F. X. Teixeira, Ke Wang, Alex Aiken, 8 Aug 2025, CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis, https://arxiv.org/abs/2503.23145
  • Iman Sharifi, Mustafa Yildirim, Saber Fallah, 17 Aug 2025, Towards Safe Autonomous Driving Policies using a Neuro-Symbolic Deep Reinforcement Learning Approach, https://arxiv.org/abs/2307.01316
  • Ronit Virwani and Ruchika Suryawanshi, 18 Aug 2025, LOOP: A Plug-and-Play Neuro-Symbolic Framework for Enhancing Planning in Autonomous Systems, https://arxiv.org/abs/2508.13371
  • Xiao-Wen Yang, Jie-Jing Shao, Lan-Zhe Guo, Bo-Wen Zhang, Zhi Zhou, Lin-Han Jia, Wang-Zhou Dai and Yu-Feng Li, 19 Aug 2025, Neuro-Symbolic Artificial Intelligence: Towards Improving the Reasoning Abilities of Large Language Models, https://arxiv.org/abs/2508.13678
  • Andrew Kiruluta, 19 Aug 2025, A Fully Spectral Neuro-Symbolic Reasoning Architecture with Graph Signal Processing as the Computational Backbone, https://arxiv.org/abs/2508.14923
  • Xuan Zhang, Zhijian Zhou, Weidi Xu, Yanting Miao, Chao Qu, Yuan Qi, 22 Aug 2025, Constraints-Guided Diffusion Reasoner for Neuro-Symbolic Learning, https://arxiv.org/abs/2508.16524
  • Christopher J. Mungall, Adnan Malik, Daniel R. Korn, Justin T. Reese, Noel M. O'Boyle, and Janna Hastings, 24 Aug 2025, Chemical classification program synthesis using generative artificial intelligence, https://arxiv.org/abs/2505.18470
  • Justin Chih-Yao Chen, Sukwon Yun, Elias Stengel-Eskin, Tianlong Chen, Mohit Bansal, 18 Jul 2025, Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning, https://arxiv.org/abs/2503.05641
  • Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi, 11 Aug 2025, Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent, https://arxiv.org/abs/2508.08222
  • Qiushi Sun, Jinyang Gong, Lei Li, Qipeng Guo, Fei Yuan, 25 Jul 2025, CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback, https://arxiv.org/abs/2507.22080
  • Gongyao Jiang, Qiong Luo, 16 Aug 2025, Chart-CoCa: Self-Improving Chart Understanding of Vision LMs via Code-Driven Synthesis and Candidate-Conditioned Answering, https://arxiv.org/abs/2508.11975
  • Phuong Minh Nguyen, Tien Huu Dang, Naoya Inoue, 17 Aug 2025, Non-Iterative Symbolic-Aided Chain-of-Thought for Logical Reasoning, https://arxiv.org/abs/2508.12425

Reasoning Decoding Algorithms

Reasoning decoding algorithms, or Chain-of-Thought decoding algorithms, perform reasoning within the decoding phase of a single LLM inference pass, rather than across multiple inference steps. The idea is that the alternative decoding pathways suggested by the logits can resemble Chain-of-Thought reasoning, and these pathways can be explored and combined during inference. The result is an algorithm that reasons better than simpler decoding algorithms, yet is more efficient than multi-step Chain-of-Thought because it examines multiple pathways within a single inference step.
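A minimal sketch of the CoT-decoding idea (Wang & Zhou, 2024, cited below) in Python: branch on the top-k alternatives for the first token, greedy-decode each branch, and keep the branch with the highest confidence margin (top-1 minus top-2 probability). The tiny hand-written next-token table stands in for a real model, and the margin is averaged over the whole path here, a simplification of the paper's answer-token-only scoring:

```python
# CoT-decoding sketch: explore top-k first-token branches, rank each
# greedy continuation by its average top-1 vs top-2 probability margin.
# TOY_MODEL is a hand-written stand-in for real model logits.
TOY_MODEL = {
    (): {"5": 0.4, "Step": 0.35, "I": 0.25},
    ("5",): {"<eos>": 0.55, "?": 0.45},
    ("Step",): {"1:": 0.9, "2:": 0.1},
    ("Step", "1:"): {"2+3=5": 0.85, "guess": 0.15},
    ("Step", "1:", "2+3=5"): {"<eos>": 0.95, "more": 0.05},
    ("I",): {"think": 0.6, "guess": 0.4},
    ("I", "think"): {"5": 0.5, "6": 0.5},
    ("I", "think", "5"): {"<eos>": 0.7, "?": 0.3},
}

def next_dist(ctx):
    return TOY_MODEL.get(tuple(ctx), {"<eos>": 1.0})

def margin(dist):
    # Confidence margin: probability gap between the two top tokens.
    probs = sorted(dist.values(), reverse=True)
    return probs[0] - (probs[1] if len(probs) > 1 else 0.0)

def greedy_continue(first_token):
    path, margins = [first_token], []
    while True:
        dist = next_dist(path)
        tok = max(dist, key=dist.get)   # greedy step
        margins.append(margin(dist))
        if tok == "<eos>":
            break
        path.append(tok)
    return path, sum(margins) / len(margins)

def cot_decode(k=3):
    first = next_dist([])
    top_k = sorted(first, key=first.get, reverse=True)[:k]
    # One greedy continuation per first-token branch; keep the most
    # confident branch, which tends to be the chain-of-thought path.
    branches = [greedy_continue(tok) for tok in top_k]
    return max(branches, key=lambda b: b[1])

best_path, confidence = cot_decode()
print(" ".join(best_path))
```

In this toy example, the step-by-step branch ("Step 1: 2+3=5") wins on confidence margin over the direct-answer branch ("5"), mirroring the paper's observation that reasoning paths tend to carry higher answer confidence.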

Research papers on reasoning-decoding or CoT-decoding:

  • Xuezhi Wang, Denny Zhou, 23 May 2024 (v2), Chain-of-Thought Reasoning Without Prompting, https://arxiv.org/abs/2402.10200 ("CoT decoding" is examining the alternative paths in the decoding algorithm, which is somewhat similar to Chain-of-Thought reasoning.)
  • xjdr-alt, Dec 2024, entropix: Entropy Based Sampling and Parallel CoT Decoding, https://github.com/xjdr-alt/entropix (Parallel decoding attempts to get something similar to CoT.)
  • Hongxuan Zhang, Zhining Liu, Yao Zhao, Jiaqi Zheng, Chenyi Zhuang, Jinjie Gu, Guihai Chen, 4 Jun 2024 (v2), Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster, https://arxiv.org/abs/2311.08263 (Use of Jacobi parallel decoding with Chain-of-Thought.)
  • Renato Vukovic, David Arps, Carel van Niekerk, Benjamin Matthias Ruppik, Hsien-Chin Lin, Michael Heck, Milica Gašić, 5 Aug 2024, Dialogue Ontology Relation Extraction via Constrained Chain-of-Thought Decoding, https://arxiv.org/abs/2408.02361
  • Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber, 2 Nov 2023, Implicit Chain of Thought Reasoning via Knowledge Distillation, https://arxiv.org/abs/2311.01460 (Knowledge distillation applied to optimizing the interim computations in Chain-of-Thought.)
  • Yuntian Deng, Yejin Choi, Stuart Shieber, 23 May 2024, From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step, https://arxiv.org/abs/2405.14838
  • Ping Yu, Jing Xu, Jason Weston, Ilia Kulikov, 24 Jul 2024 (v3), Distilling System 2 into System 1, https://arxiv.org/abs/2407.06023
  • Mehul Damani, Idan Shenfeld, Andi Peng, Andreea Bobu, Jacob Andreas, 7 Oct 2024, Learning How Hard to Think: Input-Adaptive Allocation of LM Computation, https://arxiv.org/abs/2410.04707
  • Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam, 16 Nov 2023 (v2), Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs, EMNLP 2023, https://arxiv.org/abs/2305.11860 https://www.sample-step-by-step.info/
  • Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian, 9 Dec 2024, Training Large Language Models to Reason in a Continuous Latent Space, https://arxiv.org/abs/2412.06769 (Performing reasoning in a model trained to operate in the embedding vector space, rather than more directly in the token space.)
  • Luyang Liu, Jonas Pfeiffer, Jiaxing Wu, Jun Xie, Arthur Szlam, 23 Dec 2024, Deliberation in Latent Space via Differentiable Cache Augmentation, https://arxiv.org/abs/2412.17747 (Augmenting the KV cache with reasoning information so that decoding will mimic multi-step reasoning with fewer tokens required for intermediate steps.)
  • Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan, 21 Apr 2024 (v3), Think before you speak: Training Language Models With Pause Tokens, https://arxiv.org/abs/2310.02226 (Inserting extra "pause tokens" that trigger the LLM to perform extra reasoning during the decoding phase.)
  • Yuval Shalev, Amir Feder, Ariel Goldstein, 19 Jun 2024, Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning, https://arxiv.org/abs/2406.13858 (Using embeddings from intermediate model layers in decoding to mimic reasoning pathways.)
  • Eden Biran, Daniela Gottesman, Sohee Yang, Mor Geva, Amir Globerson, 14 Oct 2024 (v2), Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries, https://arxiv.org/abs/2406.12775 (Backpatching prior layers using embeddings from the current activations to mimic multi-step reasoning.)
  • Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758 (Use of dummy "filler tokens" similar to "pause tokens" or "reasoning tokens" to aid multi-step reasoning in decoding.)
  • Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman, 18 Mar 2024 (v2), Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, https://arxiv.org/abs/2403.09629 (Introduces answers between a start-of-thought and end-of-thought meta-token for reasoning.)
  • Haoran Wang, Kai Shu, Jan 2025, Make Every Token Count: A Systematic Survey on Decoding Methods for Foundation Models, https://www.researchgate.net/profile/Haoran-Wang-96/publication/387703971_Make_Every_Token_Count_A_Systematic_Survey_on_Decoding_Methods_for_Foundation_Models/links/67784c8ce74ca64e1f49eb15/Make-Every-Token-Count-A-Systematic-Survey-on-Decoding-Methods-for-Foundation-Models.pdf https://github.com/wang2226/Awesome-LLM-Decoding
  • Phuc Phan, Hieu Tran, Long Phan, 23 Aug 2024 (v2), Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation, https://arxiv.org/abs/2402.14874
  • Maxime Peyrard, Martin Josifoski, Robert West, 21 Mar 2024, The Era of Semantic Decoding, https://arxiv.org/abs/2403.14562
  • Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li, 17 Jan 2025 (v2), Towards Large Reasoning Models: A Survey on Scaling LLM Reasoning Capabilities, https://arxiv.org/abs/2501.09686
  • Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement
  • Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein, 7 Feb 2025, Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach, https://arxiv.org/abs/2502.05171
  • G Lu, L Peng, L Li, 2025, CoT-Decoding: Complex Reasoning via Chain-of-Thought Decoding, https://epubs.siam.org/doi/pdf/10.1137/1.9781611978520.44

Planning (as part of Reasoning)

Knowing how to make a plan is a core part of intelligence, and planning is a distinct capability that LLMs must exhibit as part of reasoning. Here are some papers specifically on the "planning" aspect of reasoning:
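A common pattern in this area is plan-then-execute: the LLM first decomposes a goal into steps, and each step is then executed in sequence with access to earlier results. A minimal sketch, where `llm_plan` and `llm_execute_step` are hypothetical stand-ins for LLM or tool calls:

```python
# Plan-then-execute loop (illustrative sketch only; llm_plan and
# llm_execute_step are hypothetical stand-ins for real LLM/tool calls).

def llm_plan(goal: str) -> list[str]:
    # A real planner would prompt the LLM for a numbered step list.
    return ["search flights", "compare prices", "book cheapest"]

def llm_execute_step(step: str, context: list[str]) -> str:
    # A real executor would call the LLM (or an external tool) per step.
    return f"done: {step}"

def plan_and_execute(goal: str) -> list[str]:
    results: list[str] = []
    for step in llm_plan(goal):
        # Each step sees the results of earlier steps, so later steps
        # can condition on what the plan has already accomplished.
        results.append(llm_execute_step(step, results))
    return results

print(plan_and_execute("book a holiday"))
```

Separating the plan from its execution makes the agent's behavior inspectable and allows replanning when a step fails, which is a recurring theme in the papers below.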

  • Myeonghwa Lee, Seonho An, Min-Soo Kim, 18 Jun 2024, PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, https://arxiv.org/abs/2406.12430 Code: https://github.com/myeon9h/PlanRAG
  • Vishal Rajput, Apr 11, 2024, What’s next for AI: AI agentic workflows? https://medium.com/aiguys/next-for-llms-and-rag-ai-agentic-workflows-1869ba0a6796
  • Zehui Chen, Kuikun Liu, Qiuchen Wang, Jiangning Liu, Wenwei Zhang, Kai Chen, Feng Zhao, 29 Jul 2024, MindSearch: Mimicking Human Minds Elicits Deep AI Searcher, https://arxiv.org/abs/2407.20183 Code: https://github.com/InternLM/MindSearch Project: https://mindsearch.netlify.app
  • Daniel Cao, Michael Katz, Harsha Kokel, Kavitha Srinivas, Shirin Sohrabi, 21 Aug 2024, Automating Thought of Search: A Journey Towards Soundness and Completeness, https://arxiv.org/abs/2408.11326
  • Vishal Rajput, Jul 8, 2024, Why LLMs Can’t Plan And Unlikely To Reach AGI? https://medium.com/aiguys/why-llms-cant-plan-and-unlikely-to-reach-agi-642bda3e0aa3
  • Evan Wang, Federico Cassano, Catherine Wu, Yunfeng Bai, Will Song, Vaskar Nath, Ziwen Han, Sean Hendryx, Summer Yue, Hugh Zhang, 5 Sep 2024, Planning In Natural Language Improves LLM Search For Code Generation, https://arxiv.org/abs/2409.03733
  • Yongjing Yin, Junran Ding, Kai Song, Yue Zhang, 17 Sep 2024, Semformer: Transformer Language Models with Semantic Planning, https://arxiv.org/abs/2409.11143
  • Chung-Yu Wang, Alireza DaghighFarsoodeh, Hung Viet Pham, 24 Sep 2024, Task-oriented Prompt Enhancement via Script Generation, https://arxiv.org/abs/2409.16418
  • LangChain, Jul 20, 2024, Planning for Agents, https://blog.langchain.dev/planning-for-agents/
  • A. Singh, A. Ehtesham, S. Kumar and T. T. Khoei, "Enhancing AI Systems with Agentic Workflows Patterns in Large Language Model," 2024 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 2024, pp. 527-532, doi: 10.1109/AIIoT61789.2024.10578990. https://ieeexplore.ieee.org/abstract/document/10578990
  • Chawla, Chhavi; Chatterjee, Siddharth; Gadadinni, Sanketh Siddanna; Verma, Pulkit; Banerjee, Sourav, 2024, Agentic AI: The building blocks of sophisticated AI business applications, Journal of AI, Robotics & Workplace Automation, Volume 3 / Number 3 / Summer 2024, pp. 1-15(15), Henry Stewart Publications, DOI: https://doi.org/10.69554/XEHZ1946 https://www.ingentaconnect.com/content/hsp/airwa/2024/00000003/00000003/art00001
  • Jian Xie, Kexun Zhang, Jiangjie Chen, Siyu Yuan, Kai Zhang, Yikai Zhang, Lei Li, Yanghua Xiao, 16 Oct 2024, Revealing the Barriers of Language Agents in Planning, https://arxiv.org/abs/2410.12409
  • Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, Wenhui Zhu, Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen, 18 Dec 2024, Deploying Foundation Model Powered Agent Services: A Survey, https://arxiv.org/abs/2412.13437 (A survey of not just deployment, but many inference optimization techniques.)
  • Gautier Dagan, Frank Keller, Alex Lascarides, 30 Dec 2024, Plancraft: an evaluation dataset for planning with LLM agents, https://arxiv.org/abs/2412.21033
  • Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
  • Paul Sawers, January 23, 2025, Meta’s Yann LeCun predicts a ‘new AI architectures paradigm’ within 5 years and ‘decade of robotics’, https://techcrunch.com/2025/01/23/metas-yann-lecun-predicts-a-new-ai-architectures-paradigm-within-5-years-and-decade-of-robotics/
  • Ben Dickson, January 22, 2025, DeepMind’s new inference-time scaling technique improves planning accuracy in LLMs, https://venturebeat.com/ai/deepmind-new-inference-time-scaling-technique-improves-planning-accuracy-in-llms/
  • Xinzhe Li, Jan 2025, A Review of Prominent Paradigms for LLM-Based Agents: Tool Use (Including RAG), Planning, and Feedback Learning, Proceedings of the 31st International Conference on Computational Linguistics, pages 9760–9779, January 19–24, 2025. ©2025 Association for Computational Linguistics, https://aclanthology.org/2025.coling-main.652.pdf https://github.com/xinzhel/LLM-Agent-Survey
  • S Wang, X Zhang, J Ma, A Hwang, Z Yu, Jan 2025, JumpStarter: Getting Started on Personal Goals with Adaptive Personal Context Curation, https://sitong-wang.github.io/data/JumpStarter.pdf (Long-term planning of goal-oriented long multi-step projects.)
  • Karthik Valmeekam, Kaya Stechly, Subbarao Kambhampati, 20 Sep 2024, LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench, https://arxiv.org/abs/2409.13373
  • Bilgehan Sel, Ruoxi Jia, Ming Jin, 23 Jan 2025, LLMs Can Plan Only If We Tell Them, https://arxiv.org/abs/2501.13545
  • Weimin Xiong, Yifan Song, Qingxiu Dong, Bingchan Zhao, Feifan Song, Xun Wang, Sujian Li, 4 Mar 2025, MPO: Boosting LLM Agents with Meta Plan Optimization, https://arxiv.org/abs/2503.02682
  • Yuqi Zhou, Shuai Wang, Sunhao Dai, Qinglin Jia, Zhaocheng Du, Zhenhua Dong, Jun Xu, 5 Mar 2025, CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning, https://arxiv.org/abs/2503.03743
  • P Verma, SP Midigeshi, G Sinha, A Solin, N Natarajan, Mar 2025, Plan *RAG: Efficient Test-Time Planning for Retrieval Augmented Generation, ICLR 2025 review, https://openreview.net/pdf?id=gi9aqlYdBk (Improve RAG reasoning efficiency via planning for parallel reasoning.)
  • Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami, 12 Mar 2025, Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks, https://arxiv.org/abs/2503.09572
  • Siheng Xiong, Jieyu Zhou, Zhangding Liu, Yusen Su, 2 May 2025, SymPlanner: Deliberate Planning in Language Models with Symbolic Representation, https://arxiv.org/abs/2505.01479
  • Pengfei Cao, Tianyi Men, Wencan Liu, Jingwen Zhang, Xuzhao Li, Xixun Lin, Dianbo Sui, Yanan Cao, Kang Liu, Jun Zhao, 26 May 2025, Large Language Models for Planning: A Comprehensive and Systematic Survey, https://arxiv.org/abs/2505.19683
  • Kenneth Payne, Baptiste Alloui-Cros, 3 Jul 2025, Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory, https://arxiv.org/abs/2507.02618
  • Guancheng Zeng, Xueyi Chen, Jiawang Hu, Shaohua Qi, Yaxuan Mao, Zhantao Wang, Yifan Nie, Shuang Li, Qiuyang Feng, Pengxu Qiu, Yujia Wang, Wenqiang Han, Linyan Huang, Gang Li, Jingjing Mo, Haowen Hu, 22 Jul 2025 (v2), Routine: A Structural Planning Framework for LLM Agent System in Enterprise, https://arxiv.org/abs/2507.14447
  • Sangwoo Jeon, Juchul Shin, Gyeong-Tae Kim, YeonJe Cho and Seongwoo Kim, 14 Aug 2025, Scaling Up without Fading Out: Goal-Aware Sparse GNN for RL-based Generalized Planning, https://arxiv.org/abs/2508.10747
  • Steven Klee and Yuntian Xia, 13 Aug 2025, Measuring Time Series Forecast Stability for Demand Planning, https://arxiv.org/abs/2508.10063
  • Anantha Narayanan, Battu Bhanu Teja, Pruthwik Mishra, 14 Aug 2025, TLE-Based A2C Agent for Terrestrial Coverage Orbital Path Planning, https://arxiv.org/abs/2508.10872
  • Rishi Parekh, Saisubramaniam Gopalakrishnan, Zishan Ahmad, Anirudh Deodhar, 23 Jul 2025, Leveraging Knowledge Graphs and LLM Reasoning to Identify Operational Bottlenecks for Warehouse Planning Assistance, https://arxiv.org/abs/2507.17273
  • Stefan Borgwardt, Duy Nhu, Gabriele Röger, 23 Jul 2025, Automated planning with ontologies under coherence update semantics (Extended Version), https://arxiv.org/abs/2507.15120
  • Muhayy Ud Din, Jan Rosell, Waseem Akram, Isiah Zaplana, Maximo A Roa, and Irfan Hussain, 23 Jul 2025, Onto-LLM-TAMP: Knowledge-oriented Task and Motion Planning using Large Language Models, https://arxiv.org/abs/2412.07493
  • Zixiao Huang, Junhao Hu, Hao Lin, Chunyang Zhu, Yueran Tang, Quanlu Zhang, Zhen Guo, Zhenhua Li, Shengen Yan, Zhenhua Zhu, Guohao Dai, Yu Wang, 22 Jul 2025, Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training, https://arxiv.org/abs/2507.16274
  • Chi-Pin Huang, Yueh-Hua Wu, Min-Hung Chen, Yu-Chiang Frank Wang, Fu-En Yang, 22 Jul 2025, ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning, https://arxiv.org/abs/2507.16815
  • Dominic LaBella, Valeriia Abramova, Mehdi Astaraki, Andre Ferreira, Zhifan Jiang, Mason C. Cleveland, Ramandeep Kang, Uma M. Lal-Trehan Estrada, Cansu Yalcin, Rachika E. Hamadache, Clara Lisazo, Adrià Casamitjana, Joaquim Salvi, Arnau Oliver, Xavier Lladó, Iuliana Toma-Dasu, Tiago Jesus, Behrus Puladi, Jens Kleesiek, Victor Alves, Jan Egger, Daniel Capellán-Martín, Abhijeet Parida, Austin Tapp, Xinyang Liu, Maria J. Ledesma-Carbayo, Jay B. Patel, Thomas N. McNeal, Maya Viera, Owen McCall, Albert E. Kim, Elizabeth R. Gerstner, Christopher P. Bridge, Katherine Schumacher, Michael Mix, Kevin Leu, Shan McBurney-Lin, Pierre Nedelec, Javier Villanueva-Meyer, David R. Raleigh, Jonathan Shapey, Tom Vercauteren, Kazumi Chia, Marina Ivory, Theodore Barfoot, Omar Al-Salihi, Justin Leu, Lia M. Halasz, et al. (57 additional authors not shown), 21 Jul 2025, Analysis of the 2024 BraTS Meningioma Radiotherapy Planning Automated Segmentation Challenge, https://arxiv.org/abs/2405.18383
  • Bofei Liu and Dong Ye and Zunhao Yao and Zhaowei Sun, 22 Jul 2025, A Goal-Oriented Reinforcement Learning-Based Path Planning Algorithm for Modular Self-Reconfigurable Satellites, https://arxiv.org/abs/2505.01966
  • Sunandita Patra, Mehtab Pathan, Mahmoud Mahfouz, Parisa Zehtabi, Wided Ouaja, Daniele Magazzeni, and Manuela Veloso, 21 Jul 2025, Capacity Planning and Scheduling for Jobs with Uncertainty in Resource Usage and Duration, https://arxiv.org/abs/2507.01225
  • Dario Della Monica, Angelo Montanari, Pietro Sala, 23 Jul 2025, Synthesis of timeline-based planning strategies avoiding determinization, https://arxiv.org/abs/2507.17988
  • Gilberto Cunha, Alexandra Ramôa, André Sequeira, Michael de Oliveira, Luís Barbosa, 24 Jul 2025, Hybrid quantum-classical algorithm for near-optimal planning in POMDPs, https://arxiv.org/abs/2507.18606
  • Andres M Bran, Theo A Neukomm, Daniel P Armstrong, Zlatko Jončev, Philippe Schwaller, 23 Jul 2025, Chemical reasoning in LLMs unlocks strategy-aware synthesis planning and reaction mechanism elucidation, https://arxiv.org/abs/2503.08537
  • Zhiwei Xu, 24 Jul 2025, DAA*: Deep Angular A Star for Image-based Path Planning, https://arxiv.org/abs/2507.09305
  • Genliang Li, Yaxin Cui, Jinyu Su, 18 Jul 2025, A multi-strategy improved snake optimizer for three-dimensional UAV path planning and engineering problems, https://arxiv.org/abs/2507.14043
  • Yuejiao Xie, Maonan Wang, Di Zhou, Man-On Pun, and Zhu Han, 18 Jul 2025, Real-Time Communication-Aware Ride-Sharing Route Planning for Urban Air Mobility: A Multi-Source Hybrid Attention Reinforcement Learning Approach, https://arxiv.org/abs/2507.14249
  • Yufan Song, Jiatao Zhang, Zeng Gu, Qingmiao Liang, Tuocheng Hu, Wei Song, Shiqiang Zhu, 20 Jul 2025, FCRF: Flexible Constructivism Reflection for Long-Horizon Robotic Task Planning with Large Language Models, https://arxiv.org/abs/2507.14975
  • Thanh Thi Nguyen, Saeid Nahavandi, Imran Razzak, Dung Nguyen, Nhat Truong Pham, Quoc Viet Hung Nguyen, 21 Jul 2025, The Emergence of Deep Reinforcement Learning for Path Planning, https://arxiv.org/abs/2507.15469
  • Alexandru Coca, Mark Gaynor, Zhenxing Zhang, Jianpeng Cheng, Bo-Hsiang Tseng, Pete Boothroyd, Héctor Martinez Alonso, Diarmuid Ó Séaghdha, Anders Johannsen, 21 Jul 2025, ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution, https://arxiv.org/abs/2507.15501
  • Jubin Abhishek Soni, Amit Anand, Rajesh Kumar Pandey, Aniket Abhishek Soni, 19 Jul 2025, Dynamic Context Tuning for Retrieval-Augmented Generation: Enhancing Multi-Turn Planning and Tool Adaptation, https://arxiv.org/abs/2506.11092
  • Giwon Lee, Wooseong Jeong, Daehee Park, Jaewoo Jeong, and Kuk-Jin Yoon, 21 Jul 2025, Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning, https://arxiv.org/abs/2507.04790
  • Abhinav Sagar, Sai Teja Gilukara, 20 Jul 2025, CBAGAN-RRT: Convolutional Block Attention Generative Adversarial Network for Sampling-Based Path Planning, https://arxiv.org/abs/2305.10442
  • Hayeon Oh, 21 Jul 2025, LaViPlan : Language-Guided Visual Path Planning with RLVR, https://arxiv.org/abs/2507.12911
  • Markus Fritzsche, Elliot Gestrin, Jendrik Seipp, 11 Aug 2025, Symmetry-Aware Transformer Training for Automated Planning, https://arxiv.org/abs/2508.07743
  • Alejandro Murillo-Gonzalez, Junhong Xu and Lantao Liu, 8 Aug 2025, Learning Causal Structure Distributions for Robust Planning, https://arxiv.org/abs/2508.06742
  • Naiyi Li, Zihui Ma, Runlong Yu, Lingyao Li, 9 Aug 2025, LSDTs: LLM-Augmented Semantic Digital Twins for Adaptive Knowledge-Intensive Infrastructure Planning, https://arxiv.org/abs/2508.06799
  • Alberto Pozanco, Marianela Morales, Daniel Borrajo, Manuela Veloso, 11 Aug 2025, A Planning Compilation to Reason about Goal Achievement at Planning Time, https://arxiv.org/abs/2503.09545
  • Yanchen Zhu, Honghui Zou, Chufan Liu, Yuyu Luo, Yuankai Wu, Yuxuan Liang, 10 Aug 2025, Reinforcement Learning for Hybrid Charging Stations Planning and Operation Considering Fixed and Mobile Chargers, https://arxiv.org/abs/2506.16764
  • Jaike van Twiller, Yossiri Adulyasak, Erick Delage, Djordje Grbic, Rune Møller Jensen, 11 Aug 2025, Navigating Demand Uncertainty in Container Shipping: Deep Reinforcement Learning for Enabling Adaptive and Feasible Master Stowage Planning, https://arxiv.org/abs/2502.12756
  • Vlad Sobal, Wancong Zhang, Kyunghyun Cho, Randall Balestriero, Tim G. J. Rudner, Yann LeCun, 10 Aug 2025, Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models, https://arxiv.org/abs/2502.14819
  • Zhipeng Tang, Sha Zhang, Jiajun Deng, Chenjie Wang, Guoliang You, Yuting Huang, Xinrui Lin and Yanyong Zhang, 27 Jul 2025, VLMPlanner: Integrating Visual Language Models with Motion Planning, https://arxiv.org/abs/2507.20342
  • Sara Pohland and Claire Tomlin, 26 Jul 2025, Competency-Aware Planning for Probabilistically Safe Navigation Under Perception Uncertainty, https://arxiv.org/abs/2409.06111
  • Chang-Hun Ji, SiWoon Song, Youn-Hee Han, SungTae Moon, 29 Jul 2025, Decision Transformer-Based Drone Trajectory Planning with Dynamic Safety-Efficiency Trade-Offs, https://arxiv.org/abs/2507.21506
  • Tyler Han, Yanda Bao, Bhaumik Mehta, Gabriel Guo, Anubhav Vishwakarma, Emily Kang, Sanghun Jung, Rosario Scalise, Jason Zhou, Bryan Xu, Byron Boots, 29 Jul 2025, Model Predictive Adversarial Imitation Learning for Planning from Observation, https://arxiv.org/abs/2507.21533
  • Yi Kong, Dianxi Shi, Guoli Yang, Zhang ke-di, Chenlin Huang, Xiaopeng Li, Songchang Jin, 29 Jul 2025, MapAgent: Trajectory-Constructed Memory-Augmented Planning for Mobile Task Automation, https://arxiv.org/abs/2507.21953
  • Ratijit Mitra and Indranil Saha, 28 Jul 2025, Online Concurrent Multi-Robot Coverage Path Planning, https://arxiv.org/abs/2403.10460
  • Shengao Yi, Xiaojiang Li, Wei Tu, Tianhong Zhao, 30 Jul 2025, Planning for Cooler Cities: A Multimodal AI Framework for Predicting and Mitigating Urban Heat Stress through Urban Landscape Transformation, https://arxiv.org/abs/2507.23000
  • Mahmoud Ghorab and Matthias Lorenzen, 31 Jul 2025, Multi-Waypoint Path Planning and Motion Control for Non-holonomic Mobile Robots in Agricultural Applications, https://arxiv.org/abs/2507.23350
  • Yiyan Ji, Haoran Chen, Qiguang Chen, Chengyue Wu, Libo Qin, Wanxiang Che, 31 Jul 2025, MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models, https://arxiv.org/abs/2507.23382
  • Kai Goebel and Patrik Zips, 31 Jul 2025, Can LLM-Reasoning Models Replace Classical Planning? A Benchmark Study, https://arxiv.org/abs/2507.23589
  • Babak Esmaeili, Hamidreza Modares, Stefano Di Cairano, 31 Jul 2025, Data-Driven Motion Planning for Uncertain Nonlinear Systems, https://arxiv.org/abs/2508.00154
  • Milad Farjadnasab, Shahin Sirouspour, 31 Jul 2025, Cooperative and Asynchronous Transformer-based Mission Planning for Heterogeneous Teams of Mobile Robots, https://arxiv.org/abs/2410.06372
  • Yuanzhe Shen, Kaimin Wang, Changze Lv, Xiaoqing Zheng, Xuanjing Huang, 2 Aug 2025, TripTailor: A Real-World Benchmark for Personalized Travel Planning, https://arxiv.org/abs/2508.01432
  • Yinghao Zhu, Yifan Qi, Zixiang Wang, Lei Gu, Dehao Sui, Haoran Hu, Xichen Zhang, Ziyi He, Liantao Ma, Lequan Yu, 4 Aug 2025, HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research, https://arxiv.org/abs/2508.02621
  • Enrique Valero-Leal, Pedro Larrañaga and Concha Bielza, 4 Aug 2025, Actionable Counterfactual Explanations Using Bayesian Networks and Path Planning with Applications to Environmental Quality Improvement, https://arxiv.org/abs/2508.02634
  • Mikhail Andronov, Natalia Andronova, Michael Wand, Jürgen Schmidhuber, Djork-Arné Clevert, 2 Aug 2025, Fast and scalable retrosynthetic planning with a transformer neural network and speculative beam search, https://arxiv.org/abs/2508.01459
  • Krish Agarwal, Yuqian Jiang, Jiaheng Hu, Bo Liu, Peter Stone, 3 Aug 2025, L3M+P: Lifelong Planning with Large Language Models, https://arxiv.org/abs/2508.01917
  • Alexander Tuisov, Yonatan Vernik and Alexander Shleyfman, 3 Aug 2025, LLM-Generated Heuristics for AI Planning: Do We Even Need Domain-Independence Anymore?, https://arxiv.org/abs/2501.18784
  • Jungkoo Kang, 3 Aug 2025, Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation, https://arxiv.org/abs/2507.02253
  • An T. Le, Khai Nguyen, Minh Nhat Vu, João Carvalho, Jan Peters, 2 Aug 2025, Model Tensor Planning, https://arxiv.org/abs/2505.01059
  • Zhichen Dong, Zhanhui Zhou, Zhixuan Liu, Chao Yang, Chaochao Lu, 4 Aug 2025, Emergent Response Planning in LLMs, https://arxiv.org/abs/2502.06258
  • Mikhail Soutchanski and Yongmei Liu, 26 Jul 2025, Planning with Dynamically Changing Domains, https://arxiv.org/abs/2508.02697
  • Michael Katz, Harsha Kokel, Sarath Sreedharan, 4 Aug 2025, Seemingly Simple Planning Problems are Computationally Challenging: The Countdown Game, https://arxiv.org/abs/2508.02900
  • Yutong Wang, Pengliang Ji, Kaixin Li, Baolong Bi, Tao Feng, and Guillaume Sartoretti, 5 Aug 2025, Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning, https://arxiv.org/abs/2508.03018
  • Longling Geng and Edward Y. Chang, 5 Aug 2025, REALM-Bench: A Benchmark for Evaluating Multi-Agent Systems on Real-world, Dynamic Planning and Scheduling Tasks, https://arxiv.org/abs/2502.18836
  • Hamza El Alaoui, Atieh Taheri, Yi-Hao Peng, Jeffrey P. Bigham, 6 Aug 2025, StepWrite: Adaptive Planning for Speech-Driven Text Generation, https://arxiv.org/abs/2508.04011
  • Aniket Johri, Divyanshi Dwivedi, Mayukha Pal, 6 Aug 2025, Agentic-AI based Mathematical Framework for Commercialization of Energy Resilience in Electrical Distribution System Planning and Operation, https://arxiv.org/abs/2508.04170
  • Kim Hammar and Tansu Alpcan and Emil C. Lupu, 7 Aug 2025, Incident Response Planning Using a Lightweight Large Language Model with Reduced Hallucination, https://arxiv.org/abs/2508.05188
  • Hongyu Nie, Xu Liu, Zhaotong Tan, Sen Mei, and Wenbo Su, 7 Aug 2025, Unified Linear Parametric Map Modeling and Perception-aware Trajectory Planning for Mobile Robotics, https://arxiv.org/abs/2507.09340
  • Sahil Bansal, Sai Shruthi Sistla, Aarti Arikatala, Sebastian Schreiber, 7 Aug 2025, Planning Agents on an Ego-Trip: Leveraging Hybrid Ego-Graph Ensembles for Improved Tool Retrieval in Enterprise Task Planning, https://arxiv.org/abs/2508.05888
  • Michael Wehrli, Alicia Durrer, Paul Friedrich, Sidaty El Hadramy, Edwin Li, Luana Brahaj, Carol C. Hasler, Philippe C. Cattin, 8 Aug 2025, Towards MR-Based Trochleoplasty Planning, https://arxiv.org/abs/2508.06076
  • Yongchao Chen, Yilun Hao, Yang Zhang, Chuchu Fan, 7 Aug 2025, Code-as-Symbolic-Planner: Foundation Model-Based Robot Planning via Symbolic Code Generation, https://arxiv.org/abs/2503.01700
  • Masataro Asai, 11 Aug 2025, Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning, https://arxiv.org/abs/2508.08385
  • Yuechen Wang, Yuming Qiao, Dan Meng, Jun Yang, Haonan Lu, Zhenyu Yang, Xudong Zhang, 12 Aug 2025, Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation, https://arxiv.org/abs/2508.08816
  • Maxence Boels, Harry Robertshaw, Thomas C Booth, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin, 12 Aug 2025, When Imitation Learning Outperforms Reinforcement Learning in Surgical Action Planning, https://arxiv.org/abs/2507.05011
  • Navin Sriram Ravie, Keerthi Vasan M, Asokan Thondiyath and Bijo Sebastian, 28 Apr 2025, QuickGrasp: Lightweight Antipodal Grasp Planning with Point Clouds, https://arxiv.org/abs/2504.19716
  • Kechen Li, Yaotian Tao, Ximing Wen, Quanwei Sun, Zifei Gong, Chang Xu, Xizhe Zhang, Tianbo Ji, 13 Aug 2025, GridRoute: A Benchmark for LLM-Based Route Planning with Cardinal Movement in Grid Environments, https://arxiv.org/abs/2505.24306
  • Qingqing Wang, Liqiang Xiao, Chang Chang, 14 Aug 2025, Learn to optimize for automatic proton PBS treatment planning for H&N cancers, https://arxiv.org/abs/2508.11085
  • David H. Chan, Mark Roberts, Dana S. Nau, 15 Aug 2025, Landmark-Assisted Monte Carlo Planning, https://arxiv.org/abs/2508.11493
  • Rowan Hodson, Bruce Bassett, Charel van Hoof, Benjamin Rosman, Mark Solms, Jonathan P. Shock, Ryan Smith, 14 Aug 2025, Sophisticated Learning: A novel algorithm for active learning during model-based planning, https://arxiv.org/abs/2308.08029
  • Yanming Liu, Xinyue Peng, Jiannan Cao, Yuwei Zhang, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du, 15 Aug 2025, Tool-Planner: Task Planning with Clusters across Multiple Tools, https://arxiv.org/abs/2406.03807
  • Michael Aichmüller, Hector Geffner, 15 Aug 2025, Sketch Decompositions for Classical Planning via Deep Reinforcement Learning, https://arxiv.org/abs/2412.08574
  • Kyle Brown, Dylan M. Asmar, Mac Schwager, and Mykel J. Kochenderfer, 15 Aug 2025, Large-Scale Multi-Robot Assembly Planning for Autonomous Manufacturing, https://arxiv.org/abs/2311.00192
  • Frazier N. Baker, Daniel Adu-Ampratwum, Reza Averly, Botao Yu, Huan Sun, Xia Ning, 16 Aug 2025, LARC: Towards Human-level Constrained Retrosynthesis Planning through an Agentic Framework, https://arxiv.org/abs/2508.11860
  • Chunliang Hua, Xiao Hu, Jiayang Sun, Zeyuan Yang, 18 Aug 2025, The Maximum Coverage Model and Recommendation System for UAV Vertiports Location Planning, https://arxiv.org/abs/2508.12651
  • Wenjie Chen, Wenbin Li, Di Yao, Xuying Meng, Chang Gong, Jingping Bi, 18 Aug 2025, GTool: Graph Enhanced Tool Planning with Large Language Model, https://arxiv.org/abs/2508.12725
  • Petr Anokhin, Roman Khalikov, Stefan Rebrikov, Viktor Volkov, Artyom Sorokin, Vincent Bissonnette, 18 Aug 2025, HeroBench: A Benchmark for Long-Horizon Planning and Structured Reasoning in Virtual Worlds, https://arxiv.org/abs/2508.12782
  • Giovanni Briglia, Francesco Fabiano, Stefano Mariani, 18 Aug 2025, Scaling Multi-Agent Epistemic Planning through GNN-Derived Heuristics, https://arxiv.org/abs/2508.12840
  • Qian Cao and Jielin Chen and Junchao Zhao and Rudi Stouffs, 15 Aug 2025, From Heuristics to Data: Quantifying Site Planning Layout Indicators with Deep Learning and Multi-Modal Data, https://arxiv.org/abs/2508.11723
  • Sangwoo Jeon, Juchul Shin, YeonJe Cho, Gyeong-Tae Kim and Seongwoo Kim, 16 Aug 2025, Integrating Symbolic RL Planning into a BDI-based Autonomous UAV Framework: System Integration and SIL Validation, https://arxiv.org/abs/2508.11890
  • Long Ma, Fangwei Zhong, Yizhou Wang, 18 Aug 2025, Reinforced Context Order Recovery for Adaptive Reasoning and Planning, https://arxiv.org/abs/2508.13070
  • Gokul Puthumanaillam, Aditya Penumarti, Manav Vora, Paulo Padrao, Jose Fuentes, Leonardo Bobadilla, Jane Shin, Melkior Ornik, 16 Aug 2025, Belief-Conditioned One-Step Diffusion: Real-Time Trajectory Planning with Just-Enough Sensing, https://arxiv.org/abs/2508.12166
  • Deqian Kong, Dehong Xu, Minglu Zhao, Bo Pang, Jianwen Xie, Andrew Lizarraga, Yuhao Huang, Sirui Xie, Ying Nian Wu, 18 Aug 2025, Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference, https://arxiv.org/abs/2402.04647
  • Bernhard Jaeger, Daniel Dauner, Jens Beißwenger, Simon Gerstenecker, Kashyap Chitta and Andreas Geiger, 18 Aug 2025, CaRL: Learning Scalable Planning Policies with Simple Rewards, https://arxiv.org/abs/2504.17838
  • Ronit Virwani and Ruchika Suryawanshi, 18 Aug 2025, LOOP: A Plug-and-Play Neuro-Symbolic Framework for Enhancing Planning in Autonomous Systems, https://arxiv.org/abs/2508.13371
  • Minh Hoang Nguyen, Van Dai Do, Dung Nguyen, Thin Nguyen, Hung Le, 19 Aug 2025, CausalPlan: Empowering Efficient LLM Multi-Agent Collaboration Through Causality-Driven Planning, https://arxiv.org/abs/2508.13721
  • Katharina Stein, Nils Hodel, Daniel Fišer, Jörg Hoffmann, Michael Katz and Alexander Koller, 19 Aug 2025, Improved Generalized Planning with LLMs through Strategy Refinement and Reflection, https://arxiv.org/abs/2508.13876
  • Kim Hammar and Tao Li, 20 Aug 2025, Online Incident Response Planning under Model Misspecification through Bayesian Learning and Belief Quantization, https://arxiv.org/abs/2508.14385
  • Karin A. Olthof, Matteo Fusagli, Bianca Güttner, Tiziano Natali, Bram Westerink, Stefanie Speidel, Theo J.M. Ruers, Koert F.D. Kuhlmann, Andrey Zhylka, 19 Aug 2025, Automated surgical planning with nnU-Net: delineation of the anatomy in hepatobiliary phase MRI, https://arxiv.org/abs/2508.14133
  • Md Mainul Abrar, Xun Jia, Yujie Chi, 19 Aug 2025, New Insights into Automatic Treatment Planning for Cancer Radiotherapy Using Explainable Artificial Intelligence, https://arxiv.org/abs/2508.14229
  • João Vitor de Carvalho Silva and Douglas G. Macharet, 20 Aug 2025, Can LLM Agents Solve Collaborative Tasks? A Study on Urgency-Aware Planning and Coordination, https://arxiv.org/abs/2508.14635
  • Xiaowei Chi, Kuangzhi Ge, Jiaming Liu, Siyuan Zhou, Peidong Jia, Zichen He, Yuzhen Liu, Tingguang Li, Lei Han, Sirui Han, Shanghang Zhang, Yike Guo, 20 Aug 2025, MinD: Learning A Dual-System World Model for Real-Time Planning and Implicit Risk Analysis, https://arxiv.org/abs/2506.18897
  • Wei Yang, Jinwei Xiao, Hongming Zhang, Qingyang Zhang, Yanna Wang, Bo Xu, 21 Aug 2025, Coarse-to-Fine Grounded Memory for LLM Agent Planning, https://arxiv.org/abs/2508.15305
  • Bin Deng, Yizhe Feng, Zeming Liu, Qing Wei, Xiangrong Zhu, Shuai Chen, Yuanfang Guo, Yunhong Wang, 21 Aug 2025, RETAIL: Towards Real-world Travel Planning for Large Language Models, https://arxiv.org/abs/2508.15335
  • Alberto Pozanco, Marianela Morales, Daniel Borrajo, Manuela Veloso, 21 Aug 2025, Planning with Minimal Disruption, https://arxiv.org/abs/2508.15358
  • Deyu Zhang, Xicheng Zhang, Jiahao Li, Tingting Long, Xunhua Dai, Yongjian Fu, Jinrui Zhang, Ju Ren, and Yaoxue Zhang, 21 Aug 2025, LLM-Driven Self-Refinement for Embodied Drone Task Planning, https://arxiv.org/abs/2508.15501
  • Nikita Kachaev, Andrei Spiridonov, Andrey Gorodetsky, Kirill Muravyev, Nikita Oskolkov, Aditya Narendra, Vlad Shakhuro, Dmitry Makarov, Aleksandr I. Panov, Polina Fedotova, Alexey K. Kovalev, 21 Aug 2025, Mind and Motion Aligned: A Joint Evaluation IsaacSim Benchmark for Task Planning and Low-Level Policies in Mobile Manipulation, https://arxiv.org/abs/2508.15663
  • Yiheng Hu, Xiaoyang Wang, Qing Liu, Xiwei Xu, Qian Fu, Wenjie Zhang, Liming Zhu, 22 Aug 2025, MMAPG: A Training-Free Framework for Multimodal Multi-hop Question Answering via Adaptive Planning Graphs, https://arxiv.org/abs/2508.16051
  • Sijie Yang, Binyu Lei, Filip Biljecki, 22 Aug 2025, Urban Comfort Assessment in the Era of Digital Planning: A Multidimensional, Data-driven, and AI-assisted Framework, https://arxiv.org/abs/2508.16057
  • Hichem Cheriet, Khellat Kihel Badra, Chouraqui Samira, 22 Aug 2025, Comparative Analysis of UAV Path Planning Algorithms for Efficient Navigation in Urban 3D Environments, https://arxiv.org/abs/2508.16515
  • Bert de Vries, Wouter Nuijten, Thijs van de Laar, Wouter Kouw, Sepideh Adamiat, Tim Nisslbeck, Mykola Lukashchuk, Hoang Minh Huu Nguyen, Marco Hidalgo Araya, Raphael Tresor, Thijs Jenneskens, Ivana Nikoloska, Raaja Ganapathy Subramanian, Bart van Erp, Dmitry Bagaev and Albert Podusenko, 22 Aug 2025, Expected Free Energy-based Planning as Variational Inference, https://arxiv.org/abs/2504.14898
  • Yulison Herry Chrisnanto and Julian Evan Chrisnanto, 13 Aug 2025, Quantum-Inspired DRL Approach with LSTM and OU Noise for Cut Order Planning Optimization, https://arxiv.org/abs/2508.16611
  • Xing Wei, Yuqi Ouyang, 24 Aug 2025, GPG-HT: Generalized Policy Gradient with History-Aware Decision Transformer for Probabilistic Path Planning, https://arxiv.org/abs/2508.17218
  • Fan Ding, Xuewen Luo, Hwa Hui Tew, Ruturaj Reddy, Xikun Wang, Junn Yong Loo, 23 Aug 2025, Drive As You Like: Strategy-Level Motion Planning Based on A Multi-Head Diffusion Model, https://arxiv.org/abs/2508.16947
  • Jatin Nainani, Sankaran Vaidyanathan, Connor Watts, Andre N. Assis, Alice Rigg, 25 Aug 2025, Detecting and Characterizing Planning in Language Models, https://arxiv.org/abs/2508.18098
  • Arvi Jonnarth, Ola Johansson, Jie Zhao, Michael Felsberg, 23 Aug 2025, Sim-to-Real Transfer of Deep Reinforcement Learning Agents for Online Coverage Path Planning, https://arxiv.org/abs/2406.04920

LLM Long Term Memory

LLM Long Term Memory refers to having the LLM "remember" facts that it has learned during inference. By default, an LLM is "stateless" and does not recall information between queries. Short-term memory can be provided by tracking the conversational history as "context" for each query, whereas long-term memory aims to have the LLM "learn" or "memorize" new facts across sessions. Note that this research area is about the accuracy of the output, not about optimizing the memory efficiency of LLM inference.
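The stateless-versus-stateful distinction above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's API: facts stored in earlier sessions are retrieved by a naive word-overlap score and injected into the prompt, exactly as short-term conversational history would be. The class and function names are hypothetical.

```python
class MemoryStore:
    """Toy long-term memory that persists facts across conversations."""

    def __init__(self):
        self.facts = []

    def remember(self, fact):
        self.facts.append(fact)

    def recall(self, query, top_k=2):
        # Naive relevance score: number of shared lowercase words.
        words = set(query.lower().split())
        scored = sorted(self.facts,
                        key=lambda f: len(words & set(f.lower().split())),
                        reverse=True)
        return scored[:top_k]


def build_prompt(memory, user_query):
    # Recalled long-term facts become extra context for the stateless LLM.
    recalled = memory.recall(user_query)
    context = "\n".join("Known fact: " + f for f in recalled)
    return f"{context}\nUser: {user_query}"


memory = MemoryStore()
memory.remember("The user's dog is named Rex")
memory.remember("The user lives in Sydney")
print(build_prompt(memory, "What is my dog called?"))
```

A production system would replace the word-overlap scoring with embedding-based vector retrieval, but the control flow — retrieve, prepend, query — is the same.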

Research on LLM long term memory:

Agentic Workflow

Agentic workflow combines some aspects of reasoning (e.g., planning, multi-step execution) with agent technologies. Papers on agentic workflow include:
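The planning-plus-execution pattern can be sketched as a simple loop: a planner decomposes a goal into steps, and each step is dispatched to a named tool. A real system would use an LLM to produce the plan; here the plan is hard-coded so the sketch stays self-contained, and the tool names and plan format are purely illustrative.

```python
def search_tool(arg):
    # Stand-in for a real web-search tool call.
    return f"results for '{arg}'"

def summarize_tool(arg):
    # Stand-in for an LLM summarization call.
    return f"summary of {arg}"

TOOLS = {"search": search_tool, "summarize": summarize_tool}

def run_agent(plan):
    # Multi-step execution: each tool's output feeds the next step.
    result = None
    for tool_name, arg in plan:
        tool = TOOLS[tool_name]
        result = tool(arg if arg is not None else result)
    return result

plan = [("search", "cheap flights to Cairns"),
        ("summarize", None)]  # None means: use the previous step's output
print(run_agent(plan))
# -> summary of results for 'cheap flights to Cairns'
```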

Temporal Reasoning (Time-Based Logic)

AI models struggle with the concept of time and with any sort of "temporal reasoning" based on time progression or causation over time.
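As a concrete illustration of the kind of reasoning involved, deciding how two time intervals relate (a simplified fragment of Allen's interval algebra) is trivial in code yet is exactly the sort of question LLMs often answer inconsistently. The three relations shown are a deliberate simplification for this sketch.

```python
def relation(a, b):
    """Return the temporal relation of interval a to interval b.

    Each interval is a (start, end) pair with start < end.
    """
    a_start, a_end = a
    b_start, b_end = b
    if a_end <= b_start:
        return "before"
    if b_end <= a_start:
        return "after"
    return "overlaps"

# "The meeting (9-10) happened before lunch (12-13)."
print(relation((9, 10), (12, 13)))   # -> before
print(relation((12, 13), (9, 10)))   # -> after
print(relation((9, 12), (11, 13)))   # -> overlaps
```

Benchmarks such as those below probe whether models can answer equivalent natural-language questions about event ordering, duration, and causation.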

  • Jonas Wallat, Adam Jatowt, Avishek Anand, March 2024, Temporal Blind Spots in Large Language Models, WSDM '24: Proceedings of the 17th ACM International Conference on Web Search and Data Mining, Pages 683–692, https://arxiv.org/abs/2401.12078, https://doi.org/10.1145/3616855.3635818, https://dl.acm.org/doi/abs/10.1145/3616855.3635818
  • Siheng Xiong, Ali Payani, Ramana Kompella, Faramarz Fekri, 22 Apr 2024 (v3), Large Language Models Can Learn Temporal Reasoning, https://arxiv.org/abs/2401.06853
  • Bowen Zhao, Zander Brumbaugh, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, 26 Feb 2024, Set the Clock: Temporal Alignment of Pretrained Language Models, https://arxiv.org/abs/2402.16797 Code: https://github.com/yizhongw/llm-temporal-alignment
  • Qingyu Tan, Hwee Tou Ng, Lidong Bing, 16 Nov 2023, Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning, https://arxiv.org/abs/2311.09821
  • Yifu Qiu, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo M. Ponti, Shay B. Cohen, 16 Nov 2023 (v2), Are Large Language Models Temporally Grounded? https://arxiv.org/abs/2311.08398 Code: https://github.com/yfqiu-nlp/temporal-llms
  • Raghav Jain, Daivik Sojitra, Arkadeep Acharya, Sriparna Saha, Adam Jatowt, Sandipan Dandapat, December 2023, Do Language Models Have a Common Sense regarding Time? Revisiting Temporal Commonsense Reasoning in the Era of Large Language Models, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing https://aclanthology.org/2023.emnlp-main.418/ PDF: https://aclanthology.org/2023.emnlp-main.418.pdf
  • Yifan Wei, Yisong Su, Huanhuan Ma, Xiaoyan Yu, Fangyu Lei, Yuanzhe Zhang, Jun Zhao, Kang Liu, 8 Oct 2023, MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Language Models, https://arxiv.org/abs/2310.05157
  • Himanshu Beniwal, Kowsik Nandagopan D, Mayank Singh, 19 Feb 2024, Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language Models, https://arxiv.org/abs/2402.11997
  • Bahare Fatemi, Mehran Kazemi, Anton Tsitsulin, Karishma Malkan, Jinyeong Yim, John Palowitch, Sungyong Seo, Jonathan Halcrow, Bryan Perozzi, 13 Jun 2024, Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning, https://arxiv.org/abs/2406.09170
  • Irwin Deng, Kushagra Dixit, Vivek Gupta, Dan Roth, 22 Jul 2024, Enhancing Temporal Understanding in LLMs for Semi-structured Tables, https://arxiv.org/abs/2407.16030
  • Dimitris Spathis, Fahim Kawsar, The first step is the hardest: pitfalls of representing and tokenizing temporal data for large language models, Journal of the American Medical Informatics Association, Volume 31, Issue 9, September 2024, Pages 2151–2158, https://doi.org/10.1093/jamia/ocae090 https://academic.oup.com/jamia/advance-article-abstract/doi/10.1093/jamia/ocae090/7702405?redirectedFrom=fulltext
  • Mayi Xu, Yunfeng Ning, Yongqi Li, Jianhao Chen, Jintao Wen, Yao Xiao, Shen Zhou, Birong Pan, Zepeng Bao, Xin Miao, Hankun Kang, Ke Sun, Tieyun Qian, 2 Jan 2025, Reasoning based on symbolic and parametric knowledge bases: a survey, https://arxiv.org/abs/2501.01030 (Extensive survey of reasoning from CoT to knowledge graphs to table-based reasoning.)
  • Yubin Ge, Salvatore Romeo, Jason Cai, Raphael Shu, Monica Sunkara, Yassine Benajiba, Yi Zhang, 3 Feb 2025, TReMu: Towards Neuro-Symbolic Temporal Reasoning for LLM-Agents with Memory in Multi-Session Dialogues, https://arxiv.org/abs/2502.01630
  • Jongho Kim, Seung-won Hwang, 17 Feb 2025, Counterfactual-Consistency Prompting for Relative Temporal Understanding in Large Language Models, https://arxiv.org/abs/2502.11425
  • Ningke Li, Yahui Song, Kailong Wang, Yuekang Li, Ling Shi, Yi Liu, Haoyu Wang, 19 Feb 2025, Detecting LLM Fact-conflicting Hallucinations Enhanced by Temporal-logic-based Reasoning, https://arxiv.org/abs/2502.13416
  • Yuhan Xie, William Cappelletti, Mahsa Shoaran and Pascal Frossard, 13 Aug 2025, rETF-semiSL: Semi-Supervised Learning for Neural Collapse in Temporal Data, https://arxiv.org/abs/2508.10147
  • Xuanhao Mu, Gökhan Demirel, Yuzhe Zhang, Jianlei Liu, Thorsten Schlachter and Veit Hagenmeyer, 14 Aug 2025, Self-Supervised Temporal Super-Resolution of Energy Data using Generative Adversarial Transformer, https://arxiv.org/abs/2508.10587
  • Qianru Zhang, Xinyi Gao, Haixin Wang, Dong Huang, Siu-Ming Yiu and Hongzhi Yin, 14 Aug 2025, HGAurban: Heterogeneous Graph Autoencoding for Urban Spatial-Temporal Learning, https://arxiv.org/abs/2410.10915
  • Chandra Raskoti, Iftekharul Islam, Xuan Wang, and Weizi Li, 13 Aug 2025, MIAT: Maneuver-Intention-Aware Transformer for Spatio-Temporal Trajectory Prediction, https://arxiv.org/abs/2504.05059
  • Luca Salvatore Lorello, Nikolaos Manginas, Marco Lippi, Stefano Melacci, 23 Jul 2025, LTLZinc: a Benchmarking Framework for Continual Learning and Neuro-Symbolic Temporal Reasoning, https://arxiv.org/abs/2507.17482
  • Shaohan Li, Hao Yang, Min Chen, Xiaolin Qin, 23 Jul 2025, Met$^2$Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems, https://arxiv.org/abs/2507.17189
  • Tobias Morocutti, Jonathan Greif, Paul Primus, Florian Schmid, Gerhard Widmer, 23 Jul 2025, On Temporal Guidance and Iterative Refinement in Audio Source Separation, https://arxiv.org/abs/2507.17297
  • Weihua Gao, Chunxu Ren, Wenlong Niu, Xiaodong Peng, 23 Jul 2025, Temporal Point-Supervised Signal Reconstruction: A Human-Annotation-Free Framework for Weak Moving Target Detection, https://arxiv.org/abs/2507.17334
  • Jianhao Chen, Junyang Ren, Wentao Ding, Haoyuan Ouyang, Wei Hu, Yuzhong Qu, 23 Jul 2025, Conflict Detection for Temporal Knowledge Graphs:A Fast Constraint Mining Algorithm and New Benchmarks, https://arxiv.org/abs/2312.11053
  • Guangqiang Li, M. Amine Atoui and Xiangshun Li, 23 Jul 2025, Attention-Based Multiscale Temporal Fusion Network for Uncertain-Mode Fault Diagnosis in Multimode Processes, https://arxiv.org/abs/2504.05172
  • Pascal Kündig, Fabio Sigrist, 23 Jul 2025, A Spatio-Temporal Machine Learning Model for Mortgage Credit Risk: Default Probabilities and Loan Portfolios, https://arxiv.org/abs/2410.02846
  • Xi Yang, Jiachen Wang, Song Han, Suining He, 21 Jul 2025, Micromobility Flow Prediction: A Bike Sharing Station-level Study via Multi-level Spatial-Temporal Attention Neural Network, https://arxiv.org/abs/2507.16020
  • Chang Li, Yaren Zhang, Haoran Lv, Qiong Cao, Chao Xue, Xiaodong He, 22 Jul 2025, Learning Temporal Abstractions via Variational Homomorphisms in Option-Induced Abstract MDPs, https://arxiv.org/abs/2507.16473
  • Zixiao Huang, Junhao Hu, Hao Lin, Chunyang Zhu, Yueran Tang, Quanlu Zhang, Zhen Guo, Zhenhua Li, Shengen Yan, Zhenhua Zhu, Guohao Dai, Yu Wang, 22 Jul 2025, Reducing GPU Memory Fragmentation via Spatio-Temporal Planning for Efficient Large-Scale Model Training, https://arxiv.org/abs/2507.16274
  • Alireza Dizaji, Benedict Aaron Tjandra, Mehrab Hamidi, Shenyang Huang, Guillaume Rabusseau, 22 Jul 2025, T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs, https://arxiv.org/abs/2507.10183
  • Shiyuan Zhang, Tong Li, Zhu Xiao, Hongyang Du, Kaibin Huang, 23 Jul 2025, LSDM: LLM-Enhanced Spatio-temporal Diffusion Model for Service-Level Mobile Traffic Prediction, https://arxiv.org/abs/2507.17795
  • Jianchao Wang, Qingfeng Li, Pengcheng Zheng, Xiaorong Pu, Yazhou Ren, 24 Jul 2025, ChronoSelect: Robust Learning with Noisy Labels via Dynamics Temporal Memory, https://arxiv.org/abs/2507.18183
  • Ruizhe Chen, Zhiting Fan, Tianze Luo, Heqing Zou, Zhaopeng Feng, Guiyang Xie, Hansheng Zhang, Zhuochen Wang, Zuozhu Liu, Huaijian Zhang, 24 Jul 2025, Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning, https://arxiv.org/abs/2507.18100
  • Edward Fish and Andrew Gilbert, 24 Jul 2025, PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization, https://arxiv.org/abs/2403.18915
  • Haoyang Li, Yuming Xu, Yiming Li, Hanmo Liu, Darian Li, Chen Jason Zhang, Lei Chen, Qing Li, 18 Jul 2025, When Speed meets Accuracy: an Efficient and Effective Graph Model for Temporal Link Prediction, https://arxiv.org/abs/2507.13825
  • Pedro Cabalar, Martín Diéguez, François Olivier, Torsten Schaub and Igor Stéphan, 18 Jul 2025, Towards Constraint Temporal Answer Set Programming, https://arxiv.org/abs/2507.13958
  • Itay Katav, Aryeh Kontorovich, 18 Jul 2025, ParallelTime: Dynamically Weighting the Balance of Short- and Long-Term Temporal Dependencies, https://arxiv.org/abs/2507.13998
  • Jianhong Chen, Meng Zhao, Mostafa Reisi Gahrooei, Xubo Yue, 18 Jul 2025, Toward Temporal Causal Representation Learning with Tensor Decomposition, https://arxiv.org/abs/2507.14126
  • Sirui Wang, Zhou Guan, Bingxi Zhao, Tongjia Gu, 17 Jul 2025, CaSTFormer: Causal Spatio-Temporal Transformer for Driving Intention Prediction, https://arxiv.org/abs/2507.13425
  • Garapati Keerthana, Manik Gupta, 18 Jul 2025, DENSE: Longitudinal Progress Note Generation with Temporal Modeling of Heterogeneous Clinical Notes Across Hospital Visits, https://arxiv.org/abs/2507.14079
  • Haoyu He, Haozheng Luo, Yan Chen, Qi R. Wang, 18 Jul 2025, Efficient Temporal Tokenization for Mobility Prediction with Large Language Models, https://arxiv.org/abs/2507.14017
  • Jiayu Song, Mahmud Elahi Akhter, Dana Atzil Slonim, Maria Liakata, 18 Jul 2025, Temporal reasoning for timeline summarisation in social media, https://arxiv.org/abs/2501.00152
  • Lingyu Li, Yang Yao, Yixu Wang, Chubo Li, Yan Teng, Yingchun Wang, 21 Jul 2025, The Other Mind: How Language Models Exhibit Human Temporal Cognition, https://arxiv.org/abs/2507.15851
  • Xuetao Lin, Tianhao Peng, Peihong Dai, Yu Liang, Wenjun Wu (Beihang University; SKLCCSE; Beijing University of Technology), 19 Jul 2025, Spatial-Temporal Transformer with Curriculum Learning for EEG-Based Emotion Recognition, https://arxiv.org/abs/2507.14698
  • Mehak Arora, Ayman Ali, Kaiyuan Wu, Carolyn Davis, Takashi Shimazui, Mahmoud Alwakeel, Victor Moas, Philip Yang, Annette Esper, Rishikesan Kamaleswaran, 19 Jul 2025, CXR-TFT: Multi-Modal Temporal Fusion Transformer for Predicting Chest X-ray Trajectories, https://arxiv.org/abs/2507.14766
  • Rabia Latief Bhat and Iqra Altaf Gillani, 21 Jul 2025, Spatio-Temporal Demand Prediction for Food Delivery Using Attention-Driven Graph Neural Networks, https://arxiv.org/abs/2507.15246
  • Matthew J. Bryan, Felix Schwock, Azadeh Yazdan-Shahmorad, Rajesh P N Rao, 21 Jul 2025, Temporal Basis Function Models for Closed-Loop Neural Stimulation, https://arxiv.org/abs/2507.15274
  • Xinxin Dong, Baoyun Peng, Haokai Ma, Yufei Wang, Zixuan Dong, Fei Hu, Xiaodong Wang, 20 Jul 2025, LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering, https://arxiv.org/abs/2507.14784
  • Shaohang Wei, Wei Li, Feifan Song, Wen Luo, Tianyi Zhuang, Haochen Tan, Zhijiang Guo, Houfeng Wang, 19 Jul 2025, TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios, https://arxiv.org/abs/2505.12891
  • Duygu Sezen Islakoglu, Jan-Christoph Kalo, 21 Jul 2025, ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events, https://arxiv.org/abs/2501.03040
  • Luo Ji, Gao Liu, Mingyang Yin, Hongxia Yang, Jingren Zhou, 19 Jul 2025, Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation, https://arxiv.org/abs/2409.07416
  • Yijing Lin, Mengqi Huang, Shuhan Zhuang, Zhendong Mao, 20 Jul 2025, RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models, https://arxiv.org/abs/2503.10406
  • Bo-Cheng Chiu, Jen-Jee Chen, Yu-Chee Tseng and Feng-Chi Chen, 21 Jul 2025, DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs, https://arxiv.org/abs/2506.11558
  • Yiming Yang, Yueru Luo, Bingkun He, Hongbin Lin, Suzhong Fu, Chao Zheng, Zhipeng Cao, Erlong Li, Chao Yan, Shuguang Cui, Zhen Li, 20 Jul 2025, TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous Driving, https://arxiv.org/abs/2507.00709
  • Agnideep Aich, Ashit Baran Aich, Dipak C. Jain, 21 Jul 2025, Temporal Conformal Prediction (TCP): A Distribution-Free Statistical and Machine Learning Framework for Adaptive Risk Forecasting, https://arxiv.org/abs/2507.05470
  • Zhaoyu Chen, Hongnan Lin, Yongwei Nie, Fei Ma, Xuemiao Xu, Fei Yu, Chengjiang Long, 10 Aug 2025, Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks for Enhanced Action Understanding, https://arxiv.org/abs/2508.07388
  • Jie Li, Haoye Dong, Zhengyang Wu, Zetao Zheng, Mingrong Lin, 11 Aug 2025, Disentangling Multiplex Spatial-Temporal Transition Graph Representation Learning for Socially Enhanced POI Recommendation, https://arxiv.org/abs/2508.07649
  • Yiran Huang, Amirhossein Nouranizadeh, Christine Ahrends, Mengjia Xu, 9 Aug 2025, BrainATCL: Adaptive Temporal Brain Connectivity Learning for Functional Link Prediction and Age Estimation, https://arxiv.org/abs/2508.07106
  • Sichen Zhao, Wei Shao, Jeffrey Chan, Ziqi Xu, Flora Salim, 11 Aug 2025, FairDRL-ST: Disentangled Representation Learning for Fair Spatio-Temporal Mobility Prediction, https://arxiv.org/abs/2508.07518
  • Mohammed-Khalil Ghali, Cecil Pang, Oscar Molina, Carlos Gershenson-Garcia, Daehan Won, 24 Jul 2025, Forecasting Commodity Price Shocks Using Temporal and Semantic Fusion of Prices Signals and Agentic Generative AI Extracted Economic News, https://arxiv.org/abs/2508.06497
  • Zihao Sheng, Zilin Huang, Yen-Jung Chen, Yansong Qu, Yuhao Luo, Yue Leng, Sikai Chen, 9 Aug 2025, SafePLUG: Empowering Multimodal LLMs with Pixel-Level Insight and Temporal Grounding for Traffic Accident Understanding, https://arxiv.org/abs/2508.06763
  • Yanru Sun, Emadeldeen Eldele, Zongxia Xie, Yucheng Wang, Wenzhe Niu, Qinghua Hu, Chee Keong Kwoh, Min Wu, 10 Aug 2025, Adapting LLMs to Time Series Forecasting via Temporal Heterogeneity Modeling and Semantic Alignment, https://arxiv.org/abs/2508.07195
  • Chaohong Guo, Xun Mo, Yongwei Nie, Xuemiao Xu, Chao Xu, Fei Yu, and Chengjiang Long, 11 Aug 2025, TAR-TVG: Enhancing VLMs with Timestamp Anchor-Constrained Reasoning for Temporal Video Grounding, https://arxiv.org/abs/2508.07683
  • Zhuqiang Lu, Zhenfei Yin, Mengwei He, Zhihui Wang, Zicheng Liu, Zhiyong Wang and Kun Hu, 11 Aug 2025, B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens, https://arxiv.org/abs/2412.09919
  • Jiawen Qi, Chang Gao, Zhaochun Ren, Qinyu Chen, 25 Jul 2025, DeltaLLM: A Training-Free Framework Exploiting Temporal Sparsity for Efficient Edge LLM Inference, https://arxiv.org/abs/2507.19608
  • Pritom Ray Nobin, Imran Ahammad Rifat, 28 Jul 2025, STARN-GAT: A Multi-Modal Spatio-Temporal Graph Attention Network for Accident Severity Prediction, https://arxiv.org/abs/2507.20451
  • Yongzheng Liu, Yiming Wang, Po Xu, Yingjie Xu, Yuntian Chen, Dongxiao Zhang, 28 Jul 2025, BuildSTG: A Multi-building Energy Load Forecasting Method using Spatio-Temporal Graph Neural Network, https://arxiv.org/abs/2507.20838
  • Gongli Xi, Ye Tian, Yannan Hu, Yuchao Zhang, Yapeng Niu and Xiangyang Gong, 27 Jul 2025, Packet-Level DDoS Data Augmentation Using Dual-Stream Temporal-Field Diffusion, https://arxiv.org/abs/2507.20115
  • Javier Solís-García, Belén Vega-Márquez, Juan A. Nepomuceno, Isabel A. Nepomuceno-Chamorro, 26 Jul 2025, CoSTI: Consistency Models for (a faster) Spatio-Temporal Imputation, https://arxiv.org/abs/2501.19364
  • Dyuman Aditya, Colton Payne, Mario Leiva, Paulo Shakarian, 27 Jul 2025, Machine Learning Model Integration with Open World Temporal Logic for Process Automation, https://arxiv.org/abs/2506.17776
  • Lei Zheng, Ning Li, Weinan Zhang, Yong Yu, 27 Jul 2025, Retrieval and Distill: A Temporal Data Shift-Free Paradigm for Online Recommendation System, https://arxiv.org/abs/2404.15678
  • Yu Tai, Xinglong Wu, Hongwei Yang, Hui He, Duanjing Chen, Yuanming Shao and Weizhe Zhang, 28 Jul 2025, How to Bridge Spatial and Temporal Heterogeneity in Link Prediction? A Contrastive Method, https://arxiv.org/abs/2411.00612
  • Pallavi Zambare, Venkata Nikhil Thanikella, Ying Liu, 25 Jul 2025, Seeing Beyond Frames: Zero-Shot Pedestrian Intention Prediction with Raw Temporal Video and Multimodal Cues, https://arxiv.org/abs/2507.21161
  • Jing Ren, Suyu Ma, Hong Jia, Xiwei Xu, Ivan Lee, Haytham Fayek, Xiaodong Li, Feng Xia, 29 Jul 2025, LiteFat: Lightweight Spatio-Temporal Graph Learning for Real-Time Driver Fatigue Detection, https://arxiv.org/abs/2507.21756
  • Zhengpeng Feng, Clement Atzberger, Sadiq Jaffer, Jovana Knezevic, Silja Sormunen, Robin Young, Madeline C Lisaius, Markus Immitzer, David A. Coomes, Anil Madhavapeddy, Andrew Blake and Srinivasan Keshav, 29 Jul 2025, TESSERA: Temporal Embeddings of Surface Spectra for Earth Representation and Analysis, https://arxiv.org/abs/2506.20380
  • Boyuan Zheng and Victor W. Chu, 30 Jul 2025, Multi-Hazard Early Warning Systems for Agriculture with Featural-Temporal Explanations, https://arxiv.org/abs/2507.22962
  • Sofiane Bouaziz, Adel Hafiane, Raphael Canals, Rachid Nedjai, 30 Jul 2025, FuseTen: A Generative Model for Daily 10 m Land Surface Temperature Estimation from Spatio-Temporal Satellite Observations, https://arxiv.org/abs/2507.23154
  • Molly Wang, Kin.K Leung, 31 Jul 2025, Spatial-Temporal Reinforcement Learning for Network Routing with Non-Markovian Traffic, https://arxiv.org/abs/2507.22174
  • Shahla John, 30 Jul 2025, Efficient Spatial-Temporal Modeling for Real-Time Video Analysis: A Unified Framework for Action Recognition and Object Tracking, https://arxiv.org/abs/2507.22421
  • Mohammed Kamran, Maria Bernathova, Raoul Varga, Christian Singer, Zsuzsanna Bago-Horvath, Thomas Helbich, Georg Langs, Philipp Seeböck, 1 Aug 2025, LesiOnTime -- Joint Temporal and Clinical Modeling for Small Breast Lesion Segmentation in Longitudinal DCE-MRI, https://arxiv.org/abs/2508.00496
  • Camille Bourgaux, Anton Gnatenko, Michaël Thomazo, 1 Aug 2025, Analysing Temporal Reasoning in Description Logics Using Formal Grammars, https://arxiv.org/abs/2508.00575
  • Mingyu Kang, Duxin Chen, Ning Meng, Gang Yan and Wenwu Yu, 1 Aug 2025, Identifying Unique Spatial-Temporal Bayesian Network without Markov Equivalence, https://arxiv.org/abs/2211.10085
  • Yujing Ke, Kevin George, Kathan Pandya, David Blumenthal, Maximilian Sprang, Gerrit Großmann, Sebastian Vollmer, David Antony Selby, 2 Aug 2025, BioDisco: Multi-agent hypothesis generation with dual-mode evidence, iterative feedback and temporal evaluation, https://arxiv.org/abs/2508.01285
  • Jingtian Yan, Stephen F. Smith, Jiaoyang Li, 2 Aug 2025, WinkTPG: An Execution Framework for Multi-Agent Path Finding Using Temporal Reasoning, https://arxiv.org/abs/2508.01495
  • Zijian Guo, İlker Işık, H. M. Sabbir Ahmad, Wenchao Li, 3 Aug 2025, One Subgoal at a Time: Zero-Shot Generalization to Arbitrary Linear Temporal Logic Requirements in Multi-Task Reinforcement Learning, https://arxiv.org/abs/2508.01561
  • Dong Li, Yichen Niu, Ying Ai, Xiang Zou, Biqing Qi, Jianxing Liu, 3 Aug 2025, T-GRAG: A Dynamic GraphRAG Framework for Resolving Temporal Conflicts and Redundancy in Knowledge Retrieval, https://arxiv.org/abs/2508.01680
  • Zhenan Lin, Yuni Lai, Wai Lun Lo, Richard Tai-Chiu Hsung, Harris Sik-Ho Tsang, Xiaoyu Xue, Kai Zhou, Yulin Zhu, 25 Jul 2025, Multi-Grained Temporal-Spatial Graph Learning for Stable Traffic Flow Forecasting, https://arxiv.org/abs/2508.00884
  • Bang Hu, Changze Lv, Mingjie Li, Yunpeng Liu, Xiaoqing Zheng, Fengzhe Zhang, Wei cao, Fan Zhang, 4 Aug 2025, SpikeSTAG: Spatial-Temporal Forecasting via GNN-SNN Collaboration, https://arxiv.org/abs/2508.02069
  • Wei Hao, Bin Chong, Ronghua Ji, and Chen Hou, 4 Aug 2025, User Trajectory Prediction Unifying Global and Local Temporal Information, https://arxiv.org/abs/2508.02161
  • Erhang Zhang, Junyi Ma, Yin-Dong Zheng, Yixuan Zhou, Hesheng Wang, 4 Jun 2025, Zero-Shot Temporal Interaction Localization for Egocentric Videos, https://arxiv.org/abs/2506.03662
  • Jose M. Sánchez Velázquez, Mingbo Cai, Andrew Coney, Álvaro J. García-Tejedor, Alberto Nogales, 28 Jul 2025, Benefits of Feature Extraction and Temporal Sequence Analysis for Video Frame Prediction: An Evaluation of Hybrid Deep Learning Models, https://arxiv.org/abs/2508.00898
  • Weihong Li, Shaohua Dong, Haonan Lu, Yanhao Zhang, Heng Fan, Libo Zhang, 3 Aug 2025, DMTrack: Spatio-Temporal Multimodal Tracking via Dual-Adapter, https://arxiv.org/abs/2508.01592
  • Zhaoyu Hu, Hao Guo, Yuan Tian, Erpeng Xue, Jianyang Wang, Xianyang Qi, Hongxiang Lin, Lei Wang, Sheng Chen, 4 Aug 2025, Dynamic Forgetting and Spatio-Temporal Periodic Interest Modeling for Local-Life Service Recommendation, https://arxiv.org/abs/2508.02451
  • Yixuan He, Aaron Sandel, David Wipf, Mihai Cucuringu, John Mitani, Gesine Reinert, 3 Aug 2025, Learning to Fuse Temporal Proximity Networks: A Case Study in Chimpanzee Social Interactions, https://arxiv.org/abs/2502.00302
  • Fengbin Zhu, Junfeng Li, Liangming Pan, Wenjie Wang, Fuli Feng, Chao Wang, Huanbo Luan, Tat-Seng Chua, 3 Aug 2025, Towards Temporal-Aware Multi-Modal Retrieval Augmented Generation in Finance, https://arxiv.org/abs/2503.05185
  • Yihe Wang, Nadia Mammone, Darina Petrovsky, Alexandros T. Tzallas, Francesco C. Morabito, Xiang Zhang, 4 Aug 2025, ADformer: A Multi-Granularity Spatial-Temporal Transformer for EEG-Based Alzheimer Detection, https://arxiv.org/abs/2409.00032
  • Osama Mohammed, Jiaxin Pan, Mojtaba Nayyeri, Daniel Hernández, Steffen Staab, 5 Aug 2025, Full-History Graphs with Edge-Type Decoupled Networks for Temporal Reasoning, https://arxiv.org/abs/2508.03251
  • Irene Ferfoglia, Simone Silvetti, Gaia Saveri, Laura Nenzi, Luca Bortolussi, 5 Aug 2025, Towards Interpretable Concept Learning over Time Series via Temporal Logic Semantics, https://arxiv.org/abs/2508.03269
  • Evangelos Sariyanidi, John D. Herrington, Lisa Yankowitz, Pratik Chaudhari, Theodore D. Satterthwaite, Casey J. Zampella, Robert T. Schultz, Russell T. Shinohara, Birkan Tunc, 29 Jul 2025, Measuring Dependencies between Biological Signals with Temporal Self-supervision, and its Limitations, https://arxiv.org/abs/2508.02703
  • Yi Zhang, Nikolaos Farmakidis, Ioannis Roumpos, Miltiadis Moralis-Pegios, Apostolos Tsakyridis, June Sang Lee, Bowei Dong, Yuhan He, Samarth Aggarwal, Nikolaos Pleros and Harish Bhaskaran, 5 Aug 2025, All-optical temporal integration mediated by subwavelength heat antennas, https://arxiv.org/abs/2505.04405
  • Amin Farajzadeh, Hongzhao Zheng, Sarah Dumoulin, Trevor Ha, Halim Yanikomeroglu, Amir Ghasemi, 5 Aug 2025, Data-Driven Spectrum Demand Prediction: A Spatio-Temporal Framework with Transfer Learning, https://arxiv.org/abs/2508.03863
  • Krishnakanta Barik and Goutam Paul, 6 Aug 2025, Quantum Temporal Fusion Transformer, https://arxiv.org/abs/2508.04048
  • Xiangzhe Xu, Guangyu Shen, Zian Su, Siyuan Cheng, Hanxi Guo, Lu Yan, Xuan Chen, Jiasheng Jiang, Xiaolong Jin, Chengpeng Wang, Zhuo Zhang, Xiangyu Zhang, 5 Aug 2025, ASTRA: Autonomous Spatial-Temporal Red-teaming for AI Software Assistants, https://arxiv.org/abs/2508.03936
  • Zhihao Wen, Yuan Fang, Pengcheng Wei, Fayao Liu, Zhenghua Chen, Min Wu, 6 Aug 2025, Temporal and Heterogeneous Graph Neural Network for Remaining Useful Life Prediction, https://arxiv.org/abs/2405.04336
  • Keivan Faghih Niresi, Ismail Nejjar, Olga Fink, 6 Aug 2025, Efficient Unsupervised Domain Adaptation Regression for Spatial-Temporal Sensor Fusion, https://arxiv.org/abs/2411.06917
  • Chin-Chia Michael Yeh, Xiran Fan, Zhimeng Jiang, Yujie Fan, Huiyuan Chen, Uday Singh Saini, Vivian Lai, Xin Dai, Junpeng Wang, Zhongfang Zhuang, Liang Wang, Yan Zheng, 6 Aug 2025, UltraSTF: Ultra-Compact Model for Large-Scale Spatio-Temporal Forecasting, https://arxiv.org/abs/2502.20634
  • Luis Mandl and Dibyajyoti Nayak and Tim Ricken and Somdatta Goswami, 7 Aug 2025, Physics-Informed Time-Integrated DeepONet: Temporal Tangent Space Operator Learning for High-Accuracy Inference, https://arxiv.org/abs/2508.05190
  • Shuonan Yang, Tailin Chen, Rahul Singh, Jiangbei Yue, Jianbo Jiao, Zeyu Fu, 6 Aug 2025, Revealing Temporal Label Noise in Multimodal Hateful Video Classification, https://arxiv.org/abs/2508.04900
  • Zhu Xu, Ting Lei, Zhimin Li, Guan Wang, Qingchao Chen, Yuxin Peng, Yang liu, 7 Aug 2025, TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring, https://arxiv.org/abs/2508.04943
  • Long Yang, Lianqing Zheng, Wenjin Ai, Minghao Liu, Sen Li, Qunshu Lin, Shengyu Yan, Jie Bai, Zhixiong Ma, Tao Huang and Xichan Zhu, 7 Aug 2025, MetaOcc: Spatio-Temporal Fusion of Surround-View 4D Radar and Camera for 3D Occupancy Prediction with Dual Training Strategies, https://arxiv.org/abs/2501.15384
  • Serkan Sulun, Paula Viana, Matthew E. P. Davies, 7 Aug 2025, Video Soundtrack Generation by Aligning Emotions and Temporal Boundaries, https://arxiv.org/abs/2502.10154
  • Wenhao Dong, Yueyang Li, Weiming Zeng, Lei Chen, Hongjie Yan, Wai Ting Siok, and Nizhuan Wang, 7 Aug 2025, STARFormer: A Novel Spatio-Temporal Aggregation Reorganization Transformer of FMRI for Brain Disorder Diagnosis, https://arxiv.org/abs/2501.00378
  • Barak Gahtan, Alex M. Bronstein, 8 Aug 2025, Architecture-Aware Generalization Bounds for Temporal Networks: Theory and Fair Comparison Methodology, https://arxiv.org/abs/2508.06066
  • Yidong Wang, Xin Wang, Cunxiang Wang, Junfeng Fang, Qiufeng Wang, Jianing Chu, Xuran Meng, Shuxun Yang, Libo Qin, Yue Zhang, Wei Ye, Shikun Zhang, 8 Aug 2025, Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future, https://arxiv.org/abs/2508.06026
  • Sofiane Bouaziz, Adel Hafiane, Raphael Canals, Rachid Nedjai, 8 Aug 2025, WGAST: Weakly-Supervised Generative Network for Daily 10 m Land Surface Temperature Estimation via Spatio-Temporal Fusion, https://arxiv.org/abs/2508.06485
  • Zibo Liu, Zhe Jiang, Zelin Xu, Tingsong Xiao, Zhengkun Xiao, Yupu zhang, Haibo Wang, and Shigang Chen, 8 Aug 2025, Spatio-Temporal Partial Sensing Forecast for Long-term Traffic, https://arxiv.org/abs/2408.02689
  • Ignatius Rollere, Caspian Hartsfield, Seraphina Courtenay, Lucian Fenwick, Aurelia Grunwald, 8 Aug 2025, Algorithmic Segmentation and Behavioral Profiling for Ransomware Detection Using Temporal-Correlation Graphs, https://arxiv.org/abs/2501.17429
  • Abhishek Rajgaria, Kushagra Dixit, Mayank Vyas, Harshavardhan Kalalbandi, Dan Roth, Vivek Gupta, 7 Aug 2025, No Universal Prompt: Unifying Reasoning through Adaptive Prompting for Temporal Table Reasoning, https://arxiv.org/abs/2506.11246
  • Ningning Fu, Shengheng Liu, Weiliang Xie, Yongming Huang, 1 Aug 2025, Multi-grained spatial-temporal feature complementarity for accurate online cellular traffic prediction, https://arxiv.org/abs/2508.08281
  • Milad Sabouri, Masoud Mansoury, Kun Lin, Bamshad Mobasher, 11 Aug 2025, Temporal User Profiling with LLMs: Balancing Short-Term and Long-Term Preferences for Recommendations, https://arxiv.org/abs/2508.08454
  • Milad Sabouri, Masoud Mansoury, Kun Lin, Bamshad Mobasher, 11 Aug 2025, Using LLMs to Capture Users' Temporal Context for Recommendation, https://arxiv.org/abs/2508.08512
  • Edith Elkind, Tzeh Yuan Neoh, Nicholas Teh, 12 Aug 2025, Not in My Backyard! Temporal Voting Over Public Chores, https://arxiv.org/abs/2508.08810
  • Ziyi Guo and Yan Wang, 12 Aug 2025, Urban-STA4CLC: Urban Theory-Informed Spatio-Temporal Attention Model for Predicting Post-Disaster Commercial Land Use Change, https://arxiv.org/abs/2508.08976
  • Maxim A. Patratskiy, Alexey K. Kovalev, Aleksandr I. Panov, 12 Aug 2025, Spatial Traces: Enhancing VLA Models with Spatial-Temporal Understanding, https://arxiv.org/abs/2508.09032
  • Wen Wang, Bozhen Fang, Chenchen Jing, Yongliang Shen, Yangyi Shen, Qiuyu Wang, Hao Ouyang, Hao Chen, Chunhua Shen, 12 Aug 2025, Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models, https://arxiv.org/abs/2508.09138
  • Yunhua Pei and John Cartlidge and Anandadeep Mandal and Daniel Gold and Enrique Marcilio and Riccardo Mazzon, 12 Aug 2025, Cross-Modal Temporal Fusion for Financial Market Forecasting, https://arxiv.org/abs/2504.13522
  • Victor Shea-Jay Huang, Le Zhuo, Yi Xin, Zhaokai Wang, Fu-Yun Wang, Yuchi Wang, Renrui Zhang, Peng Gao, Hongsheng Li, 12 Aug 2025, TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation, https://arxiv.org/abs/2503.07050
  • Yanlai Yang and Mengye Ren, 11 Aug 2025, Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos, https://arxiv.org/abs/2501.12254
  • Yue Yao, Zhen Xu, Youzhu Liu, Kunyuan Ma, Yuxiu Lin, Mohan Jiang, 13 Aug 2025, Integrating Feature Attention and Temporal Modeling for Collaborative Financial Risk Assessment, https://arxiv.org/abs/2508.09399
  • Faruk Alpay, Bugra Kilictas, Hamdi Alakkad, 13 Aug 2025, Temporal Anchoring in Deepening Embedding Spaces: Event-Indexed Projections, Drift, Convergence, and an Internal Computational Architecture, https://arxiv.org/abs/2508.09693
  • Wouter M. Kouw, 13 Aug 2025, Bayesian autoregression to optimize temporal Matérn kernel Gaussian process hyperparameters, https://arxiv.org/abs/2508.09792
  • Jihang Wang, Dongcheng Zhao, Ruolin Chen, Qian Zhang, Yi Zeng, 15 Aug 2025, Boosting the Robustness-Accuracy Trade-off of SNNs by Robust Temporal Self-Ensemble, https://arxiv.org/abs/2508.11279
  • Changhong Jing, Yan Liu, Shuqiang Wang, Bruce X.B. Yu, Gong Chen, Zhejing Hu, Zhi Zhang, Yanyan Shen, 15 Aug 2025, PTSM: Physiology-aware and Task-invariant Spatio-temporal Modeling for Cross-Subject EEG Decoding, https://arxiv.org/abs/2508.11357
  • Ahmad Mousavi, Yeganeh Abdollahinejad, Roberto Corizzo, Nathalie Japkowicz, and Zois Boukouvalas, 15 Aug 2025, E-CaTCH: Event-Centric Cross-Modal Attention with Temporal Consistency and Class-Imbalance Handling for Misinformation Detection, https://arxiv.org/abs/2508.11197
  • Rahmat K. Adesunkanmi, Ashfaq Khokhar, Goce Trajcevski, Sohail Murad, 17 Aug 2025, Root Cause Analysis of Hydrogen Bond Separation in Spatio-Temporal Molecular Dynamics using Causal Models, https://arxiv.org/abs/2508.12500
  • Xiangxiang Cui, Min Zhao, Dongmei Zhi, Shile Qi, Vince D Calhoun, Jing Sui, 15 Aug 2025, BRIEF: BRain-Inspired network connection search with Extensive temporal feature Fusion enhances disease classification, https://arxiv.org/abs/2508.11732
  • Sishun Liu, Ke Deng, Xiuzhen Zhang, Yan Wang, 16 Aug 2025, Learning Marked Temporal Point Process Explanations based on Counterfactual and Factual Reasoning, https://arxiv.org/abs/2508.11943
  • Haolong Chen, Liang Zhang, Zhengyuan Xin, Guangxu Zhu, 17 Aug 2025, STM3: Mixture of Multiscale Mamba for Long-Term Spatio-Temporal Time-Series Prediction, https://arxiv.org/abs/2508.12247
  • Ismail Lamaakal, Chaymae Yahyati, Khalid El Makkaoui, Ibrahim Ouahbi, Yassine Maleh, 18 Aug 2025, TCUQ: Single-Pass Uncertainty Quantification from Temporal Consistency with Streaming Conformal Calibration for TinyML, https://arxiv.org/abs/2508.12905
  • Alicja Ziarko, Michal Bortkiewicz, Michal Zawalski, Benjamin Eysenbach and Piotr Milos, 18 Aug 2025, Contrastive Representations for Temporal Reasoning, https://arxiv.org/abs/2508.13113
  • Yueyang Liu, Lance Kennedy, Ruochen Kong, Joon-Seok Kim, Andreas Züfle, 18 Aug 2025, Training Machine Learning Models on Human Spatio-temporal Mobility Data: An Experimental Study [Experiment Paper], https://arxiv.org/abs/2508.13135
  • Friedhelm Hamann, Emil Mededovic, Fabian Gülhan, Yuli Wu, Johannes Stegmaier, Jing He, Yiqing Wang, Kexin Zhang, Lingling Li, Licheng Jiao, Mengru Ma, Hongxiang Huang, Yuhao Yan, Hongwei Ren, Xiaopeng Lin, Yulong Huang, Bojun Cheng, Se Hyun Lee, Gyu Sung Ham, Kanghan Oh, Gi Hyun Lim, Boxuan Yang, Bowen Du, Guillermo Gallego, 18 Aug 2025, SIS-Challenge: Event-based Spatio-temporal Instance Segmentation Challenge at the CVPR 2025 Event-based Vision Workshop, https://arxiv.org/abs/2508.12813
  • Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr, 18 Aug 2025, An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models, https://arxiv.org/abs/2404.15518
  • Zhiyuan Zheng, Jianpeng Qi, Jiantao Li, Guoqing Chao, Junyu Dong, Yanwei Yu, 18 Aug 2025, Efficient Discovery of Motif Transition Process for Large-Scale Temporal Graphs, https://arxiv.org/abs/2504.15979
  • Jiayu Fang, Zhiqi Shao, S T Boris Choy, Junbin Gao, 19 Aug 2025, STPFormer: A State-of-the-Art Pattern-Aware Spatio-Temporal Transformer for Traffic Forecasting, https://arxiv.org/abs/2508.13433
  • Su Chen, Xiaohua Qi, Xixun Lin, Yanmin Shang, Xiaolin Xu and Yangxi Li, 17 Aug 2025, Deep Graph Neural Point Process For Learning Temporal Interactive Networks, https://arxiv.org/abs/2508.13219
  • Tinh-Anh Nguyen-Nhu, Triet Dao Hoang Minh, Dat To-Thanh, Phuc Le-Gia, Tuan Vo-Lan, Tien-Huy Nguyen, 19 Aug 2025, STER-VLM: Spatio-Temporal With Enhanced Reference Vision-Language Models, https://arxiv.org/abs/2508.13470
  • Zongyuan Huang, Weipeng Wang, Shaoyu Huang, Marta C. Gonzalez, Yaohui Jin, Yanyan Xu, 19 Aug 2025, Where to Go Next Day: Multi-scale Spatial-Temporal Decoupled Model for Mid-term Human Mobility Prediction, https://arxiv.org/abs/2501.06561
  • Qianang Zhou, Junhui Hou, Meiyi Yang, Yongjian Deng, Youfu Li, Junlin Xiong, 19 Aug 2025, Spatially-guided Temporal Aggregation for Robust Event-RGB Optical Flow Estimation, https://arxiv.org/abs/2501.00838
  • Songyu Ke and Chenyu Wu and Yuxuan Liang and Xiuwen Yi and Yanping Sun and Junbo Zhang and Yu Zheng, 13 Aug 2025, GeoMAE: Masking Representation Learning for Spatio-Temporal Graph Forecasting with Missing Values, https://arxiv.org/abs/2508.14083
  • Donghwa Kang, Doohyun Kim, Sang-Ki Ko, Jinkyu Lee, Brent ByungHoon Kang, Hyeongboo Baek, 19 Aug 2025, STAS: Spatio-Temporal Adaptive Computation Time for Spiking Transformers, https://arxiv.org/abs/2508.14138
  • Lian Lian, Yilin Li, Song Han, Renzi Meng, Sibo Wang, Ming Wang, 20 Aug 2025, Artificial Intelligence-Based Multiscale Temporal Modeling for Anomaly Detection in Cloud Services, https://arxiv.org/abs/2508.14503
  • Jiafeng Xiong and Rizos Sakellariou, 20 Aug 2025, Graph Structure Learning with Temporal Graph Information Bottleneck for Inductive Representation Learning, https://arxiv.org/abs/2508.14859
  • Anushka A. Kore, Frank G. te Nijenhuis, Matthijs van der Sluijs, Wim van Zwam, Charles Majoie, Geert Lycklama à Nijeholt, Danny Ruijters, Frans Vos, Sandra Cornelissen, Ruisheng Su, Theo van Walsum, 19 Aug 2025, OccluNet: Spatio-Temporal Deep Learning for Occlusion Detection on DSA, https://arxiv.org/abs/2508.14286
  • Peiming Li, Ziyi Wang, Yulin Yuan, Hong Liu, Xiangming Meng, Junsong Yuan, Mengyuan Liu, 20 Aug 2025, UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling, https://arxiv.org/abs/2508.14604
  • Yutian Liu, Zhengyi Yang, Jiancan Wu, Xiang Wang, 20 Aug 2025, Enhancing Temporal Sensitivity of Large Language Model for Recommendation with Counterfactual Tuning, https://arxiv.org/abs/2507.03047
  • Jiacheng Hu, Bo Zhang, Ting Xu, Haifeng Yang, Min Gao, 20 Aug 2025, Structure-Aware Temporal Modeling for Chronic Disease Progression Prediction, https://arxiv.org/abs/2508.14942
  • Haodi Zhong, Liuxin Zou, Di Wang, Bo Wang, Zhenxing Niu, Quan Wang, 21 Aug 2025, EvoFormer: Learning Dynamic Graph-Level Representations with Structural and Temporal Bias Correction, https://arxiv.org/abs/2508.15378
  • H. I. Nurdin and C. A. Nijhuis, 21 Aug 2025, A Solvable Molecular Switch Model for Stable Temporal Information Processing, https://arxiv.org/abs/2508.15451
  • Haibo Wang, Zhiyang Xu, Yu Cheng, Shizhe Diao, Yufan Zhou, Yixin Cao, Qifan Wang, Weifeng Ge, Lifu Huang, 21 Aug 2025, Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models, https://arxiv.org/abs/2410.03290
  • Jihua Huang, Yi Yao and Ajay Divakaran, 21 Aug 2025, Transforming Causality: Transformer-Based Temporal Causal Discovery with Prior Knowledge Integration, https://arxiv.org/abs/2508.15928
  • Yujie Li, Zezhi Shao, Chengqing Yu, Tangwen Qian, Zhao Zhang, Yifan Du, Shaoming He, Fei Wang, Yongjun Xu, 22 Aug 2025, STA-GANN: A Valid and Generalizable Spatio-Temporal Kriging Approach, https://arxiv.org/abs/2508.16161
  • Nadia Asif and Zhiqing Hong and Shaogang Ren and Xiaonan Zhang and Xiaojun Shang and Yukun Yuan, 22 Aug 2025, MuST2-Learn: Multi-view Spatial-Temporal-Type Learning for Heterogeneous Municipal Service Time Estimation, https://arxiv.org/abs/2508.16503
  • Shunsuke Iwashita, Ning Ding, Keisuke Fujii, 25 Aug 2025, Evaluating Movement Initiation Timing in Ultimate Frisbee via Temporal Counterfactuals, https://arxiv.org/abs/2508.17611
  • Bangchao Deng, Lianhua Ji, Chunhua Chen, Xin Jing, Ling Ding, Bingqing QU, Pengyang Wang, Dingqi Yang, 14 Aug 2025, STRelay: A Universal Spatio-Temporal Relaying Framework for Location Prediction with Future Spatiotemporal Contexts, https://arxiv.org/abs/2508.16620
  • Weilin Ruan, Xilin Dang, Ziyu Zhou, Sisuo Lyu, Yuxuan Liang, 14 Aug 2025, A Retrieval Augmented Spatio-Temporal Framework for Traffic Prediction, https://arxiv.org/abs/2508.16623
  • Zhuding Liang, Jianxun Cui, Qingshuang Zeng, Feng Liu, Nenad Filipovic, Tijana Geroski, 21 Aug 2025, STGAtt: A Spatial-Temporal Unified Graph Attention Network for Traffic Flow Forecasting, https://arxiv.org/abs/2508.16685
  • Bicheng Wang and Junping Wang and Yibo Xue, 22 Aug 2025, Physics-Inspired Spatial Temporal Graph Neural Networks for Predicting Industrial Chain Resilience, https://arxiv.org/abs/2508.16836
  • YongKyung Oh, Dong-Young Lim, Sungil Kim, Alex Bui, 24 Aug 2025, TANDEM: Temporal Attention-guided Neural Differential Equations for Missingness in Time Series Classification, https://arxiv.org/abs/2508.17519
  • Hoyoung Lee, Wonbin Ahn, Suhwan Park, Jaehoon Lee, Minjae Kim, Sungdong Yoo, Taeyoon Lim, Woohyung Lim, Yongjae Lee, 23 Aug 2025, THEME : Enhancing Thematic Investing with Semantic Stock Representations and Temporal Dynamics, https://arxiv.org/abs/2508.16936
  • Ziyao Shangguan, Chuhan Li, Yuxuan Ding, Yanan Zheng, Yilun Zhao, Tesca Fitzgerald, Arman Cohan, 25 Aug 2025, TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models, https://arxiv.org/abs/2410.23266
  • Xiuyuan Cheng, Zheng Dong, Yao Xie, 22 Aug 2025, Deep spatio-temporal point processes: Advances and new directions, https://arxiv.org/abs/2504.06364

AGI Research

General research on achieving Artificial General Intelligence (AGI):

  • Tao Feng, Chuanyang Jin, Jingyu Liu, Kunlun Zhu, Haoqin Tu, Zirui Cheng, Guanyu Lin, Jiaxuan You, 16 May 2024, How Far Are We From AGI, https://arxiv.org/abs/2405.10313
  • Nathan Lambert, APR 18, 2024, Llama 3: Scaling open LLMs to AGI, https://www.interconnects.ai/p/llama-3-and-scaling-open-llms
  • jbetke, June 3, 2024, General Intelligence (2024), https://nonint.com/2024/06/03/general-intelligence-2024/
  • Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni, Nov 2023, Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models, https://arxiv.org/abs/2311.00871
  • Denise Holt, Jan 29, 2024, “Deep Learning is Rubbish” — Karl Friston & Yann LeCun Face Off at Davos 2024 World Economic Forum, AI monks.io, https://medium.com/aimonks/deep-learning-is-rubbish-karl-friston-yann-lecun-face-off-at-davos-2024-world-economic-forum-494e82089d22
  • Hayden Field, June 20, 2024, OpenAI competitor Anthropic announces its most powerful AI yet, CNBC, https://www.cnbc.com/2024/06/20/anthropic-claude-3point5-sonnet-ai-announced.html
  • Arjun Kharpal, June 21, 2024, SoftBank CEO says AI that is 10,000 times smarter than humans will come out in 10 years, CNBC, https://www.cnbc.com/2024/06/21/softbank-ceo-predicts-ai-that-is-10000-times-smarter-than-humans-.html
  • Rahul Verma, June 21, 2024, OpenAI's GPT-5 Pushed Back To Late 2025, But Promises Ph.D.-Level Abilities, https://in.mashable.com/tech/77593/openais-gpt-5-pushed-back-to-late-2025-but-promises-phd-level-abilities
  • Ignacio de Gregorio, June 2024, Mixture-of-Agents Beats ChatGPT-4o: Collaboration is Intelligence, https://medium.com/@ignacio.de.gregorio.noblejas/mixture-of-agents-beats-chatgpt-4o-6470a74f1525
  • Rachel Metz, July 12, 2024, OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving: The company believes its technology is approaching the second level of five on the path to artificial general intelligence, Bloomberg, https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai?sref=P6Q0mxvj
  • Anna Tong and Katie Paul July 16, 2024, Exclusive: OpenAI working on new reasoning technology under code name ‘Strawberry’, https://www.reuters.com/technology/artificial-intelligence/openai-working-new-reasoning-technology-under-code-name-strawberry-2024-07-12/
  • Ethan Mollick, May 12, 2024, Superhuman? What does it mean for AI to be better than a human? And how can we tell? https://www.oneusefulthing.org/p/superhuman
  • Zarif Bin Akhtar, Mapping Generative Artificial Intelligence (GAI's) Exciting Future: From Gemini to Q* and Beyond, https://publications.eai.eu/index.php/airo/article/view/5962 https://doi.org/10.4108/airo.5962 PDF: https://publications.eai.eu/index.php/airo/article/view/5962/3329
  • Jack Dymond, August 2024, Progressive Intelligence for Low-Power Devices, Ph.D. Thesis, Faculty of Engineering and Physical Sciences, School of Electronics and Computer Science, University of Southampton, https://eprints.soton.ac.uk/492900/1/JackDymond-Final-Thesis.pdf
  • Rohin Shah, Seb Farquhar, Anca Dragan, 21st Aug 2024, AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work, https://www.alignmentforum.org/posts/79BPxvSsjzBkiSyTq/agi-safety-and-alignment-at-google-deepmind-a-summary-of
  • Roy Lo, June 13, 2024, Defining AI 2.0: Beyond Generative AI, https://www.linkedin.com/pulse/defining-ai-20-beyond-generative-roy-lo-tbvie/
  • Ryan McNeal, Aug 27, 2024, ChatGPT and GPT-4 could get a sweet upgrade this fall with 'strawberry', https://www.androidauthority.com/openai-strawberry-ai-3475682/
  • Vishal Rajput, Jul 8, 2024, Why LLMs Can’t Plan And Unlikely To Reach AGI? https://medium.com/aiguys/why-llms-cant-plan-and-unlikely-to-reach-agi-642bda3e0aa3
  • Lareina Yee, June 7, 2024, Gen AI: A cognitive industrial revolution, https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/gen-ai-a-cognitive-industrial-revolution
  • Martin_Casado, Aug 31, 2024, Tweet (State of LLMs) https://threadreaderapp.com/thread/1829905130512400775.html
  • Gian Segato, September 2024, The dawn of a new startup era, https://giansegato.com/essays/dawn-new-startup-era
  • Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla, Mar 12, 2024, Algorithmic Progress in Language Models, Epoch AI, https://epochai.org/blog/algorithmic-progress-in-language-models
  • Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla, 9 Mar 2024, Algorithmic progress in language models, https://arxiv.org/abs/2403.05812
  • Alberto Romero. Sep 10, 2024, Big News: OpenAI to Launch AI Model That Can Reason in 2 Weeks, https://www.thealgorithmicbridge.com/p/big-news-openai-to-launch-ai-model
  • David Gilmore, Sep 2024, When will AI outthink humans? https://davidvgilmore.com/writings/outthinking-ai (Interesting analysis of all the GPUs in the world and when they will "out-think" all the human knowledge workers, predicting a range of years from 2028 to 2035, depending on assumptions.)
  • Chloe Berger, October 2, 2024, Mark Cuban says his puppy is ‘smarter than AI is today’, https://fortune.com/2024/10/01/mark-cuban-dog-puppy-smarter-than-ai/
  • Julia Love and Rachel Metz, October 2, 2024, Google Is Working on Reasoning AI, Chasing OpenAI’s Efforts, https://www.bloomberg.com/news/articles/2024-10-02/google-is-working-on-reasoning-ai-chasing-openai-s-efforts
  • Samantha Kelly, Sept. 29, 2024, 'Superintelligent' AI Is Only a Few Thousand Days Away: OpenAI CEO Sam Altman, https://www.cnet.com/tech/services-and-software/superintelligent-ai-is-only-a-few-thousand-days-away-openai-ceo-sam-altman/
  • Tianyang Zhong, Zhengliang Liu, Yi Pan, Yutong Zhang, Yifan Zhou, Shizhe Liang, Zihao Wu, Yanjun Lyu, Peng Shu, Xiaowei Yu, Chao Cao, Hanqi Jiang, Hanxu Chen, Yiwei Li, Junhao Chen, Huawen Hu, Yihen Liu, Huaqin Zhao, Shaochen Xu, Haixing Dai, Lin Zhao, Ruidong Zhang, Wei Zhao, Zhenyuan Yang, Jingyuan Chen, Peilong Wang, Wei Ruan, Hui Wang, Huan Zhao, Jing Zhang, Yiming Ren, Shihuan Qin, Tong Chen, Jiaxi Li, Arif Hassan Zidan, Afrar Jahin, Minheng Chen, Sichen Xia, Jason Holmes, Yan Zhuang, Jiaqi Wang, Bochen Xu, Weiran Xia, Jichao Yu, Kaibo Tang, Yaxuan Yang, Bolun Sun, Tao Yang, Guoyu Lu, Xianqiao Wang, Lilong Chai, He Li, Jin Lu, Lichao Sun, Xin Zhang, Bao Ge, Xintao Hu, Lian Zhang, Hua Zhou, Lu Zhang, Shu Zhang, Ninghao Liu, Bei Jiang, Linglong Kong, Zhen Xiang, Yudan Ren, Jun Liu, Xi Jiang, Yu Bao, Wei Zhang, Xiang Li, Gang Li, Wei Liu, Dinggang Shen, Andrea Sikora, Xiaoming Zhai, Dajiang Zhu, Tianming Liu, 27 Sep 2024, Evaluation of OpenAI o1: Opportunities and Challenges of AGI, https://arxiv.org/abs/2409.18486
  • AI-native software engineering may be closer than developers think, CIO, https://www.cio.com/article/3567138/ai-native-software-engineering-may-be-closer-than-developers-think.html
  • Ignacio de Gregorio Noblejas, October 20, 2024, The Anti-LLM Revolution Begins, https://thetechoasis.beehiiv.com/p/the-anti-llm-revolution-begins
  • Aki Ranin, Sep 2, 2024, The Code Canaries Are Singing — Our Path Toward AGI: How the fate of human software developers reveals our path toward AGI, https://akiranin.medium.com/the-code-canaries-are-singing-our-path-toward-agi-6c234cae0189
  • Will Lockett, Nov 2024, Apple Calls BS On The AI Revolution. They aren’t late to the AI game; they are just the only sceptical big tech company. https://medium.com/predict/apple-calls-bullshit-on-the-ai-revolution-ae38fdf83392
  • Anthony Ha, Nov 2024, OpenAI reportedly developing new strategies to deal with AI improvement slowdown, https://techcrunch.com/2024/11/09/openai-reportedly-developing-new-strategies-to-deal-with-ai-improvement-slowdown/
  • Michael Nuñez, November 11, 2024, AI’s math problem: FrontierMath benchmark shows how far technology still has to go, https://venturebeat.com/ai/ais-math-problem-frontiermath-benchmark-shows-how-far-technology-still-has-to-go/
  • Kyle Orland, 13 Nov 2024, What if AI doesn’t just keep getting better forever? New reports highlight fears of diminishing returns for traditional LLM training. https://arstechnica.com/ai/2024/11/what-if-ai-doesnt-just-keep-getting-better-forever/
  • Gary Marcus, Nov 25, 2024, A new AI scaling law shell game? Scaling laws ain’t what they used to be, https://garymarcus.substack.com/p/a-new-ai-scaling-law-shell-game
  • Brian Merchant, Dec 2024, AI Generated Business: The Rise of AGI and the Rush to Find a Working Business Model, https://ainowinstitute.org/general/ai-generated-business
  • David Luan, Pieter Abbeel, December 09, 2024, Amazon opens new AI lab in San Francisco focused on long-term research bets. The Amazon AGI SF Lab will focus on developing new foundational capabilities for enabling useful AI agents. https://www.amazon.science/blog/amazon-opens-new-ai-lab-in-san-francisco-focused-on-long-term-research-bets
  • Deirdre Bosa, Jasmine Wu, Dec 11 2024, The limits of intelligence — Why AI advancement could be slowing down, https://www.cnbc.com/2024/12/11/why-ai-advancement-could-be-slowing-down.html
  • Alberto Romero, Dec 21, 2024, OpenAI o3 Model Is a Message From the Future: Update All You Think You Know About AI. Incredible, a miracle, more than just a better state-of-the-art AI model. https://www.thealgorithmicbridge.com/p/openai-o3-model-is-a-message-from
  • Sabrina Ortiz, Dec. 20, 2024, OpenAI unveils its most advanced o3 reasoning model on its last day of 'shipmas', https://www.zdnet.com/article/openai-unveils-its-most-advanced-o3-reasoning-model-on-its-last-day-of-shipmas/
  • Akash Bajwa, Jan 06, 2025, Test-Time Search: A Path To AGI: Stacking Scaling Laws And Reward Engineering, https://akashbajwa.substack.com/p/test-time-search-a-path-to-agi
  • Duncan Anderson, Jan 2025, The wall that wasn’t: Benchmark results for the latest AI models suggest that any “scaling wall” has already been breached and we’re on the path to AGI. https://medium.com/barnacle-labs/the-wall-that-wasnt-62c617f66ad4
  • Alhassan Mumuni, Fuseini Mumuni, 6 Jan 2025, Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches, https://arxiv.org/abs/2501.03151
  • Jeffrey Anthony, Jan 2025, No GPT-5 in 2025 and No AGI — Ever. The Triadic Nature of Meaning-Making and the Fallacy of AI’s Understanding. https://medium.com/@WeWillNotBeFlattened/no-gpt-5-in-2025-and-no-agi-ever-aa9384efdbe5
  • Ndea, Jan 16, 2025, Ndea is building frontier AI systems that blend intuitive pattern recognition and formal reasoning into a unified architecture, https://ndea.com/
  • Akash Bajwa Jan 27, 2025, The Post-R1 World: AI Economics Have Irreversibly Changed, https://akashbajwa.substack.com/p/the-post-r1-world
  • Mohit Sewak, Ph.D., January 29, 2025, Achieving General Intelligence (AGI) and Super Intelligence (ASI): Pathways, Uncertainties, and Ethical Concerns, https://towardsai.net/p/l/achieving-general-intelligence-agi-and-super-intelligence-asi-pathways-uncertainties-and-ethical-concerns
  • Alberto Romero, Feb 06, 2025, AGI Is Already Here—It’s Just Not Evenly Distributed: Or: why you should learn to prompt AI models, https://open.substack.com/pub/thealgorithmicbridge/p/agi-is-already-hereits-just-not-evenly
  • Arjun Kharpal, Feb 6 2025, ‘Dangerous proposition’: Top scientists warn of out-of-control AI, https://www.cnbc.com/2025/02/07/dangerous-proposition-top-scientists-warn-of-out-of-control-ai.html
  • Nikhil Anand, Feb 2025, Why I think DeepSeek-R1 just revealed the path to AGI. Here’s a visual explanation of exactly what makes DeepSeek-R1 so good. https://ai.gopubby.com/why-i-think-deepseek-r1-just-revealed-the-path-to-agi-d0add267197d
  • Sam Altman, Feb 10, 2025, Three Observations, https://blog.samaltman.com/three-observations (Talks about scaling laws, inference costs reducing, and AGI. One of them: "The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use.")
  • Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan, Jennifer Neville, 19 May 2025 (v2), Lost in Transmission: When and Why LLMs Fail to Reason Globally, https://arxiv.org/abs/2505.08140
  • Apoorv Agrawal, May 23, 2025, Why Cars Drive Themselves Before Computers Do: Robocars are ready; robot secretaries aren’t… yet, https://apoorv03.com/p/autonomy
  • Parshin Shojaee, Maxwell Horton, Iman Mirzadeh, Samy Bengio, Keivan Alizadeh, June 2025, The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, Apple, https://machinelearning.apple.com/research/illusion-of-thinking https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf
  • Dr. Ashish Bamania, June 2025, Apple’s New Research Shows That LLM Reasoning Is Completely Broken: A deep dive into Apple research that exposes the flawed thinking process in state-of-the-art Reasoning LLMs, https://ai.gopubby.com/apples-new-research-shows-that-llm-reasoning-is-completely-broken-47b5be71a06a
  • François Chollet, 25 Nov 2019 (v2), On the Measure of Intelligence, https://arxiv.org/abs/1911.01547
  • Kenneth Payne, Baptiste Alloui-Cros, 3 Jul 2025, Strategic Intelligence in Large Language Models: Evidence from evolutionary Game Theory, https://arxiv.org/abs/2507.02618
  • David De Cremer and Garry Kasparov, March 18, 2021, AI Should Augment Human Intelligence, Not Replace It, Harvard Business Review, https://hbr.org/2021/03/ai-should-augment-human-intelligence-not-replace-it
  • Eli Amdur, Nov 25, 2023, Jobs AI Just Can't Do, Forbes, https://www.forbes.com/sites/eliamdur/2023/11/25/jobs-ai-just-cant-do/
  • Maryville University Online, June 6, 2024, Artificial Intelligence vs. Human Intelligence, https://online.maryville.edu/blog/ai-vs-human-intelligence/
  • Human- versus Artificial Intelligence, PubMed Central, https://pmc.ncbi.nlm.nih.gov/articles/PMC8108480/ PDF: https://pmc.ncbi.nlm.nih.gov/articles/PMC8108480/pdf/frai-04-622364.pdf (Good article on the nature of intelligence.)
  • Bernard Marr, Nov 28, 2024, AI Won't Replace Humans – Here's The Surprising Reason Why, Forbes, https://www.forbes.com/sites/bernardmarr/2024/11/28/ai-wont-replace-humans--heres-the-surprising-reason-why/
  • Shneiderman, B. (2020). Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy. International Journal of Human–Computer Interaction, 36(6), 495–504. https://doi.org/10.1080/10447318.2020.1741118 https://www.tandfonline.com/doi/full/10.1080/10447318.2020.1741118
  • Joe McKendrick, July 23, 2025, Will AI think like humans? We're not even close - and we're asking the wrong question, https://www.zdnet.com/article/will-ai-think-like-humans-were-not-even-close-and-were-asking-the-wrong-question/
  • Webb Wright, July 1, 2025, Meta's new AI lab aims to deliver 'personal superintelligence for everyone' - whatever that means, https://www.zdnet.com/article/metas-new-ai-lab-aims-to-deliver-personal-superintelligence-for-everyone-whatever-that-means/
  • Lester Mapp, June 17, 2025, Apple's 'The Illusion of Thinking' is shocking - but here's what it missed, https://www.zdnet.com/article/apples-the-illusion-of-thinking-is-shocking-but-heres-what-it-missed/
  • Sabrina Ortiz, June 17, 2025, What Apple's controversial research paper really tells us about LLMs, https://www.zdnet.com/article/what-apples-controversial-research-paper-really-tells-us-about-llms/
  • Zvi, 11th Jun 2025, Give Me a Reason(ing Model), https://www.lesswrong.com/posts/tnc7YZdfGXbhoxkwj/give-me-a-reason-ing-model
  • Peter Wildeford, Aug 08, 2025, GPT-5: a small step for intelligence, a giant leap for normal people: GPT-5 focuses on where the money is - everyday users, not AI elites, https://peterwildeford.substack.com/p/gpt-5-a-small-step-for-intelligence
  • Kenneth Wolters, Aug 12, 2025, No AGI in Sight: What This Means for LLMs, https://kennethwolters.com/posts/no-agi/
  • Mark Zilberman, 13 Aug 2025, Extending the Entropic Potential of Events for Uncertainty Quantification and Decision-Making in Artificial Intelligence, https://arxiv.org/abs/2508.10241
  • Yi Dong, Yusuke Muraoka, Scott Shi, and Yi Zhang, 14 Aug 2025, MM-Food-100K: A 100,000-Sample Multimodal Food Intelligence Dataset with Verifiable Provenance, https://arxiv.org/abs/2508.10429
  • Silvia García-Méndez, Francisco de Arriba-Pérez, 8 Aug 2025, Detecting and explaining postpartum depression in real-time with generative artificial intelligence, https://arxiv.org/abs/2508.10025
  • Yuksel Aydin, 9 Aug 2025, Cognitive Cybersecurity for Artificial Intelligence: Guardrail Engineering with CCS-7, https://arxiv.org/abs/2508.10033
  • Nitin Rai, Nathan S. Boyd, Gary E. Vallad, Arnold W. Schumann, 13 Aug 2025, Improving watermelon (Citrullus lanatus) disease classification with generative artificial intelligence (GenAI)-based synthetic and real-field images via a custom EfficientNetV2-L model, https://arxiv.org/abs/2508.10156
  • Amine Tellache, Abdelaziz Amara Korba, Amdjed Mokhtari, Horea Moldovan, Yacine Ghamri-Doudane, 14 Aug 2025, Advancing Autonomous Incident Response: Leveraging LLMs and Cyber Threat Intelligence, https://arxiv.org/abs/2508.10677
  • Kei-Sing Ng, 13 Aug 2025, On the Definition of Intelligence, https://arxiv.org/abs/2507.22423
  • Ilias Chatzistefanidis, Navid Nikaein, 23 Jul 2025, Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks, https://arxiv.org/abs/2507.17695
  • C.H.E. Jordaan, M. van der Stelt, T.J.J. Maal, V.M.A. Stirler, R. Leijendekkers, T. Kachman, G.A. de Jong, 30 Apr 2025, Evaluating Artificial Intelligence Algorithms for the Standardization of Transtibial Prosthetic Socket Shape Design, https://arxiv.org/abs/2507.16818
  • Guang Gao, Jianan Wang, Jinbo Zuo, Junnan Jiang, Jingfan Zhang, Xianwen Zeng, Yuejiang Zhu, Lianyang Ma, Ke Chen, Minhua Sheng, Ruirui Zhang, Zhaohui An, 23 Jul 2025, Towards Human-level Intelligence via Human-like Whole-Body Manipulation, https://arxiv.org/abs/2507.17141
  • Georgios Mappouras, 23 Jul 2025, Turing Test 2.0: The General Intelligence Threshold, https://arxiv.org/abs/2505.19550
  • Obumneme Zimuzor Nwafor and Mohammed Abdul Majeed Al Hooti, 23 Jul 2025, Artificial Intelligence for Green Hydrogen Yield Prediction and Site Suitability using SHAP-Based Composite Index: Focus on Oman, https://arxiv.org/abs/2507.14219
  • Yanjun Zheng, Xiyang Du, Longfei Liao, Xiaoke Zhao, Zhaowen Zhou, Bo Zhang, Jiawei Liu, Xiang Qi, Zhe Li, Zhiqiang Zhang, Wei Wang and Peng Zhang, 23 Jul 2025, Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning, https://arxiv.org/abs/2507.16802
  • Shai Shalev-Shwartz and Amnon Shashua, 13 Jul 2025, From Reasoning to Super-Intelligence: A Search-Theoretic Perspective, https://arxiv.org/abs/2507.15865
  • Simon Ouellette, 17 Jul 2025, Out-of-Distribution Generalization in the ARC-AGI Domain: Comparing Execution-Guided Neural Program Synthesis and Test-Time Fine-Tuning, https://arxiv.org/abs/2507.15877
  • Andy E. Williams, 18 Jul 2025, The Recursive Coherence Principle: A Formal Constraint on Scalable Intelligence, Alignment, and Reasoning Architecture, https://arxiv.org/abs/2507.15880
  • Li-Hsiang Shen, Jyun-Jhe Huang, 22 Jul 2025, CHIMERA: Compressed Hybrid Intelligence for Twin-Model Enhanced Multi-Agent Deep Reinforcement Learning for Multi-Functional RIS-Assisted Space-Air-Ground Integrated Networks, https://arxiv.org/abs/2507.16204
  • Xin-De Wang, Zhi-Rui Chen, Peng-Jie Guo, Ze-Feng Gao, Cheng Mu, Zhong-Yi Lu, 22 Jul 2025, Perovskite-R1: A Domain-Specialized LLM for Intelligent Discovery of Precursor Additives and Experimental Design, https://arxiv.org/abs/2507.16307
  • Christian D. Blakely, 22 Jul 2025, Symbolic Graph Intelligence: Hypervector Message Passing for Learning Graph-Level Patterns with Tsetlin Machines, https://arxiv.org/abs/2507.16537
  • Zixu Wang, Yuhan Wang, Junfei Ma, Fuyuan Wu, Junchi Yan, Xiaohui Yuan, Zhe Zhang, Jie Zhang, 22 Jul 2025, Predictive Hydrodynamic Simulations for Laser Direct-drive Implosion Experiments via Artificial Intelligence, https://arxiv.org/abs/2507.16227
  • S.-Y. Zhang, J. Tian, S.-L. Liu, H.-M. Zhang, H.-Y. Bai, Y.-C. Hu, W.-H. Wang, 22 Jul 2025, Constructing material network representations for intelligent amorphous alloys design, https://arxiv.org/abs/2507.16336
  • Jinming Hu, Hassan Nawaz, Yuting Rui, Lijie Chi, Arif Ullah, Pavlo O. Dral, 22 Jul 2025, Aitomia: Your Intelligent Assistant for AI-Driven Atomistic and Quantum Chemical Simulations, https://arxiv.org/abs/2505.08195
  • Shanghai AI Lab: Yicheng Bao, Guanxu Chen, Mingkang Chen, Yunhao Chen, Chiyu Chen, Lingjie Chen, Sirui Chen, Xinquan Chen, Jie Cheng, Yu Cheng, Dengke Deng, Yizhuo Ding, Dan Ding, Xiaoshan Ding, Yi Ding, Zhichen Dong, Lingxiao Du, Yuyu Fan, Xinshun Feng, Yanwei Fu, Yuxuan Gao, Ruijun Ge, Tianle Gu, Lujun Gui, Jiaxuan Guo, Qianxi He, Yuenan Hou, Xuhao Hu, Hong Huang, Kaichen Huang, Shiyang Huang, Yuxian Jiang, Shanzhe Lei, Jie Li, Lijun Li, Hao Li, Juncheng Li, Xiangtian Li, Yafu Li, Lingyu Li, Xueyan Li, Haotian Liang, Dongrui Liu, Qihua Liu, Zhixuan Liu, Bangwei Liu, Huacan Liu, Yuexiao Liu, Zongkai Liu, Chaochao Lu, Yudong Lu, Xiaoya Lu, Zhenghao Lu, Qitan Lv, Caoyuan Ma, Jiachen Ma, Xiaoya Ma, Zhongtian Ma, Lingyu Meng, Ziqi Miao, Yazhe Niu, Yuezhang Peng, Yuan Pu, Han Qi, Chen Qian, Xingge Qiao, et al. (50 additional authors not shown), 24 Jul 2025, SafeWork-R1: Coevolving Safety and Intelligence under the AI-45° Law, https://arxiv.org/abs/2507.18576
  • Alberto Marchisio and Muhammad Shafique, 24 Jul 2025, Neuromorphic Computing for Embodied Intelligence in Autonomous Systems: Current Trends, Challenges, and Future Directions, https://arxiv.org/abs/2507.18139
  • Zhangqi Liu, 22 Jul 2025, Human-AI Co-Creation: A Framework for Collaborative Design in Intelligent Systems, https://arxiv.org/abs/2507.17774
  • Alberto Hernández-Espinosa, Luan Ozelim, Felipe S. Abrahão, Hector Zenil, 24 Jul 2025, SuperARC: An Agnostic Test for Narrow, General, and Super Intelligence Based On the Principles of Recursive Compression and Algorithmic Probability, https://arxiv.org/abs/2503.16743
  • Hanzhi Zhou, Erik Hornberger, Pengsheng Guo, Xiyou Zhou, Saiwen Wang, Xin Wang, Yifei He, Xuankai Chang, Rene Rauch, Louis D'hauwe, John Peebles, Alec Doane, Kohen Chia, Jenna Thibodeau, Zi-Yi Dou, Yuanyang Zhang, Ruoming Pang, Reed Li, Zhifeng Chen, Jeremy Warner, Zhaoyang Xu, Sophy Lee, David Mizrahi, Ramsey Tantawi, Chris Chaney, Kelsey Peterson, Jun Qin, Alex Dombrowski, Mira Chiang, Aiswarya Raghavan, Gerard Casamayor, Qibin Chen, Aonan Zhang, Nathalie Tran, Jianyu Wang, Hang Su, Thomas Voice, Alessandro Pappalardo, Brycen Wershing, Prasanth Yadla, Rui Li, Priyal Chhatrapati, Ismael Fernandez, Yusuf Goren, Xin Zheng, Forrest Huang, Tao Lei, Eray Yildiz, Alper Kokmen, Gokul Santhanam, Areeba Kamal, Kaan Elgin, Dian Ang Yap, Jeremy Liu, Peter Gray, Howard Xing, Kieran Liu, Matteo Ronchi, et al. (337 additional authors not shown), 17 Jul 2025, Apple Intelligence Foundation Language Models: Tech Report 2025, https://arxiv.org/abs/2507.13575
  • Shuiguang Deng, Di Yu, Changze Lv, Xin Du, Linshan Jiang, Xiaofan Zhao, Wentao Tong, Xiaoqing Zheng, Weijia Fang, Peng Zhao, Gang Pan, Schahram Dustdar, Albert Y. Zomaya, 18 Jul 2025, Edge Intelligence with Spiking Neural Networks, https://arxiv.org/abs/2507.14069
  • Maria Tsfasman, Ramin Ghorbani, Catholijn M. Jonker, Bernd Dudzik, 18 Jul 2025, The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems?, https://arxiv.org/abs/2507.14084
  • Michael Timothy Bennett, 18 Jul 2025, What the F*ck Is Artificial General Intelligence?, https://arxiv.org/abs/2503.23923
  • Nick Byrd, 17 Jul 2025, Strategic Reflectivism In Intelligent Systems, https://arxiv.org/abs/2505.22987
  • Abhishek Sriram, Neal Tuffy, 18 Jul 2025, Accelerating RF Power Amplifier Design via Intelligent Sampling and ML-Based Parameter Tuning, https://arxiv.org/abs/2507.11928
  • Haobo Yang, Shiyan Zhang, Zhuoyi Yang, Xinyu Zhang, Jilong Guo, Zongyou Yang, Jun Li, 18 Jul 2025, Entropy Loss: An Interpretability Amplifier of 3D Object Detection Network for Intelligent Driving, https://arxiv.org/abs/2409.00839
  • Rahul Kabali, 4 Jul 2025, The Free Will Equation: Quantum Field Analogies for AGI, https://arxiv.org/abs/2507.14154
  • Elio Grande, 20 Jul 2025, The Endless Tuning. An Artificial Intelligence Design To Avoid Human Replacement and Trace Back Responsibilities, https://arxiv.org/abs/2507.14909
  • Mohammad Mashayekhi, Sara Ahmadi Majd, Arian AmirAmjadi, Parsa Hosseini, 20 Jul 2025, Clinical Semantic Intelligence (CSI): Emulating the Cognitive Framework of the Expert Clinician for Comprehensive Oral Disease Diagnosis, https://arxiv.org/abs/2507.15140
  • Qianchao Wang, Yuxuan Ding, Chuanzhen Jia, Zhe Li, Yaping Du, 21 Jul 2025, Explainable Artificial Intelligence based Soft Evaluation Indicator for Arc Fault Diagnosis, https://arxiv.org/abs/2507.15239
  • Julien Pourcel, C\'edric Colas, Pierre-Yves Oudeyer, 10 Jul 2025, Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI, https://arxiv.org/abs/2507.14172
  • Yajiao Dai, Jun Li, Zhen Mei, Yiyang Ni, Shi Jin, Zengxiang Li, Sheng Guo, Wei Xiang, 12 Jul 2025, Semi-Supervised Federated Learning via Dual Contrastive Learning and Soft Labeling for Intelligent Fault Diagnosis, https://arxiv.org/abs/2507.14181
  • Federico Mason, Tommaso Zugno, Matteo Drago, Marco Giordani, Mate Boban, and Michele Zorzi, 15 Jul 2025, PRATA: A Framework to Enable Predictive QoS in Vehicular Networks via Artificial Intelligence, https://arxiv.org/abs/2507.14211
  • Craig S Wright, 16 Jul 2025, Cognitive Castes: Artificial Intelligence, Epistemic Stratification, and the Dissolution of Democratic Discourse, https://arxiv.org/abs/2507.14218
  • Shayan Rokhva, Babak Teimourpour, 19 Jul 2025, Artificial Intelligence in the Food Industry: Food Waste Estimation based on Computer Vision, a Brief Case Study in a University Dining Hall, https://arxiv.org/abs/2507.14662
  • Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, Kui Ren, 19 Jul 2025, Rethinking Data Protection in the (Generative) Artificial Intelligence Era, https://arxiv.org/abs/2507.03034
  • Chengshuai Zhao, Zhen Tan, Pingchuan Ma, Dawei Li, Bohan Jiang, Yancheng Wang, Yingzhen Yang, Huan Liu, 13 Aug 2025 (v3), Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens, https://arxiv.org/abs/2508.01191
  • Haifeng Li, Wang Guo, Haiyang Wu, Mengwei Wu, Jipeng Zhang, Qing Zhu, Yu Liu, Xin Huang, Chao Tao, 9 Aug 2025, Remote Sensing Image Intelligent Interpretation with the Language-Centered Perspective: Principles, Methods and Challenges, https://arxiv.org/abs/2508.06832
  • Aswin Paul, Moein Khajehnejad, Forough Habibollahi, Brett J. Kagan, Adeel Razi, 9 Aug 2025, Simulating Biological Intelligence: Active Inference with Experiment-Informed Generative Model, https://arxiv.org/abs/2508.06980
  • Yi Tang, Kaini Wang, Yang Chen, Guangquan Zhou, 10 Aug 2025, EndoAgent: A Memory-Guided Reflective Agent for Intelligent Endoscopic Vision-to-Decision Reasoning, https://arxiv.org/abs/2508.07292
  • Mubaris Nadeem, Johannes Zenkert, Lisa Bender, Christian Weber, Madjid Fathi, 11 Aug 2025, KIRETT: Knowledge-Graph-Based Smart Treatment Assistant for Intelligent Rescue Operations, https://arxiv.org/abs/2508.07834
  • Rachel K. Luu and Jingyu Deng and Mohammed Shahrudin Ibrahim and Nam-Joon Cho and Ming Dao and Subra Suresh and Markus J. Buehler, 8 Aug 2025, Generative Artificial Intelligence Extracts Structure-Function Relationships from Plants for New Materials, https://arxiv.org/abs/2508.06591
  • Hyeonuk Nam, 11 Aug 2025, Auditory Intelligence: Understanding the World Through Sound, https://arxiv.org/abs/2508.07829
  • Yuyang Zhou, Guang Cheng, Kang Du, Zihan Chen, Yuyu Zhao, 11 Aug 2025, Toward Intelligent and Secure Cloud: Large Language Model Empowered Proactive Defense, https://arxiv.org/abs/2412.21051
  • Manish Verma, Vivek Sharma, Vishal Singh, 27 Jul 2025, Artificial Intelligence In Patent And Market Intelligence: A New Paradigm For Technology Scouting, https://arxiv.org/abs/2507.20322
  • Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, Hongru Wang, Han Xiao, Yuhang Zhou, Shaokun Zhang, Jiayi Zhang, Jinyu Xiang, Yixiong Fang, Qiwen Zhao, Dongrui Liu, Qihan Ren, Cheng Qian, Zhenghailong Wang, Minda Hu, Huazheng Wang, Qingyun Wu, Heng Ji, Mengdi Wang, 28 Jul 2025, A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence, https://arxiv.org/abs/2507.21046
  • Obumneme Nwafor and Mohammed Abdul Majeed Al Hooti, 21 Jul 2025, Machine Learning Risk Intelligence for Green Hydrogen Investment: Insights for Duqm R3 Auction, https://arxiv.org/abs/2507.19529
  • Jianping Yao, Son N. Tran, Hieu Nguyen, Samantha Sawyer, and Rocco Longo, 27 Jul 2025, Wine Characterisation with Spectral Information and Predictive Artificial Intelligence, https://arxiv.org/abs/2507.20114
  • Kimi Team: Yifan Bai, Yiping Bao, Guanduo Chen, Jiahao Chen, Ningxin Chen, Ruijue Chen, Yanru Chen, Yuankun Chen, Yutian Chen, Zhuofu Chen, Jialei Cui, Hao Ding, Mengnan Dong, Angang Du, Chenzhuang Du, Dikang Du, Yulun Du, Yu Fan, Yichen Feng, Kelin Fu, Bofei Gao, Hongcheng Gao, Peizhong Gao, Tong Gao, Xinran Gu, Longyu Guan, Haiqing Guo, Jianhang Guo, Hao Hu, Xiaoru Hao, Tianhong He, Weiran He, Wenyang He, Chao Hong, Yangyang Hu, Zhenxing Hu, Weixiao Huang, Zhiqi Huang, Zihao Huang, Tao Jiang, Zhejun Jiang, Xinyi Jin, Yongsheng Kang, Guokun Lai, Cheng Li, Fang Li, Haoyang Li, Ming Li, Wentao Li, Yanhao Li, Yiwei Li, Zhaowei Li, Zheming Li, Hongzhan Lin, Xiaohan Lin, Zongyu Lin, Chengyin Liu, Chenyu Liu, Hongzhang Liu, Jingyuan Liu, Junqi Liu, Liang Liu, Shaowei Liu, T.Y. Liu, Tianwei Liu, et al. (103 additional authors not shown), 28 Jul 2025, Kimi K2: Open Agentic Intelligence, https://arxiv.org/abs/2507.20534
  • Matin Aghaei, Mohammad Ali Alomrani, Yingxue Zhang, Mahdi Biparva, 26 Jul 2025, When Engineering Outruns Intelligence: A Re-evaluation of Instruction-Guided Navigation, https://arxiv.org/abs/2507.20021
  • Marta Sidorkiewicz, Karolina Królikowska, Berenika Dyczek, Edyta Pijet-Migon, Anna Dubel, 2 Jul 2025, Artificial intelligence for sustainable wine industry: AI-driven management in viticulture, wine production and enotourism, https://arxiv.org/abs/2507.21098
  • Jae Wan Shim, 21 Jul 2025, Measuring and Analyzing Intelligence via Contextual Uncertainty in Large Language Models using Information-Theoretic Metrics, https://arxiv.org/abs/2507.21129
  • Ashley Rector, Keaton Minor, Kamden Minor, Jeff McCormack, Beth Breeden, Ryan Nowers, Jay Dorris, 29 Jul 2025, Validating Pharmacogenomics Generative Artificial Intelligence Query Prompts Using Retrieval-Augmented Generation (RAG), https://arxiv.org/abs/2507.21453
  • Bin Liu, 29 Jul 2025, Exploring the Link Between Bayesian Inference and Embodied Intelligence: Toward Open Physical-World Embodied AI Systems, https://arxiv.org/abs/2507.21589
  • Andrew Kiruluta, Andreas Lemos, and Priscilla Burity, 27 Jul 2025, Operator-Based Machine Intelligence: A Hilbert Space Framework for Spectral Learning and Symbolic Reasoning, https://arxiv.org/abs/2507.21189
  • Abir Ray, 28 Jul 2025, EdgeAgentX-DT: Integrating Digital Twins and Generative AI for Resilient Edge Intelligence in Tactical Networks, https://arxiv.org/abs/2507.21196
  • Leonard Dung and Max Hellrigel-Holderbaum, 29 Jul 2025, Against racing to AGI: Cooperation, deterrence, and catastrophic risks, https://arxiv.org/abs/2507.21839
  • Maximilian Ferle, Jonas Ader, Thomas Wiemers, Nora Grieb, Adrian Lindenmeyer, Hans-Jonas Meyer, Thomas Neumuth, Markus Kreuz, Kristin Reiche, Maximilian Merz, 29 Jul 2025, Unsupervised risk factor identification across cancer types and data modalities via explainable artificial intelligence, https://arxiv.org/abs/2506.12944
  • Hubert Baniecki and Przemyslaw Biecek, 28 Jul 2025, Adversarial attacks and defenses in explainable artificial intelligence: A survey, https://arxiv.org/abs/2306.06123
  • Huizi Yu, Jiayan Zhou, Lingyao Li, Shan Chen, Jack Gallifant, Anye Shi, Xiang Li, Jingxian He, Wenyue Hua, Mingyu Jin, Guang Chen, Yang Zhou, Zhao Li, Trisha Gupte, Ming-Li Chen, Zahra Azizi, Yongfeng Zhang, Yanqiu Xing, Danielle S. Bitterman, Themistocles L. Assimes, Xin Ma, Lin Lu, Lizhou Fan, 29 Jul 2025, Simulated patient systems are intelligent when powered by large language model-based AI agents, https://arxiv.org/abs/2409.18924
  • Kees van Deemter, 29 Jul 2025, My Life in Artificial Intelligence: People, anecdotes, and some lessons learnt, https://arxiv.org/abs/2504.04142
  • Arushi Goel and Sreyan Ghosh and Jaehyeon Kim and Sonal Kumar and Zhifeng Kong and Sang-gil Lee and Chao-Han Huck Yang and Ramani Duraiswami and Dinesh Manocha and Rafael Valle and Bryan Catanzaro, 28 Jul 2025, Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models, https://arxiv.org/abs/2507.08128
  • Olga Vershinina, Jacopo Sabbatinelli, Anna Rita Bonfigli, Dalila Colombaretti, Angelica Giuliani, Mikhail Krivonosov, Arseniy Trukhanov, Claudio Franceschi, Mikhail Ivanchenko, Fabiola Olivieri, 31 Jul 2025, Explainable artificial intelligence model predicting the risk of all-cause mortality in patients with type 2 diabetes mellitus, https://arxiv.org/abs/2507.23491
  • Kathleen Mealey, Jonathan A. Karr Jr., Priscila Saboia Moreira, Paul R. Brenner, Charles F. Vardeman II, 24 Jul 2025, Trusted Knowledge Extraction for Operations and Maintenance Intelligence, https://arxiv.org/abs/2507.22935
  • Shaofei Cai, Zhancun Mu, Haiwen Xia, Bowei Zhang, Anji Liu, Yitao Liang, 31 Jul 2025, Scalable Multi-Task Reinforcement Learning for Generalizable Spatial Intelligence in Visuomotor Agents, https://arxiv.org/abs/2507.23698
  • Hanwen Zhang, Ruichen Zhang, Wei Zhang, Dusit Niyato, Yonggang Wen, Chunyan Miao, 31 Jul 2025, Advancing Generative Artificial Intelligence and Large Language Models for Demand Side Management with Internet of Electric Vehicles, https://arxiv.org/abs/2501.15544
  • Chunan Tong, 31 Jul 2025, An Efficient Intelligent Semi-Automated Warehouse Inventory Stocktaking System, https://arxiv.org/abs/2309.12365
  • Jirui Yang, Zheyu Lin, Zhihui Lu, Yinggui Wang, Lei Wang, Tao Wei, Xin Du, Shuhan Yang, 31 Jul 2025, CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation, https://arxiv.org/abs/2504.13201
  • Matthieu Queloz, 29 Jul 2025, Explainability Through Systematicity: The Hard Systematicity Challenge for Artificial Intelligence, https://arxiv.org/abs/2507.22197
  • Matej Šprogar, 30 Jul 2025, AGITB: A Signal-Level Benchmark for Evaluating Artificial General Intelligence, https://arxiv.org/abs/2504.04430
  • Yining Hong, Rui Sun, Bingxuan Li, Xingcheng Yao, Maxine Wu, Alexander Chien, Da Yin, Ying Nian Wu, Zhecan James Wang, Kai-Wei Chang, 29 Jul 2025, Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence, https://arxiv.org/abs/2506.15677
  • Marc Schmitt, 30 Jul 2025, Strategic Integration of Artificial Intelligence in the C-Suite: The Role of the Chief AI Officer, https://arxiv.org/abs/2407.10247
  • Wil M.P. van der Aalst, 31 Jul 2025, No AI Without PI! Object-Centric Process Mining as the Enabler for Generative, Predictive, and Prescriptive Artificial Intelligence, https://arxiv.org/abs/2508.00116
  • Mohit Gupta, Debjit Bhowmick, Rhys Newbury, Meead Saberi, Shirui Pan and Ben Beck, 31 Jul 2025, INSPIRE-GNN: Intelligent Sensor Placement to Improve Sparse Bicycling Network Prediction via Reinforcement Learning Boosted Graph Neural Networks, https://arxiv.org/abs/2508.00141
  • Guilherme Guerino, Luiz Rodrigues, Luana Bianchini, Mariana Alves, Marcelo Marinho, Thomaz Veloso, Valmir Macario, Diego Dermeval, Thales Vieira, Ig Bittencourt, Seiji Isotani, 31 Jul 2025, A Mixed User-Centered Approach to Enable Augmented Intelligence in Intelligent Tutoring Systems: The Case of MathAIde app, https://arxiv.org/abs/2508.00103
  • Rajpreet Singh, Vidhi Kothari, 1 Aug 2025, Composable OS Kernel Architectures for Autonomous Intelligence, https://arxiv.org/abs/2508.00604
  • S. V. Chekanov and H. Kjellerstrand, 1 Aug 2025, Discovering the underlying analytic structure within Standard Model constants using artificial intelligence, https://arxiv.org/abs/2507.00225
  • Christopher Wissuchek, Patrick Zschech, 7 Jul 2025, Exploring Agentic Artificial Intelligence Systems: Towards a Typological Framework, https://arxiv.org/abs/2508.00844
  • Boheng Liu and Ziyu Li and Xia Wu, 4 Aug 2025, Neuromorphic Computing with Multi-Frequency Oscillations: A Bio-Inspired Approach to Artificial Intelligence, https://arxiv.org/abs/2508.02191
  • Zhuo Yang, Jiaqing Xie, Shuaike Shen, Daolang Wang, Yeyun Chen, Ben Gao, Shuzhou Sun, Biqing Qi, Dongzhan Zhou, Lei Bai, Linjiang Chen, Shufei Zhang, Jun Jiang, Tianfan Fu, Yuqiang Li, 2 Aug 2025, SpectrumWorld: Artificial Intelligence Foundation for Spectroscopy, https://arxiv.org/abs/2508.01188
  • Md Zahidul Islam, Md Shafiqur Rahman, Md Sumsuzoha, Babul Sarker, Md Rafiqul Islam, Mahfuz Alam and Sanjib Kumar Shil, 2 Aug 2025, Cryptocurrency Price Forecasting Using Machine Learning: Building Intelligent Financial Prediction Models, https://arxiv.org/abs/2508.01419
  • Sangjun Park, Tony Q.S. Quek, Hyowoon Seo, 4 Aug 2025, Pigeon-SL: Robust Split Learning Framework for Edge Intelligence under Malicious Clients, https://arxiv.org/abs/2508.02235
  • Songlin Xu, Xinyu Zhang, 9 Jul 2025, Cognitive Exoskeleton: Augmenting Human Cognition with an AI-Mediated Intelligent Visual Feedback, https://arxiv.org/abs/2508.00846
  • Bang Liu, Xinfeng Li, Jiayi Zhang, Jinlin Wang, Tanjin He, Sirui Hong, Hongzhang Liu, Shaokun Zhang, Kaitao Song, Kunlun Zhu, Yuheng Cheng, Suyuchen Wang, Xiaoqiang Wang, Yuyu Luo, Haibo Jin, Peiyan Zhang, Ollie Liu, Jiaqi Chen, Huan Zhang, Zhaoyang Yu, Haochen Shi, Boyan Li, Dekun Wu, Fengwei Teng, Xiaojun Jia, Jiawei Xu, Jinyu Xiang, Yizhang Lin, Tianming Liu, Tongliang Liu, Yu Su, Huan Sun, Glen Berseth, Jianyun Nie, Ian Foster, Logan Ward, Qingyun Wu, Yu Gu, Mingchen Zhuge, Xinbing Liang, Xiangru Tang, Haohan Wang, Jiaxuan You, Chi Wang, Jian Pei, Qiang Yang, Xiaoliang Qi, Chenglin Wu, 2 Aug 2025, Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems, https://arxiv.org/abs/2504.01990
  • Andrew G. Breithaupt, Michael Weiner, Alice Tang, Katherine L. Possin, Marina Sirota, James Lah, Allan I. Levey, Pascal Van Hentenryck, Reza Zandehshahvar, Marilu Luisa Gorno-Tempini, Joseph Giorgio, Jingshen Wang, Andreas M. Rauschecker, Howard J. Rosen, Rachel L. Nosheny, Bruce L. Miller, Pedro Pinheiro-Chagas, 1 Aug 2025, Integrating Generative Artificial Intelligence in ADRD: A Roadmap for Streamlining Diagnosis and Care in Neurodegenerative Diseases, https://arxiv.org/abs/2502.06842
  • Haris Khan, Shumaila Asif, Hassan Nasir, Kamran Aziz Bhatti, Shahzad Amin Sheikh, 1 Aug 2025, Advances in Intelligent Hearing Aids: Deep Learning Approaches to Selective Noise Cancellation, https://arxiv.org/abs/2507.07043
  • AgiBot-World-Contributors, Qingwen Bu, Jisong Cai, Li Chen, Xiuqi Cui, Yan Ding, Siyuan Feng, Shenyuan Gao, Xindong He, Xuan Hu, Xu Huang, Shu Jiang, Yuxin Jiang, Cheng Jing, Hongyang Li, Jialu Li, Chiming Liu, Yi Liu, Yuxiang Lu, Jianlan Luo, Ping Luo, Yao Mu, Yuehan Niu, Yixuan Pan, Jiangmiao Pang, Yu Qiao, Guanghui Ren, Cheng Ruan, Jiaqi Shan, Yongjian Shen, Chengshi Shi, Mingkang Shi, Modi Shi, Chonghao Sima, Jianheng Song, Huijie Wang, Wenhao Wang, Dafeng Wei, Chengen Xie, Guo Xu, Junchi Yan, Cunbiao Yang, Lei Yang, Shukai Yang, Maoqing Yao, Jia Zeng, Chi Zhang, Qinglin Zhang, Bin Zhao, Chengyue Zhao, Jiaqi Zhao, Jianchao Zhu, 4 Aug 2025, AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems, https://arxiv.org/abs/2503.06669
  • Xingdan Wang, Jiayi He, Zhiqing Tang, Jianxiong Guo, Jiong Lou, Liping Qian, Tian Wang, Weijia Jia, 5 Aug 2025, Adaptive AI Agent Placement and Migration in Edge Intelligence Systems, https://arxiv.org/abs/2508.03345
  • Davide Gabrielli, Bardh Prenkaj, Paola Velardi, Stefano Faralli, 5 Aug 2025, AI on the Pulse: Real-Time Health Anomaly Detection with Wearable and Ambient Intelligence, https://arxiv.org/abs/2508.03436
  • Albertus Denny Handoko and Riko I Made, 5 Aug 2025, Artificial Intelligence and Generative Models for Materials Discovery -- A Review, https://arxiv.org/abs/2508.03278
  • Siddhant Deshpande, Yalemzerf Getnet, Waltenegus Dargie, 4 Aug 2025, Detection of Intelligent Tampering in Wireless Electrocardiogram Signals Using Hybrid Machine Learning, https://arxiv.org/abs/2507.06402
  • Qiguang Chen, Mingda Yang, Libo Qin, Jinhao Liu, Zheng Yan, Jiannan Guan, Dengyun Peng, Yiyan Ji, Hanjing Li, Mengkang Hu, Yimeng Zhang, Yihao Liang, Yuhang Zhou, Jiaqi Wang, Zhi Chen, Wanxiang Che, 5 Aug 2025, AI4Research: A Survey of Artificial Intelligence for Scientific Research, https://arxiv.org/abs/2507.01903
  • Charles L. Wang, Trisha Singhal, Ameya Kelkar, Jason Tuo, 5 Aug 2025, MI9 -- Agent Intelligence Protocol: Runtime Governance for Agentic AI Systems, https://arxiv.org/abs/2508.03858
  • Wesley Brewer, Murali Meena Gopalakrishnan, Matthias Maiterth, Aditya Kashi, Jong Youl Choi, Pei Zhang, Stephen Nichols, Riccardo Balin, Miles Couchman, Stephen de Bruyn Kops, P.K. Yeung, Daniel Dotson, Rohini Uma-Vaideswaran, Sarp Oral, Feiyi Wang, 5 Aug 2025, Intelligent Sampling of Extreme-Scale Turbulence Datasets for Accurate and Efficient Spatiotemporal Model Training, https://arxiv.org/abs/2508.03872
  • Anna Romanova, 5 Aug 2025, Development of management systems using artificial intelligence systems and machine learning methods for boards of directors (preprint, unofficial translation), https://arxiv.org/abs/2508.03769
  • Fardis Nadimi, Payam Abdisarabshali, Kasra Borazjani, Jacob Chakareski, Seyyedali Hosseinalipour, 5 Aug 2025, Multi-Modal Multi-Task Federated Foundation Models for Next-Generation Extended Reality Systems: Towards Privacy-Preserving Distributed Intelligence in AR/VR/MR, https://arxiv.org/abs/2506.05683
  • Nan Li, Wanting Yang, Marie Siew, Zehui Xiong, Binbin Chen, Shiwen Mao, Kwok-Yan Lam, 6 Aug 2025, Edge-Assisted Collaborative Fine-Tuning for Multi-User Personalized Artificial Intelligence Generated Content (AIGC), https://arxiv.org/abs/2508.04745
  • Zhaowei Wang, Yunsong Huang, Weicheng Liu, Hui-Ming Wang, 7 Aug 2025, Anti-Jamming Sensing with Distributed Reconfigurable Intelligent Metasurface Antennas, https://arxiv.org/abs/2508.04964
  • Bo Wen, 7 Aug 2025, A Framework for Inherently Safer AGI through Language-Mediated Active Inference, https://arxiv.org/abs/2508.05766
  • Christian Meske, Justin Brenne, Erdi Uenal, Sabahat Oelcer and Ayseguel Doganguen, 8 Aug 2025, From Explainable to Explanatory Artificial Intelligence: Toward a New Paradigm for Human-Centered Explanations through Generative AI, https://arxiv.org/abs/2508.06352
  • Xiangzhe Xu, Shiwei Feng, Zian Su, Chengpeng Wang, Xiangyu Zhang, 8 Aug 2025, Position: Intelligent Coding Systems Should Write Programs with Justifications, https://arxiv.org/abs/2508.06017
  • Mojtaba Valipour, Kelly Zheng, James Lowman, Spencer Szabados, Mike Gartner, and Bobby Braswell, 8 Aug 2025, AGI for the Earth, the path, possibilities and how to evaluate intelligence of models that work with Earth Observation Data?, https://arxiv.org/abs/2508.06057
  • Ruben Laukkonen, Fionn Inglis, Shamil Chandaria, Lars Sandved-Smith, Edmundo Lopez-Sola, Jakob Hohwy, Jonathan Gold, Adam Elwood, 8 Aug 2025, Contemplative Artificial Intelligence, https://arxiv.org/abs/2504.15125
  • Ahmed Tlili, 9 Aug 2025, Between Fear and Desire, the Monster Artificial Intelligence (AI): Analysis through the Lenses of Monster Theory, https://arxiv.org/abs/2508.08318
  • Sejin Kim, Sundong Kim, 12 Aug 2025, System 2 Reasoning for Human-AI Alignment: Generality and Adaptivity via ARC-AGI, https://arxiv.org/abs/2410.07866
  • Ratun Rahman, 12 Aug 2025, Federated Learning: A Survey on Privacy-Preserving Collaborative Intelligence, https://arxiv.org/abs/2504.17703
  • Jared Edward Reser, 11 Aug 2025, Artificial Intelligence Software Structured to Simulate Human Working Memory, Mental Imagery, and Mental Continuity, https://arxiv.org/abs/2204.05138
  • Jing Liu, Yao Du, Kun Yang, Jiaqi Wu, Yan Wang, Xiping Hu, Zehua Wang, Yang Liu, Peng Sun, Azzedine Boukerche, Victor C.M. Leung, 12 Aug 2025, Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey, https://arxiv.org/abs/2505.01821
  • Sundong Kim, 12 Aug 2025, The Othello AI Arena: Evaluating Intelligent Systems Through Limited-Time Adaptation to Unseen Boards, https://arxiv.org/abs/2508.09292
  • Meiping Wang, Jian Zhong, Rongduo Han, Liming Kang, Zhengkun Shi, Xiao Liang, Xing Lin, Nan Gao, Haining Zhang, 13 Aug 2025, An Automated Multi-Modal Evaluation Framework for Mobile Intelligent Assistants, https://arxiv.org/abs/2508.09507
  • Manuel Herrador, 13 Aug 2025, The PacifAIst Benchmark: Would an Artificial Intelligence Choose to Sacrifice Itself for Human Safety?, https://arxiv.org/abs/2508.09762
  • Changyuan Zhao, Guangyuan Liu, Ruichen Zhang, Yinqiu Liu, Jiacheng Wang, Jiawen Kang, Dusit Niyato, Zan Li, Xuemin (Sherman) Shen, Zhu Han, Sumei Sun, Chau Yuen, Dong In Kim, 13 Aug 2025, Edge General Intelligence Through World Models and Agentic AI: Fundamentals, Solutions, and Challenges, https://arxiv.org/abs/2508.09561
  • Xuanru Zhou, Cheng Li, Shuqiang Wang, Ye Li, Tao Tan, Hairong Zheng, and Shanshan Wang, 7 Aug 2025, Generative Artificial Intelligence in Medical Imaging: Foundations, Progress, and Clinical Translation, https://arxiv.org/abs/2508.09177
  • Abdolazim Rezaei, Mehdi Sookhak, Mahboobeh Haghparast, 7 Aug 2025, RL-MoE: An Image-Based Privacy Preserving Approach In Intelligent Transportation System, https://arxiv.org/abs/2508.09186
  • Fan Zhang, Zebang Cheng, Chong Deng, Haoxuan Li, Zheng Lian, Qian Chen, Huadai Liu, Wen Wang, Yi-Fan Zhang, Renrui Zhang, Ziyu Guo, Zhihong Zhu, Hao Wu, Haixin Wang, Yefeng Zheng, Xiaojiang Peng, Xian Wu, Kun Wang, Xiangang Li, Jieping Ye, Pheng-Ann Heng, 11 Aug 2025, MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models, https://arxiv.org/abs/2508.09210
  • Ronald Carvalho Boadana, Ademir Guimarães da Costa Junior, Ricardo Rios, Fábio Santos da Silva, 7 Aug 2025, LLM-Based Intelligent Agents for Music Recommendation: A Comparison with Classical Content-Based Filtering, https://arxiv.org/abs/2508.11671
  • Vincent C. Müller, Nick Bostrom, 9 Aug 2025, Future progress in artificial intelligence: A survey of expert opinion, https://arxiv.org/abs/2508.11681
  • Kiruthika Balakrishnan, Durgadevi Velusamy, Hana E. Hinkle, Zhi Li, Karthikeyan Ramasamy, Hikmat Khan, Srini Ramaswamy, and Pir Masoom Shah, 15 Aug 2025, Artificial Intelligence in Rural Healthcare Delivery: Bridging Gaps and Enhancing Equity through Innovation, https://arxiv.org/abs/2508.11738
  • E. Ulises Moya-Sánchez, Abraham Sánchez-Perez, Raúl Nanclares Da Veiga, Alejandro Zarate-Macías, Edgar Villareal, Alejandro Sánchez-Montes, Edtna Jauregui-Ulloa, Héctor Moreno, Ulises Cortés, 17 Aug 2025, Design and Validation of a Responsible Artificial Intelligence-based System for the Referral of Diabetic Retinopathy Patients, https://arxiv.org/abs/2508.12506
  • Zhongang Cai, Yubo Wang, Qingping Sun, Ruisi Wang, Chenyang Gu, Wanqi Yin, Zhiqian Lin, Zhitao Yang, Chen Wei, Xuanke Shi, Kewang Deng, Xiaoyang Han, Zukai Chen, Jiaqi Li, Xiangyu Fan, Hanming Deng, Lewei Lu, Bo Li, Ziwei Liu, Quan Wang, Dahua Lin, Lei Yang, 18 Aug 2025, Has GPT-5 Achieved Spatial Intelligence? An Empirical Study, https://arxiv.org/abs/2508.13142
  • Ruitao Chen, Mozhang Guo, Jinge Li, 17 Aug 2025, Towards Infant Sleep-Optimized Driving: Synergizing Wearable and Vehicle Sensing in Intelligent Cruise Control, https://arxiv.org/abs/2506.06459
  • Aleksandr Algazinov, Joydeep Chandra, and Matt Laing, 18 Aug 2025, INSIGHT: A Survey of In-Network Systems for Intelligent, High-Efficiency AI and Topology Optimization, https://arxiv.org/abs/2505.24269
  • Xiao-Wen Yang, Jie-Jing Shao, Lan-Zhe Guo, Bo-Wen Zhang, Zhi Zhou, Lin-Han Jia, Wang-Zhou Dai and Yu-Feng Li, 19 Aug 2025, Neuro-Symbolic Artificial Intelligence: Towards Improving the Reasoning Abilities of Large Language Models, https://arxiv.org/abs/2508.13678
  • Soumyadeep Dhar, 19 Aug 2025, The Collaboration Paradox: Why Generative AI Requires Both Strategic Intelligence and Operational Stability in Supply Chain Management, https://arxiv.org/abs/2508.13942
  • Mahmoud Nazzal, Khoa Nguyen, Deepak Vungarala, Ramtin Zand, Shaahin Angizi, Hai Phan, Abdallah Khreishah, 23 Jul 2025, FedChip: Federated LLM for Artificial Intelligence Accelerator Chip Design, https://arxiv.org/abs/2508.13162
  • Hunter McNichols, Fareya Ikram, Andrew Lan, 19 Aug 2025, The StudyChat Dataset: Student Dialogues With ChatGPT in an Artificial Intelligence Course, https://arxiv.org/abs/2503.07928
  • Jesmin Jahan Tithi, Hanjiang Wu, Avishaii Abuhatzera, Fabrizio Petrini, 19 Aug 2025, Scaling Intelligence: Designing Data Centers for Next-Gen Language Models, https://arxiv.org/abs/2506.15006
  • Kevin Xu, Risto Miikkulainen, 19 Aug 2025, Neural Cellular Automata for ARC-AGI, https://arxiv.org/abs/2506.15746
  • Lian Lian, Yilin Li, Song Han, Renzi Meng, Sibo Wang, Ming Wang, 20 Aug 2025, Artificial Intelligence-Based Multiscale Temporal Modeling for Anomaly Detection in Cloud Services, https://arxiv.org/abs/2508.14503
  • Md Mainul Abrar, Xun Jia, Yujie Chi, 19 Aug 2025, New Insights into Automatic Treatment Planning for Cancer Radiotherapy Using Explainable Artificial Intelligence, https://arxiv.org/abs/2508.14229
  • Sabab Aosaf, Muhammad Ali Nayeem, Afsana Haque, M Sohel Rahmana, 21 Aug 2025, Computational Intelligence based Land-use Allocation Approaches for Mixed Use Areas, https://arxiv.org/abs/2508.15240
  • Johannes Schleiss, Anke Manukjan, Michelle Ines Bieber, Sebastian Lang, Sebastian Stober, 18 Aug 2025, Designing an Interdisciplinary Artificial Intelligence Curriculum for Engineering: Evaluation and Insights from Experts, https://arxiv.org/abs/2508.14921
  • John E. Hummel, Rachel F. Heaton, 20 Aug 2025, From Basic Affordances to Symbolic Thought: A Computational Phylogenesis of Biological Intelligence, https://arxiv.org/abs/2508.15082
  • Ben Dickson, August 19, 2025, LLMs generate ‘fluent nonsense’ when reasoning outside their training zone, https://venturebeat.com/ai/llms-generate-fluent-nonsense-when-reasoning-outside-their-training-zone/
  • Zhifeng Yang, Peizong Wu, 17 Aug 2025, Research on intelligent generation of structural demolition suggestions based on multi-model collaboration, https://arxiv.org/abs/2508.15820
  • Aparna Singh, Geetanjali Rathee, Chaker Abdelaziz Kerrache, Mohamed Chahine Ghanem, 22 Aug 2025, A Relay-Chain-Powered Ciphertext-Policy Attribute-Based Encryption in Intelligent Transportation Systems, https://arxiv.org/abs/2508.16189
  • Yang Deng, Zifeng Ren, An Zhang, Tat-Seng Chua, 22 Aug 2025, Towards Goal-oriented Intelligent Tutoring Systems in Online Education, https://arxiv.org/abs/2312.10053
  • Shouwei Ruan, Liyuan Wang, Caixin Kang, Qihui Zhu, Songming Liu, Xingxing Wei and Hang Su, 24 Aug 2025, From reactive to cognitive: brain-inspired spatial intelligence for embodied agents, https://arxiv.org/abs/2508.17198
  • Rénald Gesnot, 15 Aug 2025, The Impact of Artificial Intelligence on Human Thought, https://arxiv.org/abs/2508.16628
  • Hongrak Pak, Ali Mostafavi, 21 Aug 2025, Situational Awareness as the Imperative Capability for Disaster Resilience in the Era of Complex Hazards and Artificial Intelligence, https://arxiv.org/abs/2508.16669
  • Jussi S. Jauhiainen, Aurora Toppari, 22 Aug 2025, Generative Artificial Intelligence and Agents in Research and Teaching, https://arxiv.org/abs/2508.16701
  • Anton Ludwig Bonin, Pawel Robert Smolinski, Jacek Winiarski, 22 Aug 2025, Exploring the Impact of Generative Artificial Intelligence on Software Development in the IT Sector: Preliminary Findings on Productivity, Efficiency and Job Security, https://arxiv.org/abs/2508.16811
  • Zixuan Dong, Baoyun Peng, Yufei Wang, Lin Liu, Xinxin Dong, Yunlong Cao, Xiaodong Wang, 25 Aug 2025, See What You Need: Query-Aware Visual Intelligence through Reasoning-Perception Loops, https://arxiv.org/abs/2508.17932
  • Kaushik Ravi, Andreas Brück, 25 Aug 2025, Citizen Centered Climate Intelligence: Operationalizing Open Tree Data for Urban Cooling and Eco-Routing in Indian Cities, https://arxiv.org/abs/2508.17648
  • Yajing Yang, Qian Liu, Min-Yen Kan, 23 Aug 2025, DataTales: A Benchmark for Real-World Intelligent Data Narration, https://arxiv.org/abs/2410.17859
  • Christopher J. Mungall, Adnan Malik, Daniel R. Korn, Justin T. Reese, Noel M. O'Boyle, Janna Hastings, 24 Aug 2025, Chemical classification program synthesis using generative artificial intelligence, https://arxiv.org/abs/2505.18470
  • Maryam Ahang, Todd Charter, Mostafa Abbasi, Maziyar Khadivi, Oluwaseyi Ogunfowora, Homayoun Najjaran, 22 Aug 2025, Intelligent Condition Monitoring of Industrial Plants: An Overview of Methodologies and Uncertainty Management Strategies, https://arxiv.org/abs/2401.10266
  • He Hu, Yucheng Zhou, Lianzhong You, Hongbo Xu, Qianning Wang, Zheng Lian, Fei Richard Yu, Fei Ma, Laizhong Cui, 25 Aug 2025, EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models, https://arxiv.org/abs/2502.04424

General Research on Reasoning Techniques

See the long list of AI reasoning research papers.

Reasoning and CoT Efficiency Topics

Blog articles on reasoning efficiency:

More research information on general efficiency optimization techniques for reasoning models:

Efficiency optimizations to Chain-of-Thought include:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++ programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: