Aussie AI
Reasoning
-
Last Updated 12 December, 2024
-
by David Spuler, Ph.D.
Reasoning is a key part of intelligence, and much work is ongoing to improve the higher-level reasoning of AI models. Examples include solving mathematical problems and performing multi-step planning, such as booking a holiday.
There are two main categories of methods to improve reasoning ability:
- Training methods ("white box reasoning")
- Multi-step inference methods ("black box reasoning")
White Box Reasoning. The first approach is to train the LLM better, so that it achieves improved results on "reasoning" and "generalization" tasks. Fine-tuning on a more specialized subset of relevant data is one submethod in this area. There has been much improvement here, both in the capabilities of high-end large SOTA models and, at the other end of the spectrum, in Small Language Models (SLMs). See more about training methods.
Black Box Reasoning. The second approach is to treat the LLM as a "black box" and use more LLM calls. These are called "few-shot," "many-shot," or "multi-step" reasoning methods. Chain-of-thought is the best known of these, having been adopted by OpenAI for the "o1" models released in September 2024. However, multi-step reasoning is a longstanding area of research, with much overlap with prompt engineering techniques, and the literature contains numerous methods based on multiple LLM calls (a minimal sketch of two such methods follows the list below):
- Chain-of-thought (CoT)
- Self-reflection
- Skeleton-of-thought
- Best-of-N (BoN) method
- Self-consistency decoding
- Programmatic prompting
- Tree-of-Thoughts (ToT) prompting
- Chain-of-Symbols (CoS) prompting
- Graph-of-Thoughts (GoT)
- Chain-of-Table prompting
- Thread-of-Thought (ThoT) prompting
- System 2 Attention (S2A) prompting
- Chain-of-Verification (CoVe) prompting
- ReAct prompting (reason-and-act)
- Rephrase-and-Respond (RaR) prompting
- Chain-of-Knowledge (CoK) prompting
- Contrastive Chain-of-Thought (CCoT) prompting
- Program of Thoughts (PoT) prompting
- Structured Chain-of-Thought (SCoT) prompting
- Chain-of-Code (CoC) prompting
- Take a Step Back prompting
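To make the "black box" idea concrete, here is a minimal sketch of two of these methods, chain-of-thought prompting and self-consistency decoding. It assumes only a hypothetical llm(prompt) callable wrapping whatever LLM API is in use; the prompt wording and the llm callable are illustrative assumptions, not any particular library's API.

```python
# Minimal sketch of two black-box multi-step methods, assuming a
# hypothetical llm(prompt) -> str callable that wraps whichever LLM API
# is in use (not part of any specific library).
from collections import Counter
from typing import Callable

def chain_of_thought(question: str, llm: Callable[[str], str]) -> str:
    # One call: ask the model to show its working before answering.
    return llm(f"{question}\nLet's think step by step.")

def self_consistency(question: str, llm: Callable[[str], str], n: int = 5) -> str:
    # Several calls: sample multiple reasoning paths, then majority-vote
    # over the final answers (self-consistency decoding).
    prompt = (f"{question}\nLet's think step by step. "
              "End with 'Answer:' followed by the final answer.")
    answers = [llm(prompt).rsplit("Answer:", 1)[-1].strip() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

The self-consistency variant trades extra LLM calls for accuracy: each sampled reasoning path may differ, but a majority vote over the final answers tends to be more reliable than any single path.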
Also related are the various other ways to have the LLM give a "better" answer, even if they do not strictly improve reasoning. The simplest ideas include prompt engineering techniques to give the LLM a better query, RAG architectures and Retrieval-Augmented Language Models (RALM) to give the LLM more relevant source data, and dynamic tool usage integrations to generalize the LLM's capabilities to answers that require computation. Also relevant is research on improving answers by fixing specific LLM limitations, such as hallucinations, difficulties with mathematical problem solving, and weaknesses in language wordplay.
Multi-Step Inference for Reasoning
- Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou, 4 Jun 2024 (v2), Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems, https://arxiv.org/abs/2403.02419
- Justin Chih-Yao Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal, 18 Sep 2024, MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning, https://arxiv.org/abs/2409.12147 https://github.com/dinobby/MAgICoRe
- Xiaohan Xu, Chongyang Tao, Tao Shen, Can Xu, Hongbo Xu, Guodong Long, Jian-guang Lou, 29 Feb 2024 (v2), Re-Reading Improves Reasoning in Large Language Models, https://arxiv.org/abs/2309.06275
- Artem Shelamanov, Sep 2024, Why OpenAI’s o1 Model Is A Scam, https://pub.towardsai.net/why-openais-o1-model-is-a-scam-eb3356c3d70e
- Rinon Gal, Adi Haviv, Yuval Alaluf, Amit H. Bermano, Daniel Cohen-Or, Gal Chechik, 2 Oct 2024, ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation, https://arxiv.org/abs/2410.01731 https://comfygen-paper.github.io/
- Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, Yuji Zhang, Wenjun Hou, Kaishuai Xu, Wenge Liu, Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong, Integrative Decoding: Improve Factuality via Implicit Self-consistency, 3 Oct 2024 (v2), https://arxiv.org/abs/2410.01556 (Prepends a previous response to improve decoding accuracy.)
- Zhenwen Liang, Ye Liu, Tong Niu, Xiangliang Zhang, Yingbo Zhou, Semih Yavuz, 5 Oct 2024, Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification, https://arxiv.org/abs/2410.05318
- Sonya Huang, Pat Grady, and o1, Sequoia, October 9, 2024, Generative AI’s Act o1, https://www.sequoiacap.com/article/generative-ais-act-o1/
- Yingqian Cui, Pengfei He, Xianfeng Tang, Qi He, Chen Luo, Jiliang Tang, Yue Xing, 21 Oct 2024, A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration, https://arxiv.org/abs/2410.16540
- Jiangming Liu, Matt Gardner, Shay B. Cohen, Mirella Lapata, 7 Jun 2021 (v2), Multi-Step Inference for Reasoning Over Paragraphs, https://arxiv.org/abs/2004.02995
- Aditya Kalyanpur, Kailash Karthik Saravanakumar, Victor Barres, CJ McFate, Lori Moon, Nati Seifu, Maksim Eremeev, Jose Barrera, Abraham Bautista-Castillo, Eric Brown, David Ferrucci, 24 Jul 2024 (v4), Multi-step Inference over Unstructured Data, https://arxiv.org/abs/2406.17987
- Banghao Chen, Zhaofeng Zhang, Nicolas Langrené, Shengxin Zhu, 5 Sep 2024 (v5), Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review, https://arxiv.org/abs/2310.14735
- Xiaodong Liu, Kevin Duh, Jianfeng Gao, 30 Mar 2019 (v2), Stochastic Answer Networks for Natural Language Inference, https://arxiv.org/abs/1804.07888
- TED, Oct 2024, Multi-Step Reasoning Agents, https://tedai-sanfrancisco.ted.com/glossary/multi-step-reasoning-agents/
- Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot, 30 Jan 2023 (v2), Complexity-Based Prompting for Multi-Step Reasoning, https://arxiv.org/abs/2210.00720
- Junting Lu, Oct 2024 (accessed), Awesome-LLM-Reasoning-Techniques, https://github.com/Junting-Lu/Awesome-LLM-Reasoning-Techniques
- Cameron R. Wolfe, Dec 23, 2023, Tree of Thoughts Prompting. Solving multi-step problems with LLMs via deliberate planning and exploration, https://towardsdatascience.com/tree-of-thoughts-prompting-65a3e51f9ac4
- Data Camp, Jul 10, 2024, Chain-of-Thought Prompting: Step-by-Step Reasoning with LLMs, https://www.datacamp.com/tutorial/chain-of-thought-prompting
- Pankaj, Dec 21, 2023, Chain of Thought Prompting: Guiding LLMs Step-by-Step, https://medium.com/@pankaj_pandey/chain-of-thought-prompting-guiding-llms-step-by-step-e6eac32d02d8
- Cobus Greyling, Aug 2, 2023, 12 Prompt Engineering Techniques, https://cobusgreyling.medium.com/12-prompt-engineering-techniques-644481c857aa
- Cameron R. Wolfe, Aug 21, 2023, Tree of Thoughts Prompting. Solving multi-step problems with LLMs via deliberate planning and exploration, https://cameronrwolfe.substack.com/p/tree-of-thoughts-prompting
- Cameron R. Wolfe, Jan 3, 2024, Graph-Based Prompting and Reasoning with Language Models. Understanding graph of thoughts prompting and several variants… https://towardsdatascience.com/graph-based-prompting-and-reasoning-with-language-models-d6acbcd6b3d8
- Jason Wei and Denny Zhou, May 11, 2022, Language Models Perform Reasoning via Chain of Thought, https://research.google/blog/language-models-perform-reasoning-via-chain-of-thought/
- Cameron R. Wolfe, Jul 24, 2023, Chain of Thought Prompting for LLMs: A practical and simple approach for “reasoning” with LLMs, https://towardsdatascience.com/chain-of-thought-prompting-for-llms-33c963eead38
- Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
- Arun Shankar, Oct 2024, Designing Cognitive Architectures: Agentic Workflow Patterns from Scratch, https://medium.com/google-cloud/designing-cognitive-architectures-agentic-workflow-patterns-from-scratch-63baa74c54bc
- Tanay Jaipuria, Oct 29, 2024, OpenAI's o-1 and inference-time scaling laws, https://www.tanayj.com/p/openais-o-1-and-inference-time-scaling
- Jinlin Wang, Suyuchen Wang, Ziwen Xia, Sirui Hong, Yun Zhu, Bang Liu, Chenglin Wu, 28 Oct 2024, FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval, https://arxiv.org/abs/2410.21012
- Latent Space, Nov 05, 2024, Inference, Fast and Slow. When System 1/System 2 analogies are not enough: The 6 types of LLM inference, https://www.latent.space/p/inference-fast-and-slow
- Shanghaoran Quan, Tianyi Tang, Bowen Yu, An Yang, Dayiheng Liu, Bofei Gao, Jianhong Tu, Yichang Zhang, Jingren Zhou, Junyang Lin, 31 Oct 2024, Language Models can Self-Lengthen to Generate Long Texts, https://arxiv.org/abs/2410.23933
- LangChain, Nov 7, 2024. SCIPE - Systematic Chain Improvement and Problem Evaluation, https://blog.langchain.dev/scipe-systematic-chain-improvement-and-problem-evaluation/ https://github.com/garg-ankush/scipe/tree/main
- X Wang, L Mu, J Zhang, H Xu, 2024, Multi-pass Decoding for Grammatical Error Correction, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 9904–9916, November 12-16, 2024, https://aclanthology.org/2024.emnlp-main.553.pdf
- Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
- Guowei Xu, Peng Jin, Li Hao, Yibing Song, Lichao Sun, Li Yuan, 15 Nov 2024, LLaVA-o1: Let Vision Language Models Reason Step-by-Step, https://arxiv.org/abs/2411.10440
- Carl Franzen, November 20, 2024, DeepSeek’s first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance, https://venturebeat.com/ai/deepseeks-first-reasoning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/
- mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
- Eric Horvitz, Harsha Nori, Naoto Usuyama, November 27, 2024, Advances in run-time strategies for next-generation foundation models, Microsoft Research Blog, https://www.microsoft.com/en-us/research/blog/advances-in-run-time-strategies-for-next-generation-foundation-models/
- Harsha Nori, Naoto Usuyama, Nicholas King, Scott Mayer McKinney, Xavier Fernandes, Sheng Zhang, Eric Horvitz, 6 Nov 2024, From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond, https://arxiv.org/abs/2411.03590
- Hieu Tran, Zonghai Yao, Junda Wang, Yifan Zhang, Zhichao Yang, Hong Yu, 5 Dec 2024 (v2), RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2412.02830
- Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang, 6 Dec 2024, Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling, https://arxiv.org/abs/2412.05271
General Research on Intelligence
What does it mean to be smart? There are various answers to this, and it's a very nuanced question.
Research on intelligence or "smartness" of AI systems:
- Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, William Yang Wang, May 03 2024, Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies, https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00660/120911
- MC Planning, 2024, Can Language Models Be Used in Multistep Commonsense Planning Domains? Artificial General Intelligence https://link.springer.com/book/10.1007/978-3-031-33469-6#page=288
- Jessica Stillman, April 9, 2024, Scientists Pitted 4-Year-Olds Against AI. The Kids Crushed the Machines at This 1 Crucial Skill, https://www.inc-aus.com/jessica-stillman/scientists-pitted-4-year-olds-against-ai-kids-crushed-machines-1-skill.html (AI engines failed at using unusual objects for tasks, such as using something else to bang a nail that wasn't a hammer, i.e., a type of reasoning or thinking creatively.)
- Diana Hu, March 29, 2024, Building AI Models is faster and cheaper than you probably think, Y Combinator, https://www.ycombinator.com/blog/building-ai-models
- David Spuler, March 2024, Chapter 43. Overview of AI Research, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Rachel Metz, July 12, 2024, OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving: The company believes its technology is approaching the second level of five on the path to artificial general intelligence, Bloomberg, https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai?sref=P6Q0mxvj
Chain-of-Thought (CoT) Reasoning
Research papers on chain-of-thought (CoT) for reasoning:
- Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler, 5 Apr 2024, Demystifying Chains, Trees, and Graphs of Thoughts, https://arxiv.org/abs/2401.14295 http://htor.ethz.ch/publications/img/besta-topologies.pdf
- Jacob Pfau, William Merrill, Samuel R. Bowman, 24 Apr 2024, Let's Think Dot by Dot: Hidden Computation in Transformer Language Models, https://arxiv.org/abs/2404.15758
- Hongxuan Zhang, Zhining Liu, Jiaqi Zheng, Chenyi Zhuang, Jinjie Gu, Guihai Chen, Nov 2023, Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster, https://arxiv.org/abs/2311.08263
- Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe, May 2023, Let's Verify Step by Step, https://arxiv.org/abs/2305.20050
- Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin, 13 Jun 2024, Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs, https://arxiv.org/abs/2406.09136 Code: https://github.com/sail-sg/CPO
- kipply's blog, 2023-03-30, Transformer Taxonomy (the last lit review), https://kipp.ly/transformer-taxonomy/ (Papers for all the Transformer architectures and milestone papers for the major optimization improvements on them.)
- Daniel Lopes, June 21, 2024, A Comprehensive Guide to Text Prompt Engineering Techniques, https://journal.daniellopes.dev/p/practical-prompt-engineering-notes
- Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He, 15 Feb 2024, Model Compression and Efficient Inference for Large Language Models: A Survey, https://arxiv.org/abs/2402.09748
- Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu, 17 May 2024, Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities, https://arxiv.org/abs/2405.10825
- Yu Wang, Shiwan Zhao, Zhihu Wang, Heyuan Huang, Ming Fan, Yubo Zhang, Zhixing Wang, Haijun Wang, Ting Liu, 5 Sep 2024, Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation, https://arxiv.org/abs/2409.03271
- Asankhaya Sharma (codelion), Sep 2024, Optillm: Optimizing inference proxy for LLMs, https://github.com/codelion/optillm
- Ziqi Jin, Wei Lu, 6 Sep 2024, Self-Harmonized Chain of Thought, https://arxiv.org/abs/2409.04057
- Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, Aman Chadha, 5 Feb 2024, A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications, https://arxiv.org/abs/2402.07927
- Shizhe Diao, Pengcheng Wang, Yong Lin, Rui Pan, Xiang Liu, Tong Zhang, 21 Jul 2024 (v5), Active Prompting with Chain-of-Thought for Large Language Models, https://arxiv.org/abs/2302.12246 https://github.com/shizhediao/active-prompt
- Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola, 7 Oct 2022, Automatic Chain of Thought Prompting in Large Language Models, https://arxiv.org/abs/2210.03493 https://github.com/amazon-research/auto-cot
- Louis Bouchard, Sep 12, 2024, OpenAI's o1 Model: The Future of Reasoning AI? What Sets It Apart, How OpenAI's o1 Model Thinks Through Problems (And Why It's Slower), https://www.louisbouchard.ai/openai-o1/
- OpenAI, September 12, 2024, Learning to Reason with LLMs, https://openai.com/index/learning-to-reason-with-llms/
- Emilia David, September 12, 2024, How to prompt on OpenAI’s new o1 models, https://venturebeat.com/ai/how-to-prompt-on-openai-o1/ (Prompt engineering is different for o1, such as "don't use chain of thought.")
- Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous, 28 Nov 2023, Training Chain-of-Thought via Latent-Variable Inference, https://arxiv.org/abs/2312.02179
- Trung Quoc Luong, Xinbo Zhang, Zhanming Jie, Peng Sun, Xiaoran Jin, Hang Li, 27 Jun 2024 (v2), ReFT: Reasoning with Reinforced Fine-Tuning, https://arxiv.org/abs/2401.08967
- Tianqiao Liu, Zui Chen, Zitao Liu, Mi Tian, Weiqi Luo, 13 Sep 2024, Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding, https://arxiv.org/abs/2409.08561
- Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett, 18 Sep 2024, To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning, https://arxiv.org/abs/2409.12183
- Santosh Kumar Radha, Yasamin Nouri Jelyani, Ara Ghukasyan, Oktay Goktas, 19 Sep 2024, Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning, https://arxiv.org/abs/2409.12618
- Artem Shelamanov, Sep 2024, Why OpenAI’s o1 Model Is A Scam, https://pub.towardsai.net/why-openais-o1-model-is-a-scam-eb3356c3d70e
- Chung-Yu Wang, Alireza DaghighFarsoodeh, Hung Viet Pham, 24 Sep 2024, Task-oriented Prompt Enhancement via Script Generation, https://arxiv.org/abs/2409.16418
- Cassandra A. Cohen, William W. Cohen, 17 Sep 2024, Watch Your Steps: Observable and Modular Chains of Thought, https://arxiv.org/abs/2409.15359
- Tongxuan Liu, Wenjiang Xu, Weizhe Huang, Xingyu Wang, Jiaxing Wang, Hailong Yang, Jing Li, 26 Sep 2024, Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models, https://arxiv.org/abs/2409.17539
- Zhenwen Liang, Ye Liu, Tong Niu, Xiangliang Zhang, Yingbo Zhou, Semih Yavuz, 5 Oct 2024, Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification, https://arxiv.org/abs/2410.05318
- Qiguang Chen, Libo Qin, Jiaqi Wang, Jinxuan Zhou, Wanxiang Che, 8 Oct 2024, Unlocking the Boundaries of Thought: A Reasoning Granularity Framework to Quantify and Optimize Chain-of-Thought, https://arxiv.org/abs/2410.05695 https://github.com/LightChen233/reasoning-granularity
- Yingqian Cui, Pengfei He, Xianfeng Tang, Qi He, Chen Luo, Jiliang Tang, Yue Xing, 21 Oct 2024, A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration, https://arxiv.org/abs/2410.16540
- Banghao Chen, Zhaofeng Zhang, Nicolas Langrené, Shengxin Zhu, 5 Sep 2024 (v5), Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review, https://arxiv.org/abs/2310.14735
- Data Camp, Jul 10, 2024, Chain-of-Thought Prompting: Step-by-Step Reasoning with LLMs, https://www.datacamp.com/tutorial/chain-of-thought-prompting
- Pankaj, Dec 21, 2023, Chain of Thought Prompting: Guiding LLMs Step-by-Step, https://medium.com/@pankaj_pandey/chain-of-thought-prompting-guiding-llms-step-by-step-e6eac32d02d8
- Jason Wei and Denny Zhou, May 11, 2022, Language Models Perform Reasoning via Chain of Thought, https://research.google/blog/language-models-perform-reasoning-via-chain-of-thought/
- Cameron R. Wolfe, Jul 24, 2023, Chain of Thought Prompting for LLMs: A practical and simple approach for “reasoning” with LLMs, https://towardsdatascience.com/chain-of-thought-prompting-for-llms-33c963eead38
- Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
- Tanay Jaipuria, Oct 29, 2024, OpenAI's o-1 and inference-time scaling laws, https://www.tanayj.com/p/openais-o-1-and-inference-time-scaling
- Junda Wu, Xintong Li, Ruoyu Wang, Yu Xia, Yuxin Xiong, Jianing Wang, Tong Yu, Xiang Chen, Branislav Kveton, Lina Yao, Jingbo Shang, Julian McAuley, 31 Oct 2024, OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models, https://arxiv.org/abs/2410.23703
- Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
- Guowei Xu, Peng Jin, Li Hao, Yibing Song, Lichao Sun, Li Yuan, 15 Nov 2024, LLaVA-o1: Let Vision Language Models Reason Step-by-Step, https://arxiv.org/abs/2411.10440
- Carl Franzen, November 20, 2024, DeepSeek’s first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance, https://venturebeat.com/ai/deepseeks-first-reasoning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/
- Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang, 21 Nov 2024, Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions, https://arxiv.org/abs/2411.14405
- Jun Gao, Yongqi Li, Ziqiang Cao, Wenjie Li, 29 Nov 2024, Interleaved-Modal Chain-of-Thought, https://arxiv.org/abs/2411.19488 (Using CoT on a multimodal/vision model.)
- Hieu Tran, Zonghai Yao, Junda Wang, Yifan Zhang, Zhichao Yang, Hong Yu, 5 Dec 2024 (v2), RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2412.02830
- Tiernan Ray, Dec. 10, 2024, How Cerebras boosted Meta's Llama to 'frontier model' performance. The company also demonstrates initial training of a one-trillion-parameter AI model on a single machine using conventional DDR5 memory chips. https://www.zdnet.com/article/how-cerebras-boosted-metas-llama-to-frontier-model-performance/
- Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian, 9 Dec 2024, Training Large Language Models to Reason in a Continuous Latent Space, https://arxiv.org/abs/2412.06769
- Ben Dickson, December 10, 2024, OpenAI’s o1 model doesn’t show its thinking, giving open source an advantage, https://venturebeat.com/ai/heres-how-openai-o1-might-lose-ground-to-open-source-models/
- Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang, 6 Dec 2024, Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling, https://arxiv.org/abs/2412.05271
Skeleton-of-Thought
Skeleton-of-thought is a technique with the dual aims of smarter reasoning and faster inference. The idea is to generate an outline as a list of points, and then have the LLM expand each sub-point in parallel. This allows both a more focused answer to each point and faster inference through parallel generation of the shorter per-point answers.
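Below is a minimal sketch of this idea, assuming a hypothetical llm(prompt) callable (the prompts and function names are illustrative, not from any specific paper): the outline step is one sequential call, while the expansions are independent calls that can be issued in parallel.

```python
# A minimal skeleton-of-thought sketch, assuming a hypothetical
# llm(prompt) -> str callable. The outline call is sequential; the
# per-point expansion calls are independent, so they can run in parallel.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def skeleton_of_thought(question: str, llm: Callable[[str], str]) -> str:
    # Step 1: generate a short outline (the "skeleton").
    outline = llm(f"Give a short numbered outline (3-5 points) answering: {question}")
    points = [line.strip() for line in outline.splitlines() if line.strip()]

    # Step 2: expand each outline point with an independent, parallel LLM call.
    def expand(point: str) -> str:
        return llm(f"Question: {question}\n"
                   f"Expand this outline point in 2-3 sentences: {point}")

    with ThreadPoolExecutor() as pool:
        expansions = list(pool.map(expand, points))

    # Step 3: concatenate the expanded points into the final answer.
    return "\n\n".join(expansions)
```

The speedup comes from generating several short completions concurrently rather than one long completion token by token.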
Research on skeleton-of-thought reasoning includes:
- L. Zheng, L. Yin, Z. Xie, J. Huang, C. Sun, C. H. Yu, S. Cao, C. Kozyrakis, I. Stoica, J. E. Gonzalez et al., Dec 2023, Efficiently programming large language models using SGLang, arXiv preprint arXiv:2312.07104, 2023, https://arxiv.org/abs/2312.07104 (Uses a radix attention method, a trie or prefix tree, for KV caching.)
- Xuefei Ning, Zinan Lin, November 17, 2023, Skeleton-of-Thought: Parallel decoding speeds up and improves LLM output, Microsoft Research Blog, https://www.microsoft.com/en-us/research/blog/skeleton-of-thought-parallel-decoding-speeds-up-and-improves-llm-output/ Code: https://github.com/imagination-research/sot/
- S. Jin, Y. Wu, H. Zheng, Q. Zhang, M. Lentz, Z. M. Mao, A. Prakash, F. Qian, and D. Zhuo, “Adaptive skeleton graph decoding,” arXiv preprint arXiv:2402.12280, 2024. https://arxiv.org/abs/2402.12280
- M. Liu, A. Zeng, B. Wang, P. Zhang, J. Tang, and Y. Dong, “Apar: Llms can do auto-parallel auto-regressive decoding,” arXiv preprint arXiv:2401.06761, 2024. https://arxiv.org/abs/2401.06761
- Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, 8 Jun 2024 (v2), A Survey on Efficient Inference for Large Language Models, https://arxiv.org/abs/2404.14294
- Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Aman Chadha, 24 May 2024 (v2), A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models, https://arxiv.org/abs/2405.13019
- Steven Kolawole, Keshav Santhanam, Virginia Smith, Pratiksha Thaker, Nov 2024, Extracting Parallelism from Large Language Model Queries, https://openreview.net/pdf?id=CZHt9kLS5S
Reflection
Reflection, or self-reflection, is a type of reasoning where the LLM takes an extra step to "reflect" on its own answers. This is a multi-step reasoning method in which the LLM is prompted to critique and then improve its own answers. There are different variants of self-reflection aimed at training improvement or inference improvement.
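As a rough illustration, here is a minimal inference-time self-reflection loop, assuming a hypothetical llm(prompt) callable (the prompts are illustrative assumptions): the model drafts an answer, critiques its own draft, then revises using the critique.

```python
# A minimal self-reflection loop sketch, assuming a hypothetical
# llm(prompt) -> str callable. The model drafts an answer, critiques
# its own draft, then revises it using that critique.
from typing import Callable

def reflect_and_revise(question: str, llm: Callable[[str], str], rounds: int = 1) -> str:
    answer = llm(question)  # initial draft
    for _ in range(rounds):
        critique = llm(f"Question: {question}\nAnswer: {answer}\n"
                       "List any errors or omissions in this answer.")
        answer = llm(f"Question: {question}\nAnswer: {answer}\n"
                     f"Critique: {critique}\n"
                     "Rewrite the answer, fixing the issues in the critique.")
    return answer
```

Each extra round costs two more LLM calls (critique plus revision), which is the usual trade-off in these multi-step methods.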
Research papers on reflection:
- Cogni Down Under, Sep 2024, Reflection 70B: The AI That Thinks Before It Speaks, https://medium.com/@cognidownunder/reflection-70b-the-ai-that-thinks-before-it-speaks-8a70d3a0e38a
- Asankhaya Sharma (codelion), Sep 2024, Optillm: Optimizing inference proxy for LLMs, https://github.com/codelion/optillm
- Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, Aman Chadha, 5 Feb 2024, A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications, https://arxiv.org/abs/2402.07927
- Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou, 4 Jun 2024 (v2), Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems, https://arxiv.org/abs/2403.02419
- Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
- Arun Shankar, Oct 2024, Designing Cognitive Architectures: Agentic Workflow Patterns from Scratch, https://medium.com/google-cloud/designing-cognitive-architectures-agentic-workflow-patterns-from-scratch-63baa74c54bc
- Anita Kirkovska, David Vargas, Jul 11, 2024, Agentic Workflows in 2024: The ultimate guide, https://www.vellum.ai/blog/agentic-workflows-emerging-architectures-and-design-patterns
- A. Singh, A. Ehtesham, S. Kumar and T. T. Khoei, "Enhancing AI Systems with Agentic Workflows Patterns in Large Language Model," 2024 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 2024, pp. 527-532, doi: 10.1109/AIIoT61789.2024.10578990. https://ieeexplore.ieee.org/abstract/document/10578990
- Chawla, Chhavi; Chatterjee, Siddharth; Gadadinni, Sanketh Siddanna; Verma, Pulkit; Banerjee, Sourav, 2024, Agentic AI: The building blocks of sophisticated AI business applications, Journal of AI, Robotics & Workplace Automation, Volume 3 / Number 3 / Summer 2024, pp. 1-15(15), Henry Stewart Publications, DOI: https://doi.org/10.69554/XEHZ1946 https://www.ingentaconnect.com/content/hsp/airwa/2024/00000003/00000003/art00001
- Yu Zhao, Huifeng Yin, Bo Zeng, Hao Wang, Tianqi Shi, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang, 21 Nov 2024, Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions, https://arxiv.org/abs/2411.14405
- mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
Multi-Step Methods
Research on "multi-step" methods for reasoning in general, include:
- Zehui Chen, Kuikun Liu, Qiuchen Wang, Jiangning Liu, Wenwei Zhang, Kai Chen, Feng Zhao, 29 Jul 2024, MindSearch: Mimicking Human Minds Elicits Deep AI Searcher, https://arxiv.org/abs/2407.20183 Code: https://github.com/InternLM/MindSearch Project: https://mindsearch.netlify.app
- Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini, 31 Jul 2024, Large Language Monkeys: Scaling Inference Compute with Repeated Sampling, https://arxiv.org/abs/2407.21787 (Generating multiple answers by repeated inference queries, and then using a verifier to choose the best one, which is shown to greatly increase overall accuracy.)
- David Gu, July 18, 2024, Text Compression for Efficient Language Generation, Master’s Thesis, Distributed Computing Group, Computer Engineering and Networks Laboratory, ETH Zürich, https://pub.tik.ee.ethz.ch/students/2023-HS/MA-2023-19.pdf (Training and inference at the sentence level, including caching of embeddings per sentence, which also has the side-effect of compressing the input prompts and reducing computation analogously to token pruning.)
- Ignacio de Gregorio, Aug 2024, Grokking, a New Form of Reasoning, https://medium.com/@ignacio.de.gregorio.noblejas/grokking-a-new-form-of-reasoning-6785ea89d2ec
- Asankhaya Sharma (codelion), Sep 2024, Optillm: Optimizing inference proxy for LLMs, https://github.com/codelion/optillm
- Justin Chih-Yao Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal, 18 Sep 2024, MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning, https://arxiv.org/abs/2409.12147 https://github.com/dinobby/MAgICoRe
- Xiaohan Xu, Chongyang Tao, Tao Shen, Can Xu, Hongbo Xu, Guodong Long, Jian-guang Lou, 29 Feb 2024 (v2), Re-Reading Improves Reasoning in Large Language Models, https://arxiv.org/abs/2309.06275
- TED, Oct 2024, Multi-Step Reasoning Agents, https://tedai-sanfrancisco.ted.com/glossary/multi-step-reasoning-agents/
- Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot, 30 Jan 2023 (v2), Complexity-Based Prompting for Multi-Step Reasoning, https://arxiv.org/abs/2210.00720
- Junting Lu, Oct 2024 (accessed), Awesome-LLM-Reasoning-Techniques, https://github.com/Junting-Lu/Awesome-LLM-Reasoning-Techniques
- Cameron R. Wolfe, Dec 23, 2023, Tree of Thoughts Prompting. Solving multi-step problems with LLMs via deliberate planning and exploration, https://towardsdatascience.com/tree-of-thoughts-prompting-65a3e51f9ac4
- Data Camp, Jul 10, 2024, Chain-of-Thought Prompting: Step-by-Step Reasoning with LLMs, https://www.datacamp.com/tutorial/chain-of-thought-prompting
- Pankaj, Dec 21, 2023, Chain of Thought Prompting: Guiding LLMs Step-by-Step, https://medium.com/@pankaj_pandey/chain-of-thought-prompting-guiding-llms-step-by-step-e6eac32d02d8
- Cobus Greyling, Aug 2, 2023, 12 Prompt Engineering Techniques, https://cobusgreyling.medium.com/12-prompt-engineering-techniques-644481c857aa
- Cameron R. Wolfe, Aug 21, 2023, Tree of Thoughts Prompting. Solving multi-step problems with LLMs via deliberate planning and exploration, https://cameronrwolfe.substack.com/p/tree-of-thoughts-prompting
- Cameron R. Wolfe, Jan 3, 2024, Graph-Based Prompting and Reasoning with Language Models. Understanding graph of thoughts prompting and several variants… https://towardsdatascience.com/graph-based-prompting-and-reasoning-with-language-models-d6acbcd6b3d8
- Jason Wei and Denny Zhou, May 11, 2022, Language Models Perform Reasoning via Chain of Thought, https://research.google/blog/language-models-perform-reasoning-via-chain-of-thought/
- Cameron R. Wolfe, Jul 24, 2023, Chain of Thought Prompting for LLMs: A practical and simple approach for “reasoning” with LLMs, https://towardsdatascience.com/chain-of-thought-prompting-for-llms-33c963eead38
- Tanay Jaipuria, Oct 29, 2024, OpenAI's o-1 and inference-time scaling laws, https://www.tanayj.com/p/openais-o-1-and-inference-time-scaling
- Jinlin Wang, Suyuchen Wang, Ziwen Xia, Sirui Hong, Yun Zhu, Bang Liu, Chenglin Wu, 28 Oct 2024, FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval, https://arxiv.org/abs/2410.21012
- Carl Franzen, November 20, 2024, DeepSeek’s first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance, https://venturebeat.com/ai/deepseeks-first-reasoning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/
- mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
- Eric Horvitz, Harsha Nori, Naoto Usuyama, November 27, 2024, Advances in run-time strategies for next-generation foundation models, Microsoft Research Blog, https://www.microsoft.com/en-us/research/blog/advances-in-run-time-strategies-for-next-generation-foundation-models/
- Harsha Nori, Naoto Usuyama, Nicholas King, Scott Mayer McKinney, Xavier Fernandes, Sheng Zhang, Eric Horvitz, 6 Nov 2024, From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond, https://arxiv.org/abs/2411.03590
- Hieu Tran, Zonghai Yao, Junda Wang, Yifan Zhang, Zhichao Yang, Hong Yu, 5 Dec 2024 (v2), RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2412.02830
- Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang, 6 Dec 2024, Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling, https://arxiv.org/abs/2412.05271
Planning (as part of Reasoning)
Having an LLM know how to make a plan is part of intelligence. Here are some papers specifically on "planning" as part of reasoning (a minimal plan-then-execute sketch follows the list):
- Myeonghwa Lee, Seonho An, Min-Soo Kim, 18 Jun 2024, PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, https://arxiv.org/abs/2406.12430 Code: https://github.com/myeon9h/PlanRAG
- Vishal Rajput, Apr 11, 2024, What’s next for AI: AI agentic workflows? https://medium.com/aiguys/next-for-llms-and-rag-ai-agentic-workflows-1869ba0a6796
- Zehui Chen, Kuikun Liu, Qiuchen Wang, Jiangning Liu, Wenwei Zhang, Kai Chen, Feng Zhao, 29 Jul 2024, MindSearch: Mimicking Human Minds Elicits Deep AI Searcher, https://arxiv.org/abs/2407.20183 Code: https://github.com/InternLM/MindSearch Project: https://mindsearch.netlify.app
- Daniel Cao, Michael Katz, Harsha Kokel, Kavitha Srinivas, Shirin Sohrabi, 21 Aug 2024, Automating Thought of Search: A Journey Towards Soundness and Completeness, https://arxiv.org/abs/2408.11326
- Vishal Rajput, Jul 8, 2024, Why LLMs Can’t Plan And Unlikely To Reach AGI? https://medium.com/aiguys/why-llms-cant-plan-and-unlikely-to-reach-agi-642bda3e0aa3
- Evan Wang, Federico Cassano, Catherine Wu, Yunfeng Bai, Will Song, Vaskar Nath, Ziwen Han, Sean Hendryx, Summer Yue, Hugh Zhang, 5 Sep 2024, Planning In Natural Language Improves LLM Search For Code Generation, https://arxiv.org/abs/2409.03733
- Yongjing Yin, Junran Ding, Kai Song, Yue Zhang, 17 Sep 2024, Semformer: Transformer Language Models with Semantic Planning, https://arxiv.org/abs/2409.11143
- Chung-Yu Wang, Alireza DaghighFarsoodeh, Hung Viet Pham, 24 Sep 2024, Task-oriented Prompt Enhancement via Script Generation, https://arxiv.org/abs/2409.16418
- LangChain, Jul 20, 2024, Planning for Agents, https://blog.langchain.dev/planning-for-agents/
- A. Singh, A. Ehtesham, S. Kumar and T. T. Khoei, "Enhancing AI Systems with Agentic Workflows Patterns in Large Language Model," 2024 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 2024, pp. 527-532, doi: 10.1109/AIIoT61789.2024.10578990. https://ieeexplore.ieee.org/abstract/document/10578990
- Chawla, Chhavi; Chatterjee, Siddharth; Gadadinni, Sanketh Siddanna; Verma, Pulkit; Banerjee, Sourav, 2024, Agentic AI: The building blocks of sophisticated AI business applications, Journal of AI, Robotics & Workplace Automation, Volume 3 / Number 3 / Summer 2024, pp. 1-15(15), Henry Stewart Publications, DOI: https://doi.org/10.69554/XEHZ1946 https://www.ingentaconnect.com/content/hsp/airwa/2024/00000003/00000003/art00001
- Jian Xie, Kexun Zhang, Jiangjie Chen, Siyu Yuan, Kai Zhang, Yikai Zhang, Lei Li, Yanghua Xiao, 16 Oct 2024, Revealing the Barriers of Language Agents in Planning, https://arxiv.org/abs/2410.12409
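As a rough illustration of plan-then-execute at inference time, here is a minimal sketch assuming a hypothetical llm(prompt) callable; the prompts and helper names are illustrative assumptions, not taken from any of the papers above.

```python
# A minimal plan-then-execute sketch, assuming a hypothetical
# llm(prompt) -> str callable: one call produces a numbered plan, then
# each step is executed in order, with earlier results passed as context.
from typing import Callable

def plan_then_execute(task: str, llm: Callable[[str], str]) -> str:
    plan = llm(f"Write a short numbered plan (one step per line) for: {task}")
    steps = [s.strip() for s in plan.splitlines() if s.strip()]
    results: list[str] = []
    for step in steps:
        context = "\n".join(results)
        results.append(llm(f"Task: {task}\n"
                           f"Results of completed steps:\n{context}\n"
                           f"Carry out this step and report the result: {step}"))
    return results[-1] if results else plan
```

Separating planning from execution lets each step get a focused prompt, at the cost of one LLM call per plan step.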
Agentic Workflow
- Arun Shankar, Oct 2024, Designing Cognitive Architectures: Agentic Workflow Patterns from Scratch, https://medium.com/google-cloud/designing-cognitive-architectures-agentic-workflow-patterns-from-scratch-63baa74c54bc
- Sandi Besen, Oct 2024, AI Agent Workflows: A Complete Guide on Whether to Build With LangGraph or LangChain, https://towardsdatascience.com/ai-agent-workflows-a-complete-guide-on-whether-to-build-with-langgraph-or-langchain-117025509fa0
- Anita Kirkovska, David Vargas, Jul 11, 2024, Agentic Workflows in 2024: The ultimate guide, https://www.vellum.ai/blog/agentic-workflows-emerging-architectures-and-design-patterns
- Shuofei Qiao, Runnan Fang, Zhisong Qiu, Xiaobin Wang, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen, 10 Oct 2024, Benchmarking Agentic Workflow Generation, https://arxiv.org/abs/2410.07869
- A. Singh, A. Ehtesham, S. Kumar and T. T. Khoei, "Enhancing AI Systems with Agentic Workflows Patterns in Large Language Model," 2024 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 2024, pp. 527-532, doi: 10.1109/AIIoT61789.2024.10578990. https://ieeexplore.ieee.org/abstract/document/10578990
- Chawla, Chhavi; Chatterjee, Siddharth; Gadadinni, Sanketh Siddanna; Verma, Pulkit; Banerjee, Sourav, 2024, Agentic AI: The building blocks of sophisticated AI business applications, Journal of AI, Robotics & Workplace Automation, Volume 3 / Number 3 / Summer 2024, pp. 1-15(15), Henry Stewart Publications, DOI: https://doi.org/10.69554/XEHZ1946 https://www.ingentaconnect.com/content/hsp/airwa/2024/00000003/00000003/art00001
- Jiayi Zhang, Jinyu Xiang, Zhaoyang Yu, Fengwei Teng, Xionghui Chen, Jiaqi Chen, Mingchen Zhuge, Xin Cheng, Sirui Hong, Jinlin Wang, Bingnan Zheng, Bang Liu, Yuyu Luo, Chenglin Wu, 14 Oct 2024, AFlow: Automating Agentic Workflow Generation, https://arxiv.org/abs/2410.10762 https://github.com/geekan/MetaGPT
- Ruixuan Xiao, Wentao Ma, Ke Wang, Yuchuan Wu, Junbo Zhao, Haobo Wang, Fei Huang, Yongbin Li, 21 Jun 2024, FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents, https://arxiv.org/abs/2406.14884
- Dawei Gao, Zitao Li, Xuchen Pan, Weirui Kuang, Zhijian Ma, Bingchen Qian, Fei Wei, Wenhao Zhang, Yuexiang Xie, Daoyuan Chen, Liuyi Yao, Hongyi Peng, Zeyu Zhang, Lin Zhu, Chen Cheng, Hongzhu Shi, Yaliang Li, Bolin Ding, Jingren Zhou, 20 May 2024 (v2), AgentScope: A Flexible yet Robust Multi-Agent Platform, https://arxiv.org/abs/2402.14034 https://github.com/modelscope/agentscope
Temporal Reasoning (Time-Based Logic)
AI models struggle with the concept of time and with any sort of "temporal reasoning" based on time progression or causation over time. Research papers on temporal reasoning:
- Jonas Wallat, Adam Jatowt, Avishek Anand, March 2024, Temporal Blind Spots in Large Language Models, WSDM '24: Proceedings of the 17th ACM International Conference on Web Search and Data Mining, Pages 683–692, https://arxiv.org/abs/2401.12078, https://doi.org/10.1145/3616855.3635818, https://dl.acm.org/doi/abs/10.1145/3616855.3635818
- Siheng Xiong, Ali Payani, Ramana Kompella, Faramarz Fekri, 22 Apr 2024 (v3), Large Language Models Can Learn Temporal Reasoning, https://arxiv.org/abs/2401.06853
- Bowen Zhao, Zander Brumbaugh, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, 26 Feb 2024, Set the Clock: Temporal Alignment of Pretrained Language Models, https://arxiv.org/abs/2402.16797 Code: https://github.com/yizhongw/llm-temporal-alignment
- Qingyu Tan, Hwee Tou Ng, Lidong Bing, 16 Nov 2023, Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning, https://arxiv.org/abs/2311.09821
- Yifu Qiu, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo M. Ponti, Shay B. Cohen, 16 Nov 2023 (v2), Are Large Language Models Temporally Grounded? https://arxiv.org/abs/2311.08398 Code: https://github.com/yfqiu-nlp/temporal-llms
- Raghav Jain, Daivik Sojitra, Arkadeep Acharya, Sriparna Saha, Adam Jatowt, Sandipan Dandapat, December 2023, Do Language Models Have a Common Sense regarding Time? Revisiting Temporal Commonsense Reasoning in the Era of Large Language Models, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing https://aclanthology.org/2023.emnlp-main.418/ PDF: https://aclanthology.org/2023.emnlp-main.418.pdf
- Yifan Wei, Yisong Su, Huanhuan Ma, Xiaoyan Yu, Fangyu Lei, Yuanzhe Zhang, Jun Zhao, Kang Liu, 8 Oct 2023, MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Language Models, https://arxiv.org/abs/2310.05157
- Himanshu Beniwal, Kowsik Nandagopan D, Mayank Singh, 19 Feb 2024, Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language Models, https://arxiv.org/abs/2402.11997
- Bahare Fatemi, Mehran Kazemi, Anton Tsitsulin, Karishma Malkan, Jinyeong Yim, John Palowitch, Sungyong Seo, Jonathan Halcrow, Bryan Perozzi, 13 Jun 2024, Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning, https://arxiv.org/abs/2406.09170
- Irwin Deng, Kushagra Dixit, Vivek Gupta, Dan Roth, 22 Jul 2024, Enhancing Temporal Understanding in LLMs for Semi-structured Tables, https://arxiv.org/abs/2407.16030
- Dimitris Spathis, Fahim Kawsar, The first step is the hardest: pitfalls of representing and tokenizing temporal data for large language models, Journal of the American Medical Informatics Association, Volume 31, Issue 9, September 2024, Pages 2151–2158, https://doi.org/10.1093/jamia/ocae090 https://academic.oup.com/jamia/advance-article-abstract/doi/10.1093/jamia/ocae090/7702405?redirectedFrom=fulltext
General Research on Reasoning Techniques
- Junlin Wang, Siddhartha Jain, Dejiao Zhang, Baishakhi Ray, Varun Kumar, Ben Athiwaratkun, 11 Jun 2024 (v2), Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies, https://arxiv.org/abs/2406.06461
- Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin, 13 Jun 2024, Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs, https://arxiv.org/abs/2406.09136 Code: https://github.com/sail-sg/CPO
- Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan, 3 Dec 2023 (v2), Tree of Thoughts: Deliberate Problem Solving with Large Language Models, https://arxiv.org/abs/2305.10601 Code: https://github.com/princeton-nlp/tree-of-thought-llm
- Hayden Field, June 20, 2024, OpenAI competitor Anthropic announces its most powerful AI yet, CNBC, https://www.cnbc.com/2024/06/20/anthropic-claude-3point5-sonnet-ai-announced.html
- Wai-Chung Kwan, Xingshan Zeng, Yuxin Jiang, Yufei Wang, Liangyou Li, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong, 30 Jan 2024, MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models, https://arxiv.org/abs/2401.16745 Code: https://github.com/KwanWaiChung/MT-Eval
- M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, J. Gajda, T. Lehmann, H. Niewiadomski, P. Nyczyk et al., “Graph of thoughts: Solving elaborate problems with large language models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17682–17690. https://arxiv.org/abs/2308.09687
- Q. Sun, Z. Yin, X. Li, Z. Wu, X. Qiu, and L. Kong, “Corex: Pushing the boundaries of complex reasoning through multi model collaboration,” arXiv preprint arXiv:2310.00280, 2023. https://arxiv.org/abs/2310.00280
- Tianle Li, Wei-Lin Chiang, Lisa Dunlap, May 20, 2024, Introducing Hard Prompts Category in Chatbot Arena, https://lmsys.org/blog/2024-05-17-category-hard/
- Myeonghwa Lee, Seonho An, Min-Soo Kim, 18 Jun 2024, PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers, https://arxiv.org/abs/2406.12430 Code: https://github.com/myeon9h/PlanRAG
- Rahul Verma, June 21, 2024, OpenAI's GPT-5 Pushed Back To Late 2025, But Promises Ph.D.-Level Abilities, https://in.mashable.com/tech/77593/openais-gpt-5-pushed-back-to-late-2025-but-promises-phd-level-abilities
- Krista Opsahl-Ong, Michael J Ryan, Josh Purtell, David Broman, Christopher Potts, Matei Zaharia, Omar Khattab, 17 Jun 2024, Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs, https://arxiv.org/abs/2406.11695
- Sachit Menon, Richard Zemel, Carl Vondrick, 20 Jun 2024, Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities, https://arxiv.org/abs/2406.14562
- Lina M. Rojas-Barahona, 2024, Talking to Machines: do you read me?. Computation and Language, Universite de Lorraine, https://hal.science/tel-04620199/document
- Rafe Brena, May 24, 2024, 3 Key Differences Between Human and Machine Intelligence You Need to Know: AI is an alien intelligence https://pub.towardsai.net/3-key-differences-between-human-and-machine-intelligence-you-need-to-know-7a34dcee2cd3 (Good article about how LLMs don't have "emotions" or "intelligence" and they don't "pause".)
- Vishal Rajput, Apr 11, 2024, What’s next for AI: AI agentic workflows? https://medium.com/aiguys/next-for-llms-and-rag-ai-agentic-workflows-1869ba0a6796
- Rachel Metz, July 12, 2024, OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving: The company believes its technology is approaching the second level of five on the path to artificial general intelligence, Bloomberg, https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai?sref=P6Q0mxvj
- Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu, 1 May 2024, Causal Evaluation of Language Models, https://arxiv.org/abs/2405.00622 Project: https://opencausalab.github.io/CaLM
- Anna Tong and Katie Paul July 16, 2024, Exclusive: OpenAI working on new reasoning technology under code name ‘Strawberry’, https://www.reuters.com/technology/artificial-intelligence/openai-working-new-reasoning-technology-under-code-name-strawberry-2024-07-12/
- Zehui Chen, Kuikun Liu, Qiuchen Wang, Jiangning Liu, Wenwei Zhang, Kai Chen, Feng Zhao, 29 Jul 2024, MindSearch: Mimicking Human Minds Elicits Deep AI Searcher, https://arxiv.org/abs/2407.20183 Code: https://github.com/InternLM/MindSearch Project: https://mindsearch.netlify.app
- Ethan Mollick, May 12, 2024, Superhuman? What does it mean for AI to be better than a human? And how can we tell? https://www.oneusefulthing.org/p/superhuman
- Ignacio de Gregorio, Aug 2024, Grokking, a New Form of Reasoning, https://medium.com/@ignacio.de.gregorio.noblejas/grokking-a-new-form-of-reasoning-6785ea89d2ec
- Zarif Bin Akhtar, Mapping Generative Artificial Intelligence (GAI's) Exciting Future: From Gemini to Q* and Beyond, https://publications.eai.eu/index.php/airo/article/view/5962 https://doi.org/10.4108/airo.5962 PDF: https://publications.eai.eu/index.php/airo/article/view/5962/3329
- Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Ying Nian Wu, Yongfeng Zhang, Dongfang Liu, 16 Aug 2024, Visual Agents as Fast and Slow Thinkers, https://arxiv.org/abs/2408.08862
- Adam Zewe, June 14, 2024, Technique improves the reasoning capabilities of large language models: Combining natural language and programming, the method enables LLMs to solve numerical, analytical, and language-based tasks transparently, MIT News, https://news.mit.edu/2024/technique-improves-reasoning-capabilities-large-language-models-0614
- Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng, 6 Feb 2024, Self-Discover: Large Language Models Self-Compose Reasoning Structures, https://arxiv.org/abs/2402.03620
- Tinghui Zhu, Kai Zhang, Jian Xie, Yu Su, 4 Feb 2024 (v2), Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning, https://arxiv.org/abs/2401.17686
- Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu, Julian McAuley, Wei Ai, Furong Huang, 14 Mar 2024, Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey, https://arxiv.org/abs/2403.09606
- Jiace Zhu, Yingtao Shen, Jie Zhao, An Zou, 25 Aug 2024, Path-Consistency: Prefix Enhancement for Efficient Inference in LLM, https://arxiv.org/abs/2409.01281
- Cogni Down Under, Sep 2024, Reflection 70B: The AI That Thinks Before It Speaks, https://medium.com/@cognidownunder/reflection-70b-the-ai-that-thinks-before-it-speaks-8a70d3a0e38a
- Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla, 9 Mar 2024, Algorithmic progress in language models, https://arxiv.org/abs/2403.05812
- Alberto Romero. Sep 10, 2024, Big News: OpenAI to Launch AI Model That Can Reason in 2 Weeks, https://www.thealgorithmicbridge.com/p/big-news-openai-to-launch-ai-model
- Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Bo Tang, Feiyu Xiong, Zhiyu Li, 5 Sep 2024, Attention Heads of Large Language Models: A Survey, https://arxiv.org/abs/2409.03752 https://github.com/IAAR-Shanghai/Awesome-Attention-Heads (This survey is about making attention mechanisms more performant, accurate and intelligent, rather than improving efficiency.)
- Asankhaya Sharma (codelion), Sep 2024, Optillm: Optimizing inference proxy for LLMs, https://github.com/codelion/optillm
- Louis Bouchard, Sep 12, 2024, OpenAI's o1 Model: The Future of Reasoning AI? What Sets It Apart, How OpenAI's o1 Model Thinks Through Problems (And Why It's Slower), https://www.louisbouchard.ai/openai-o1/
- OpenAI, September 12, 2024, Introducing OpenAI o1-preview, A new series of reasoning models for solving hard problems. https://openai.com/index/introducing-openai-o1-preview/
- OpenAI, September 12, 2024, Learning to Reason with LLMs, https://openai.com/index/learning-to-reason-with-llms/
- Nathan Lambert, Sep 05, 2024, OpenAI’s Strawberry, LM self-talk, inference scaling laws, and spending more on inference, Whether or not scaling works, we should spend more on inference, https://www.interconnects.ai/p/openai-strawberry-and-inference-scaling-laws
- Ignacio de Gregorio Noblejas, September 15, 2024, OpenAI Launches o1. Here’s All You Need to Know, https://thetechoasis.beehiiv.com/p/openai-launches-o1-heres-need-know
- Trung Quoc Luong, Xinbo Zhang, Zhanming Jie, Peng Sun, Xiaoran Jin, Hang Li, 27 Jun 2024 (v2), ReFT: Reasoning with Reinforced Fine-Tuning, https://arxiv.org/abs/2401.08967
- Tianqiao Liu, Zui Chen, Zitao Liu, Mi Tian, Weiqi Luo, 13 Sep 2024, Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding, https://arxiv.org/abs/2409.08561
- Michael Nuñez, September 16, 2024, SambaNova challenges OpenAI’s o1 model with Llama 3.1-powered demo on HuggingFace, https://venturebeat.com/ai/sambanova-challenges-openais-o1-model-with-llama-3-1-powered-demo-on-huggingface/
- Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett, 18 Sep 2024, To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning, https://arxiv.org/abs/2409.12183
- Santosh Kumar Radha, Yasamin Nouri Jelyani, Ara Ghukasyan, Oktay Goktas, 19 Sep 2024, Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning, https://arxiv.org/abs/2409.12618
- Justin Chih-Yao Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal, 18 Sep 2024, MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning, https://arxiv.org/abs/2409.12147 https://github.com/dinobby/MAgICoRe
- Siyun Zhao, Yuqing Yang, Zilong Wang, Zhiyuan He, Luna K. Qiu, Lili Qiu, 23 Sep 2024, Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely, https://arxiv.org/abs/2409.14924
- Artem Shelamanov, Sep 2024, Why OpenAI’s o1 Model Is A Scam, https://pub.towardsai.net/why-openais-o1-model-is-a-scam-eb3356c3d70e
- Chloe Berger, October 2, 2024, Mark Cuban says his puppy is ‘smarter than AI is today’, https://fortune.com/2024/10/01/mark-cuban-dog-puppy-smarter-than-ai/
- Julia Love and Rachel Metz, October 2, 2024, Google Is Working on Reasoning AI, Chasing OpenAI’s Efforts, https://www.bloomberg.com/news/articles/2024-10-02/google-is-working-on-reasoning-ai-chasing-openai-s-efforts
- Zhenwen Liang, Ye Liu, Tong Niu, Xiangliang Zhang, Yingbo Zhou, Semih Yavuz, 5 Oct 2024, Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification, https://arxiv.org/abs/2410.05318
- Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, Mehrdad Farajtabar, 7 Oct 2024, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, https://arxiv.org/abs/2410.05229
- Sonya Huang, Pat Grady, and o1, Sequoia, October 9, 2024, Generative AI’s Act o1, https://www.sequoiacap.com/article/generative-ais-act-o1/
- Ignacio de Gregorio Noblejas, October 20, 2024, The Anti-LLM Revolution Begins, https://thetechoasis.beehiiv.com/p/the-anti-llm-revolution-begins
- By Asif Razzaq, October 13, 2024, OpenR: An Open-Source AI Framework Enhancing Reasoning in Large Language Models, https://www.marktechpost.com/2024/10/13/openr-an-open-source-ai-framework-enhancing-reasoning-in-large-language-models/
- Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
- Latent Space, Nov 05, 2024, Inference, Fast and Slow. When System 1/System 2 analogies are not enough: The 6 types of LLM inference, https://www.latent.space/p/inference-fast-and-slow
- Will Lockett, Nov 2024, Apple Calls BS On The AI Revolution. They aren’t late to the AI game; they are just the only sceptical big tech company. https://medium.com/predict/apple-calls-bullshit-on-the-ai-revolution-ae38fdf83392
- Anthony Ha, Nov 2024, OpenAI reportedly developing new strategies to deal with AI improvement slowdown, https://techcrunch.com/2024/11/09/openai-reportedly-developing-new-strategies-to-deal-with-ai-improvement-slowdown/
- Michael Nuñez, November 11, 2024, AI’s math problem: FrontierMath benchmark shows how far technology still has to go, https://venturebeat.com/ai/ais-math-problem-frontiermath-benchmark-shows-how-far-technology-still-has-to-go/
- Kyle Orland, 13 Nov 2024, What if AI doesn’t just keep getting better forever? New reports highlight fears of diminishing returns for traditional LLM training. https://arstechnica.com/ai/2024/11/what-if-ai-doesnt-just-keep-getting-better-forever/
- Carl Franzen, November 20, 2024, DeepSeek’s first reasoning model R1-Lite-Preview turns heads, beating OpenAI o1 performance, https://venturebeat.com/ai/deepseeks-first-reasoning-model-r1-lite-preview-turns-heads-beating-openai-o1-performance/
- Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen, 14 Oct 2024 (v3), Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models, https://arxiv.org/abs/2408.02442
- Janelle Teng, Nov 26, 2024, AI's reasoning quandary, https://nextbigteng.substack.com/p/ais-reasoning-quandary
- Qwen Team, November 28, 2024, QwQ: Reflect Deeply on the Boundaries of the Unknown, https://qwenlm.github.io/blog/qwq-32b-preview/
- mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
- Tom Schaul, 25 Nov 2024, Boundless Socratic Learning with Language Games, https://arxiv.org/abs/2411.16905
- Alberto Romero, Dec 06, 2024, OpenAI Announces o1 Model And ChatGPT Pro ($200/Mo). OpenAI Christmas event: Day 1 of 12, https://www.thealgorithmicbridge.com/p/openai-announces-o1-model-and-chatgpt
- Justin Chih-Yao Chen, Zifeng Wang, Hamid Palangi, Rujun Han, Sayna Ebrahimi, Long Le, Vincent Perot, Swaroop Mishra, Mohit Bansal, Chen-Yu Lee, Tomas Pfister, 29 Nov 2024, Reverse Thinking Makes LLMs Stronger Reasoners, https://arxiv.org/abs/2411.19865
- Tiernan Ray, Dec. 10, 2024, How Cerebras boosted Meta's Llama to 'frontier model' performance. The company also demonstrates initial training of a one-trillion-parameter AI model on a single machine using conventional DDR5 memory chips. https://www.zdnet.com/article/how-cerebras-boosted-metas-llama-to-frontier-model-performance/
- Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian, 9 Dec 2024, Training Large Language Models to Reason in a Continuous Latent Space, https://arxiv.org/abs/2412.06769
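To make these methods concrete, below is a minimal sketch of one of the simplest multi-step inference techniques, self-consistency decoding: sample several chain-of-thought completions at non-zero temperature, then take a majority vote over the final answers. The `query_llm` function and the "Answer:" output convention are hypothetical placeholders for illustration, not any particular vendor's API.

```python
# Minimal self-consistency decoding sketch ("black box" multi-step inference).
# Assumption: query_llm() is a hypothetical stand-in for any LLM completion
# API, and the model is prompted to end with a line "Answer: <result>".
from collections import Counter

def query_llm(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical LLM call; swap in a real completion API here."""
    raise NotImplementedError

def extract_final_answer(completion: str) -> str:
    """Naive extraction: keep whatever follows the last 'Answer:' marker."""
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistency(question: str, n_samples: int = 5) -> str:
    """Sample N chain-of-thought completions and majority-vote the answers."""
    prompt = (
        f"{question}\n"
        "Think step by step, then finish with a line 'Answer: <result>'."
    )
    answers = [
        extract_final_answer(query_llm(prompt, temperature=0.7))
        for _ in range(n_samples)
    ]
    # Majority vote across the sampled reasoning paths.
    return Counter(answers).most_common(1)[0][0]
```

Note that each extra sample costs one full LLM call; the cost/quality trade-off of this kind of compound inference is exactly what several of the papers above study.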
AGI Research
General research on achieving Artificial General Intelligence (AGI):
- Tao Feng, Chuanyang Jin, Jingyu Liu, Kunlun Zhu, Haoqin Tu, Zirui Cheng, Guanyu Lin, Jiaxuan You, 16 May 2024, How Far Are We From AGI, https://arxiv.org/abs/2405.10313
- Nathan Lambert, APR 18, 2024, Llama 3: Scaling open LLMs to AGI, https://www.interconnects.ai/p/llama-3-and-scaling-open-llms
- jbetke, June 3, 2024, General Intelligence (2024), https://nonint.com/2024/06/03/general-intelligence-2024/
- Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni, Nov 2023, Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models, https://arxiv.org/abs/2311.00871
- Denise Holt, Jan 29, 2024, “Deep Learning is Rubbish” — Karl Friston & Yann LeCun Face Off at Davos 2024 World Economic Forum, AI Monks.io, https://medium.com/aimonks/deep-learning-is-rubbish-karl-friston-yann-lecun-face-off-at-davos-2024-world-economic-forum-494e82089d22
- Hayden Field, June 20, 2024, OpenAI competitor Anthropic announces its most powerful AI yet, CNBC, https://www.cnbc.com/2024/06/20/anthropic-claude-3point5-sonnet-ai-announced.html
- Arjun Kharpal, June 21, 2024, SoftBank CEO says AI that is 10,000 times smarter than humans will come out in 10 years, CNBC, https://www.cnbc.com/2024/06/21/softbank-ceo-predicts-ai-that-is-10000-times-smarter-than-humans-.html
- Rahul Verma, June 21, 2024, OpenAI's GPT-5 Pushed Back To Late 2025, But Promises Ph.D.-Level Abilities, https://in.mashable.com/tech/77593/openais-gpt-5-pushed-back-to-late-2025-but-promises-phd-level-abilities
- Ignacio de Gregorio, June 2024, Mixture-of-Agents Beats ChatGPT-4o: Collaboration is Intelligence, https://medium.com/@ignacio.de.gregorio.noblejas/mixture-of-agents-beats-chatgpt-4o-6470a74f1525
- Rachel Metz, July 12, 2024, OpenAI Scale Ranks Progress Toward ‘Human-Level’ Problem Solving: The company believes its technology is approaching the second level of five on the path to artificial general intelligence, Bloomberg, https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai?sref=P6Q0mxvj
- Anna Tong and Katie Paul July 16, 2024, Exclusive: OpenAI working on new reasoning technology under code name ‘Strawberry’, https://www.reuters.com/technology/artificial-intelligence/openai-working-new-reasoning-technology-under-code-name-strawberry-2024-07-12/
- Ethan Mollick, May 12, 2024, Superhuman? What does it mean for AI to be better than a human? And how can we tell? https://www.oneusefulthing.org/p/superhuman
- Zarif Bin Akhtar, Mapping Generative Artificial Intelligence (GAI's) Exciting Future: From Gemini to Q* and Beyond, https://publications.eai.eu/index.php/airo/article/view/5962 https://doi.org/10.4108/airo.5962 PDF: https://publications.eai.eu/index.php/airo/article/view/5962/3329
- Jack Dymond, August 2024, Progressive Intelligence for Low-Power Devices, Ph.D. Thesis, Faculty of Engineering and Physical Sciences, School of Electronics and Computer Science, University of Southampton, https://eprints.soton.ac.uk/492900/1/JackDymond-Final-Thesis.pdf
- Rohin Shah, Seb Farquhar, Anca Dragan, 21st Aug 2024, AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work, https://www.alignmentforum.org/posts/79BPxvSsjzBkiSyTq/agi-safety-and-alignment-at-google-deepmind-a-summary-of
- Roy Lo, June 13, 2024, Defining AI 2.0: Beyond Generative AI, https://www.linkedin.com/pulse/defining-ai-20-beyond-generative-roy-lo-tbvie/
- Ryan McNeal, Aug 27, 2024, ChatGPT and GPT-4 could get a sweet upgrade this fall with 'strawberry', https://www.androidauthority.com/openai-strawberry-ai-3475682/
- Vishal Rajput, Jul 8, 2024, Why LLMs Can’t Plan And Unlikely To Reach AGI? https://medium.com/aiguys/why-llms-cant-plan-and-unlikely-to-reach-agi-642bda3e0aa3
- Lareina Yee, June 7, 2024, Gen AI: A cognitive industrial revolution, https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/gen-ai-a-cognitive-industrial-revolution
- Martin Casado (@martin_casado), Aug 31, 2024, Tweet (State of LLMs), https://threadreaderapp.com/thread/1829905130512400775.html
- Gian Segato, September 2024, The dawn of a new startup era, https://giansegato.com/essays/dawn-new-startup-era
- Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla, Mar 12, 2024, Algorithmic Progress in Language Models, Epoch AI, https://epochai.org/blog/algorithmic-progress-in-language-models
- Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla, 9 Mar 2024, Algorithmic progress in language models, https://arxiv.org/abs/2403.05812
- Alberto Romero. Sep 10, 2024, Big News: OpenAI to Launch AI Model That Can Reason in 2 Weeks, https://www.thealgorithmicbridge.com/p/big-news-openai-to-launch-ai-model
- David Gilmore, Sep 2024, When will AI outthink humans? https://davidvgilmore.com/writings/outthinking-ai (Interesting analysis of all the GPUs in the world and when they will "out-think" all the human knowledge workers, predicting a range of years from 2028 to 2035, depending on assumptions.)
- Chloe Berger, October 2, 2024, Mark Cuban says his puppy is ‘smarter than AI is today’, https://fortune.com/2024/10/01/mark-cuban-dog-puppy-smarter-than-ai/
- Julia Love and Rachel Metz, October 2, 2024, Google Is Working on Reasoning AI, Chasing OpenAI’s Efforts, https://www.bloomberg.com/news/articles/2024-10-02/google-is-working-on-reasoning-ai-chasing-openai-s-efforts
- Samantha Kelly, Sept. 29, 2024, 'Superintelligent' AI Is Only a Few Thousand Days Away: OpenAI CEO Sam Altman, https://www.cnet.com/tech/services-and-software/superintelligent-ai-is-only-a-few-thousand-days-away-openai-ceo-sam-altman/
- Tianyang Zhong, Zhengliang Liu, Yi Pan, Yutong Zhang, Yifan Zhou, Shizhe Liang, Zihao Wu, Yanjun Lyu, Peng Shu, Xiaowei Yu, Chao Cao, Hanqi Jiang, Hanxu Chen, Yiwei Li, Junhao Chen, Huawen Hu, Yihen Liu, Huaqin Zhao, Shaochen Xu, Haixing Dai, Lin Zhao, Ruidong Zhang, Wei Zhao, Zhenyuan Yang, Jingyuan Chen, Peilong Wang, Wei Ruan, Hui Wang, Huan Zhao, Jing Zhang, Yiming Ren, Shihuan Qin, Tong Chen, Jiaxi Li, Arif Hassan Zidan, Afrar Jahin, Minheng Chen, Sichen Xia, Jason Holmes, Yan Zhuang, Jiaqi Wang, Bochen Xu, Weiran Xia, Jichao Yu, Kaibo Tang, Yaxuan Yang, Bolun Sun, Tao Yang, Guoyu Lu, Xianqiao Wang, Lilong Chai, He Li, Jin Lu, Lichao Sun, Xin Zhang, Bao Ge, Xintao Hu, Lian Zhang, Hua Zhou, Lu Zhang, Shu Zhang, Ninghao Liu, Bei Jiang, Linglong Kong, Zhen Xiang, Yudan Ren, Jun Liu, Xi Jiang, Yu Bao, Wei Zhang, Xiang Li, Gang Li, Wei Liu, Dinggang Shen, Andrea Sikora, Xiaoming Zhai, Dajiang Zhu, Tianming Liu, 27 Sep 2024, Evaluation of OpenAI o1: Opportunities and Challenges of AGI, https://arxiv.org/abs/2409.18486
- AI-native software engineering may be closer than developers think, CIO, https://www.cio.com/article/3567138/ai-native-software-engineering-may-be-closer-than-developers-think.html
- Ignacio de Gregorio Noblejas, October 20, 2024, The Anti-LLM Revolution Begins, https://thetechoasis.beehiiv.com/p/the-anti-llm-revolution-begins
- Aki Ranin, Sep 2, 2024, The Code Canaries Are Singing — Our Path Toward AGI: How the fate of human software developers reveals our path toward AGI, https://akiranin.medium.com/the-code-canaries-are-singing-our-path-toward-agi-6c234cae0189
- Will Lockett, Nov 2024, Apple Calls BS On The AI Revolution. They aren’t late to the AI game; they are just the only sceptical big tech company. https://medium.com/predict/apple-calls-bullshit-on-the-ai-revolution-ae38fdf83392
- Anthony Ha, Nov 2024, OpenAI reportedly developing new strategies to deal with AI improvement slowdown, https://techcrunch.com/2024/11/09/openai-reportedly-developing-new-strategies-to-deal-with-ai-improvement-slowdown/
- Michael Nuñez, November 11, 2024, AI’s math problem: FrontierMath benchmark shows how far technology still has to go, https://venturebeat.com/ai/ais-math-problem-frontiermath-benchmark-shows-how-far-technology-still-has-to-go/
- Kyle Orland, 13 Nov 2024, What if AI doesn’t just keep getting better forever? New reports highlight fears of diminishing returns for traditional LLM training. https://arstechnica.com/ai/2024/11/what-if-ai-doesnt-just-keep-getting-better-forever/
- Gary Marcus, Nov 25, 2024, A new AI scaling law shell game? Scaling laws ain’t what they used to be, https://garymarcus.substack.com/p/a-new-ai-scaling-law-shell-game
- Brian Merchant, Dec 2024, AI Generated Business: The Rise of AGI and the Rush to Find a Working Business Model, https://ainowinstitute.org/general/ai-generated-business
- David Luan, Pieter Abbeel, December 09, 2024, Amazon opens new AI lab in San Francisco focused on long-term research bets. The Amazon AGI SF Lab will focus on developing new foundational capabilities for enabling useful AI agents. https://www.amazon.science/blog/amazon-opens-new-ai-lab-in-san-francisco-focused-on-long-term-research-bets
- Deirdre Bosa, Jasmine Wu, Dec 11 2024, The limits of intelligence — Why AI advancement could be slowing down, https://www.cnbc.com/2024/12/11/why-ai-advancement-could-be-slowing-down.html
More AI Research
Read more about:
- Advanced AI Mathematics
- Matrix Algebra
- Zero-Multiplication Models
- Approximate Computing
- Inference Optimizations
- Loop Optimizations
- Code Optimizations