Aussie AI
Scaling Laws in Generative AI
-
Last Updated 21 March, 2025
-
by David Spuler, Ph.D.
What are AI Scaling Laws?
Scaling laws are the contention that AI models become smarter as model size (parameter count) and/or the total number of tokens used in training are scaled up. Recently, diminishing returns from additional training have thrown some of these scaling laws into doubt, giving rise to a new "inference scaling law," which holds that scaling the amount of inference computation can also increase model intelligence.
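The classic training scaling laws fit pretraining loss as a power law in parameter count N and training tokens D. A minimal sketch, using the approximate coefficients published in the Chinchilla paper (Hoffmann et al., 2022); the function name and example sizes are illustrative, not part of any library:

```python
# Chinchilla-style scaling law: L(N, D) = E + A / N^alpha + B / D^beta
# Coefficients are the approximate published Chinchilla fits; treat as illustrative.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for a model with n_params parameters
    trained on n_tokens tokens, under the fitted power law."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Both more parameters and more data lower the predicted loss:
small = predicted_loss(1e9, 20e9)     # 1B params, 20B tokens
large = predicted_loss(70e9, 1.4e12)  # 70B params, 1.4T tokens
```

Note the irreducible term E: as N and D grow, the two power-law terms shrink toward zero, which is one intuition behind "diminishing returns" from training scale alone.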
What are Inference Scaling Laws?
Inference scaling laws are the contention that smarter LLMs can be created by spending additional computation at inference time, such as repeated LLM queries at runtime, rather than by more extensive training. The success of the OpenAI "o1" model has supported this trend, as it is based on multi-step inference using "Chain-of-Thought" reasoning.
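One simple way to spend more inference compute is self-consistency voting: sample several chain-of-thought answers and return the majority. A minimal sketch, where `sample_answer` is a hypothetical stand-in for one stochastic LLM call (temperature > 0) returning a final answer string; this is not OpenAI's o1 algorithm, just the general pattern:

```python
from collections import Counter

def self_consistency(sample_answer, n_samples: int):
    """Sample n_samples answers from a stochastic LLM call and return
    the majority answer plus its agreement ratio.

    `sample_answer` is a zero-argument callable standing in for one
    LLM query; in a real system it would wrap an API call."""
    votes = Counter(sample_answer() for _ in range(n_samples))
    answer, count = votes.most_common(1)[0]
    return answer, count / n_samples
```

Increasing `n_samples` is exactly the "scaling inference compute" knob: accuracy tends to rise with more samples, at proportionally higher cost per query.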
Research on Inference Scaling Laws
Research papers on the scaling laws in regard to multi-step inference:
- Ethan Mollick, Sep 16, 2024, Scaling: The State of Play in AI, https://www.oneusefulthing.org/p/scaling-the-state-of-play-in-ai
- Akash Bajwa, Oct 07, 2024, Inference Time Scaling Laws: AI Megacycle Of System 1 And System 2 Applications, https://akashbajwa.substack.com/p/inference-time-scaling-laws
- Tanay Jaipuria, Oct 29, 2024, OpenAI's o-1 and inference-time scaling laws, https://www.tanayj.com/p/openais-o-1-and-inference-time-scaling
- Krystal Hu and Anna Tong, November 15, 2024, OpenAI and others seek new path to smarter AI as current methods hit limitations, https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/
- Maxwell Zeff, November 20, 2024, Nvidia’s CEO defends his moat as AI labs change how they improve their AI models, https://techcrunch.com/2024/11/20/nvidias-ceo-defends-his-moat-as-ai-labs-change-how-they-improve-their-ai-models/
- Gary Marcus, Nov 25, 2024, A new AI scaling law shell game? Scaling laws ain’t what they used to be, https://garymarcus.substack.com/p/a-new-ai-scaling-law-shell-game
- Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang, 6 Dec 2024, Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling, https://arxiv.org/abs/2412.05271
- Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
- Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Dec 2024, Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs, https://arxiv.org/abs/2412.21187
- Rohin Manvi, Anikait Singh, Stefano Ermon, 3 Oct 2024, Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation, https://arxiv.org/abs/2410.02725
- Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, and Kan Li, 19 Jan 2024, Escape sky-high cost: Early-stopping self-consistency for multi-step reasoning. The Twelfth International Conference on Learning Representations, 2024, https://arxiv.org/abs/2401.10480 https://github.com/Yiwei98/ESC (Uses "early stopping" idea to improve CoT efficiency during inference.)
- Akash Bajwa, Jan 06, 2025, Test-Time Search: A Path To AGI: Stacking Scaling Laws And Reward Engineering, https://akashbajwa.substack.com/p/test-time-search-a-path-to-agi
- Cameron R. Wolfe, Ph.D., Jan 06, 2025, Scaling Laws for LLMs: From GPT-3 to o3, Understanding the current state of LLM scaling and the future of AI research, https://cameronrwolfe.substack.com/p/scaling-laws-for-llms-from-gpt-3
- Sunil Manghani, Dec 21, 2024, Train Less, Think More: Advancing LLMs Through Test-Time Compute, https://medium.com/electronic-life/train-less-think-more-advancing-llms-through-test-time-compute-a46832e973e9
- Duncan Anderson, Jan 2025, The wall that wasn’t: Benchmark results for the latest AI models suggest that any “scaling wall” has already been breached and we’re on the path to AGI. https://medium.com/barnacle-labs/the-wall-that-wasnt-62c617f66ad4
- Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar, 6 Aug 2024, Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, https://arxiv.org/abs/2408.03314 (Original test time compute paper.)
- Maxwell Zeff, January 7, 2025, Nvidia CEO says his AI chips are improving faster than Moore’s Law, https://techcrunch.com/2025/01/07/nvidia-ceo-says-his-ai-chips-are-improving-faster-than-moores-law/
- Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
- Sebastian Raschka, PhD, Jan 15, 2025, Noteworthy AI Research Papers of 2024 (Part Two). Six influential AI papers from July to December, https://magazine.sebastianraschka.com/p/ai-research-papers-2024-part-2 (Examines multimodal LLama3 models and the different multimodal architectures.)
- G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
- Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo, 24 Feb 2025, How Do Large Language Monkeys Get Their Power (Laws)? https://arxiv.org/abs/2502.17578 (More attempts at reasoning on a problem means more accuracy.)
What is Test Time Compute?
Test time compute means using additional computation at the LLM inference stage, rather than during pre-training or fine-tuning. The model weights stay constant during inference, but reasoning can still be improved through advanced prompting strategies and multi-step inference algorithms.
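A common test-time-compute pattern is Best-of-N with a verifier: generate several candidate responses with the frozen model, score each with a separate verifier or reward model, and keep the best. A minimal sketch with hypothetical `generate` and `score` callables; real systems use a trained process- or outcome-reward model as the scorer:

```python
def best_of_n(generate, score, n: int):
    """Generate n candidate responses and return the highest-scoring one.

    `generate` stands in for one stochastic LLM call; `score` maps a
    candidate response to a float (higher is better). The model weights
    never change; only extra inference compute is spent."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)
```

For example, with a toy length-based scorer (a stand-in for a real reward model), `best_of_n(sample, len, 8)` would return the longest of eight sampled candidates. The quality of the verifier, not just the value of n, determines how much accuracy this buys.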
Research papers on test time compute:
- Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
- Maxwell Zeff, November 20, 2024, Nvidia’s CEO defends his moat as AI labs change how they improve their AI models, https://techcrunch.com/2024/11/20/nvidias-ceo-defends-his-moat-as-ai-labs-change-how-they-improve-their-ai-models/
- mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
- Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, Jacob Andreas, 11 Nov 2024, The Surprising Effectiveness of Test-Time Training for Abstract Reasoning, https://arxiv.org/abs/2411.07279
- Noam Brown, Tuomas Sandholm, 16 Nov 2017 (v3), Safe and Nested Subgame Solving for Imperfect-Information Games, https://arxiv.org/abs/1705.02955 (An early pre-LLM paper on reasoning in multiple steps.)
- Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
- Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
- Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Dec 2024, Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs, https://arxiv.org/abs/2412.21187
- Akash Bajwa, Jan 06, 2025, Test-Time Search: A Path To AGI: Stacking Scaling Laws And Reward Engineering, https://akashbajwa.substack.com/p/test-time-search-a-path-to-agi
- Cameron R. Wolfe, Ph.D., Jan 06, 2025, Scaling Laws for LLMs: From GPT-3 to o3, Understanding the current state of LLM scaling and the future of AI research, https://cameronrwolfe.substack.com/p/scaling-laws-for-llms-from-gpt-3
- Sunil Manghani, Dec 21, 2024, Train Less, Think More: Advancing LLMs Through Test-Time Compute, https://medium.com/electronic-life/train-less-think-more-advancing-llms-through-test-time-compute-a46832e973e9
- Duncan Anderson, Jan 2025, The wall that wasn’t: Benchmark results for the latest AI models suggest that any “scaling wall” has already been breached and we’re on the path to AGI. https://medium.com/barnacle-labs/the-wall-that-wasnt-62c617f66ad4
- Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar, 6 Aug 2024, Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, https://arxiv.org/abs/2408.03314 (Original test time compute paper.)
- Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
- Edward Beeching, Lewis Tunstall, Sasha Rush Dec 16, 2024, Scaling Test Time Compute with Open Source Models, https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
- Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
- Ziyu Guo, Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Peng Gao, Hongsheng Li, Pheng-Ann Heng, 23 Jan 2025, Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step, https://arxiv.org/abs/2501.13926 https://github.com/ZiyuGuo99/Image-Generation-CoT
- G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
- Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto, 3 Feb 2025 (v2), s1: Simple test-time scaling, https://arxiv.org/abs/2501.19393 https://github.com/simplescaling/s1 (Method of "budget forcing" that allows either shortening or lengthening multi-step reasoning sequences.)
- Sebastian Raschka, PhD, Feb 05, 2025, Understanding Reasoning LLMs: Methods and Strategies for Building and Refining Reasoning Models https://magazine.sebastianraschka.com/p/understanding-reasoning-llms
- Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein, 7 Feb 2025, Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach, https://arxiv.org/abs/2502.05171
- Dacheng Li, Shiyi Cao, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica, 20 Feb 2025, S*: Test Time Scaling for Code Generation, https://arxiv.org/abs/2502.14382 https://github.com/NovaSky-AI/SkyThought
- Ben Dickson, February 20, 2025, How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs), https://venturebeat.com/ai/how-test-time-scaling-unlocks-hidden-reasoning-abilities-in-small-language-models-and-allows-them-to-outperform-llms/
- Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji, 18 Feb 2025, Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights, https://arxiv.org/abs/2502.12521
- Marthe Ballon, Andres Algaba, Vincent Ginis, 21 Feb 2025, The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer, https://arxiv.org/abs/2502.15631
- Maxwell Zeff, February 24, 2025, Anthropic launches a new AI model that ‘thinks’ as long as you want, https://techcrunch.com/2025/02/24/anthropic-launches-a-new-ai-model-that-thinks-as-long-as-you-want/
- Kif Leswing, Feb 26 2025, Nvidia CEO Huang says AI has to do ’100 times more’ computation now than when ChatGPT was released, https://www.cnbc.com/2025/02/26/nvidia-ceo-huang-says-next-generation-ai-will-need-more-compute.html (The thesis that AI reasoning will need 100 times more compute, regardless of whether it is a single-step "long answers" model thinking out loud, or a multi-step test time compute model.)
- Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei, 25 Feb 2025, Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning, https://arxiv.org/abs/2502.18080 (Trying to generate the "shortest correct response" by examining the lengths needed for CoT.)
- Juntai Cao, Xiang Zhang, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini, 27 Feb 2025, Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing, https://arxiv.org/abs/2502.20592 (Test time compute applied to the multi-document summarization use case.)
- Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
- Supreeth Koundinya, March 10, 2025, Manus is a Wrapper of Anthropic’s Claude, and It’s Okay, https://analyticsindiamag.com/ai-features/manus-is-a-wrapper-of-anthropics-claude-and-its-okay/ (“Manus didn’t just slap an API on a model. They built an autonomous system that can execute deep research, deep thinking, and multi-step tasks in a way that no other AI have.”)
- Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi, 20 Feb 2025 (v2), Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification https://arxiv.org/abs/2502.01839 (Wrapping a single model with a Best-of-N approach that self-selects the best answer can significantly improve reasoning rates.)
- Dibyanayan Bandyopadhyay, Soham Bhattacharjee, Asif Ekbal, 13 Mar 2025, Thinking Machines: A Survey of LLM based Reasoning Strategies, https://arxiv.org/abs/2503.10814
Research on Scaling Laws
Research on the traditional scaling laws of model size and training data:
- Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann, 2024, Navigating Scaling Laws: Compute Optimality in Adaptive Model Training https://openreview.net/pdf?id=3KxPo62PYn (Evaluates some model properties, such as width, on vision Transformers from the point of view of the scaling laws.)
- Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. CoRR, abs/2001.08361, 2020. https://arxiv.org/abs/2001.08361
- Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre. Training compute-optimal large language models. CoRR, abs/2203.15556, 2022. doi: 10.48550/arXiv.2203.15556. https://arxiv.org/abs/2203.15556
- Aidan Clark, Diego de las Casas, Aurelia Guy, Arthur Mensch, Michela Paganini, Jordan Hoffmann, Bogdan Damoc, Blake Hechtman, Trevor Cai, Sebastian Borgeaud, George van den Driessche, Eliza Rutherford, Tom Hennigan, Matthew Johnson, Katie Millican, Albin Cassirer, Chris Jones, Elena Buchatskaya, David Budden, Laurent Sifre, Simon Osindero, Oriol Vinyals, Jack Rae, Erich Elsen, Koray Kavukcuoglu, Karen Simonyan, 9 Feb 2022 (v2), Unified Scaling Laws for Routed Language Models, https://arxiv.org/abs/2202.01169
- Benj Edwards, 16 July, 2024, Microsoft CTO Kevin Scott thinks LLM “scaling laws” will hold despite criticism, https://arstechnica.com/information-technology/2024/07/microsoft-cto-defies-critics-ai-progress-not-slowing-down-its-just-warming-up/
- Nandu Anilal, July 16, 2024, Infrastructure after AI Scaling: Why AI scaling won't last forever (and what comes next) https://nandu.substack.com/p/infrastructure-after-ai-scaling
- Chaofan Tao, Qian Liu, Longxu Dou, Niklas Muennighoff, Zhongwei Wan, Ping Luo, Min Lin, Ngai Wong, 18 Jul 2024, Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies, https://arxiv.org/abs/2407.13623
- 18 Apr 2024 (v2), The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, https://arxiv.org/abs/2312.00678
- Tiernan Ray, July 24, 2024, 3 ways Meta's Llama 3.1 is an advance for Gen AI, https://www.zdnet.com/article/3-ways-metas-llama-3-1-is-an-advance-for-gen-ai/
- Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini, 31 Jul 2024, Large Language Monkeys: Scaling Inference Compute with Repeated Sampling, https://arxiv.org/abs/2407.21787 (Generating multiple answers by repeated inference queries, and then using a verifier to choose the best one, which is shown to greatly increase overall accuracy.)
- Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang, 1 Aug 2024, An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models, https://arxiv.org/abs/2408.00724
- Pablo Villalobos, Anson Ho, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Marius Hobbhahn, Jun 06, 2024, Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data, Epoch AI, https://epochai.org/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data
- Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla, 9 Mar 2024, Algorithmic progress in language models, https://arxiv.org/abs/2403.05812
- Nathan Lambert, Sep 05, 2024, OpenAI’s Strawberry, LM self-talk, inference scaling laws, and spending more on inference, Whether or not scaling works, we should spend more on inference, https://www.interconnects.ai/p/openai-strawberry-and-inference-scaling-laws
- Ethan Mollick, Sep 16, 2024, Scaling: The State of Play in AI, https://www.oneusefulthing.org/p/scaling-the-state-of-play-in-ai
- Chuhan Wu, Ruiming Tang, 17 September 2024, Towards a Universal Scaling Law of LLM Training and Inference, DOI: 10.14293/PR2199.001074.v1, https://www.scienceopen.com/document_file/b3ff92f8-76a6-42ca-94d2-48693442bf98/ScienceOpenPreprint/Unified_law_arxiv.pdf
- Elias Frantar, September, 2024, Compressing Large Neural Networks Algorithms, Systems and Scaling Laws, Ph.D. Thesis, Graduate School, Institute of Science and Technology, Austria, https://research-explorer.ista.ac.at/download/17485/17880/frantar_thesis_final.pdf
- Akash Bajwa, Oct 07, 2024, Inference Time Scaling Laws: AI Megacycle Of System 1 And System 2 Applications, https://akashbajwa.substack.com/p/inference-time-scaling-laws
- Tanay Jaipuria, Oct 29, 2024, OpenAI's o-1 and inference-time scaling laws, https://www.tanayj.com/p/openais-o-1-and-inference-time-scaling
- Tanishq Kumar, Zachary Ankner, Benjamin F. Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, Cengiz Pehlevan, Christopher Ré, Aditi Raghunathan, 7 Nov 2024, Scaling Laws for Precision, https://arxiv.org/abs/2411.04330
- Krystal Hu and Anna Tong, November 15, 2024, OpenAI and others seek new path to smarter AI as current methods hit limitations, https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/
- Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, 12 Nov 2024, Circuit Complexity Bounds for RoPE-based Transformer Architecture, https://arxiv.org/abs/2411.07602
- Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu, 27 Nov 2024 (v2), Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens, https://arxiv.org/abs/2411.17691
- Gary Marcus, Nov 25, 2024, A new AI scaling law shell game? Scaling laws ain’t what they used to be, https://garymarcus.substack.com/p/a-new-ai-scaling-law-shell-game
- Gary Grossman, Edelman, December 1, 2024, The end of AI scaling may not be nigh: Here’s what’s next, https://venturebeat.com/ai/the-end-of-ai-scaling-may-not-be-nigh-heres-whats-next/
- Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
- Akash Bajwa, Jan 06, 2025, Test-Time Search: A Path To AGI: Stacking Scaling Laws And Reward Engineering, https://akashbajwa.substack.com/p/test-time-search-a-path-to-agi
- Cameron R. Wolfe, Ph.D., Jan 06, 2025, Scaling Laws for LLMs: From GPT-3 to o3, Understanding the current state of LLM scaling and the future of AI research, https://cameronrwolfe.substack.com/p/scaling-laws-for-llms-from-gpt-3
- Maxwell Zeff, January 7, 2025, Nvidia CEO says his AI chips are improving faster than Moore’s Law, https://techcrunch.com/2025/01/07/nvidia-ceo-says-his-ai-chips-are-improving-faster-than-moores-law/
- Chien-Ping Lu, 8 Jan 2025 (v3), The Race to Efficiency: A New Perspective on AI Scaling Laws, https://arxiv.org/abs/2501.02156
- Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
- Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
- Sebastian Raschka, PhD, Jan 15, 2025, Noteworthy AI Research Papers of 2024 (Part Two). Six influential AI papers from July to December, https://magazine.sebastianraschka.com/p/ai-research-papers-2024-part-2 (Examines multimodal LLama3 models and the different multimodal architectures.)
- Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
- Samira Abnar, Harshay Shah, Dan Busbridge, Alaaeldin Mohamed Elnouby Ali, Josh Susskind, Vimal Thilak, 21 Jan 2025, Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models, https://arxiv.org/abs/2501.12370
- Mohit Sewak, Ph.D., January 29, 2025, Achieving General Intelligence (AGI) and Super Intelligence (ASI): Pathways, Uncertainties, and Ethical Concerns, https://towardsai.net/p/l/achieving-general-intelligence-agi-and-super-intelligence-asi-pathways-uncertainties-and-ethical-concerns
- Da Yu, Edith Cohen, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Chiyuan Zhang, 3 Feb 2025, Scaling Embedding Layers in Language Models, https://arxiv.org/abs/2502.01637 (Using n-gram multi-token embedding layers, because embeddings are cheap to compute, rather than increasing vocabulary size.)
- Sam Altman, Feb 10, 2025, Three Observations, https://blog.samaltman.com/three-observations (Talks about scaling laws, inference costs reducing, and AGI. One of them: "The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use. ")
- Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, Wenda Zhou, 3 Feb 2025, Competitive Programming with Large Reasoning Models, https://arxiv.org/abs/2502.06807 (OpenAI's paper on o3 that has similar conclusions to what DeepSeek showed about Reinforcement Learning for reasoning models, namely that "scaling general-purpose reinforcement learning" still works.)
- Alberto Romero, Feb 19, 2025, Grok 3: Another Win For The Bitter Lesson. Congratulations to the xAI team—and the advocates of the scaling laws, https://www.thealgorithmicbridge.com/p/grok-3-another-win-for-the-bitter
- Jeremy Kahn, February 26, 2025, The $19.6 billion pivot: How OpenAI’s 2-year struggle to launch GPT-5 revealed that its core AI strategy has stopped working, https://fortune.com/2025/02/25/what-happened-gpt-5-openai-orion-pivot-scaling-pre-training-llm-agi-reasoning/
- Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo, 24 Feb 2025, How Do Large Language Monkeys Get Their Power (Laws)? https://arxiv.org/abs/2502.17578 (More attempts at reasoning on a problem means more accuracy.)
- METR, 19 March 2025 Measuring AI Ability to Complete Long Tasks, https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ (A unique analysis of a new "scaling law" that measures how good AI is against humans, in terms of the length of the tasks in minutes.)
More AI Research
Read more about: