Aussie AI
Scaling Laws in Generative AI
-
Last Updated 21 March, 2025
-
by David Spuler, Ph.D.
What are AI Scaling Laws?
Scaling laws are the contention that AI models become smarter as model size (parameter count) and/or the total number of tokens used in training are scaled up. Recently, diminishing returns from additional training have thrown some of these scaling laws into doubt, giving rise to a new "inference scaling law," which holds that scaling the amount of inference computation can also increase model intelligence.
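The classic training scaling laws fit pretraining loss as a power law in parameter count N and training tokens D. A minimal sketch, using the approximate coefficients published in the Chinchilla paper (Hoffmann et al., 2022); the function name and example sizes are illustrative, not part of any library:

```python
# Chinchilla-style scaling law: L(N, D) = E + A / N^alpha + B / D^beta
# Coefficients are the approximate published Chinchilla fits; treat as illustrative.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for a model with n_params parameters
    trained on n_tokens tokens, under the fitted power law."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Both more parameters and more data lower the predicted loss:
small = predicted_loss(1e9, 20e9)     # 1B params, 20B tokens
large = predicted_loss(70e9, 1.4e12)  # 70B params, 1.4T tokens
```

Note the irreducible term E: as N and D grow, the two power-law terms shrink toward zero, which is one intuition behind "diminishing returns" from training scale alone.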
What are Inference Scaling Laws?
Inference scaling laws are the contention that smarter LLMs can be created by spending additional computation at inference time, such as repeated LLM queries at runtime, rather than by more extensive training. The success of the OpenAI "o1" model has supported this trend, as it is based on multi-step inference using "Chain-of-Thought" reasoning.
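One simple way to spend more inference compute is self-consistency voting: sample several chain-of-thought answers and return the majority. A minimal sketch, where `sample_answer` is a hypothetical stand-in for one stochastic LLM call (temperature > 0) returning a final answer string; this is not OpenAI's o1 algorithm, just the general pattern:

```python
from collections import Counter

def self_consistency(sample_answer, n_samples: int):
    """Sample n_samples answers from a stochastic LLM call and return
    the majority answer plus its agreement ratio.

    `sample_answer` is a zero-argument callable standing in for one
    LLM query; in a real system it would wrap an API call."""
    votes = Counter(sample_answer() for _ in range(n_samples))
    answer, count = votes.most_common(1)[0]
    return answer, count / n_samples
```

Increasing `n_samples` is exactly the "scaling inference compute" knob: accuracy tends to rise with more samples, at proportionally higher cost per query.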
Research on Inference Scaling Laws
Research papers on the scaling laws in regard to multi-step inference:
- Ethan Mollick, Sep 16, 2024, Scaling: The State of Play in AI, https://www.oneusefulthing.org/p/scaling-the-state-of-play-in-ai
- Akash Bajwa, Oct 07, 2024, Inference Time Scaling Laws: AI Megacycle Of System 1 And System 2 Applications, https://akashbajwa.substack.com/p/inference-time-scaling-laws
- Tanay Jaipuria, Oct 29, 2024, OpenAI's o-1 and inference-time scaling laws, https://www.tanayj.com/p/openais-o-1-and-inference-time-scaling
- Krystal Hu and Anna Tong, November 15, 2024, OpenAI and others seek new path to smarter AI as current methods hit limitations, https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/
- Maxwell Zeff, November 20, 2024, Nvidia’s CEO defends his moat as AI labs change how they improve their AI models, https://techcrunch.com/2024/11/20/nvidias-ceo-defends-his-moat-as-ai-labs-change-how-they-improve-their-ai-models/
- Gary Marcus, Nov 25, 2024, A new AI scaling law shell game? Scaling laws ain’t what they used to be, https://garymarcus.substack.com/p/a-new-ai-scaling-law-shell-game
- Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang, 6 Dec 2024, Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling, https://arxiv.org/abs/2412.05271
- Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
- Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Dec 2024, Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs, https://arxiv.org/abs/2412.21187
- Rohin Manvi, Anikait Singh, Stefano Ermon, 3 Oct 2024, Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation, https://arxiv.org/abs/2410.02725
- Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, and Kan Li, 19 Jan 2024, Escape sky-high cost: Early-stopping self-consistency for multi-step reasoning. The Twelfth International Conference on Learning Representations, 2024, https://arxiv.org/abs/2401.10480 https://github.com/Yiwei98/ESC (Uses "early stopping" idea to improve CoT efficiency during inference.)
- Akash Bajwa, Jan 06, 2025, Test-Time Search: A Path To AGI: Stacking Scaling Laws And Reward Engineering, https://akashbajwa.substack.com/p/test-time-search-a-path-to-agi
- Cameron R. Wolfe, Ph.D., Jan 06, 2025, Scaling Laws for LLMs: From GPT-3 to o3, Understanding the current state of LLM scaling and the future of AI research, https://cameronrwolfe.substack.com/p/scaling-laws-for-llms-from-gpt-3
- Sunil Manghani, Dec 21, 2024, Train Less, Think More: Advancing LLMs Through Test-Time Compute, https://medium.com/electronic-life/train-less-think-more-advancing-llms-through-test-time-compute-a46832e973e9
- Duncan Anderson, Jan 2025, The wall that wasn’t: Benchmark results for the latest AI models suggest that any “scaling wall” has already been breached and we’re on the path to AGI. https://medium.com/barnacle-labs/the-wall-that-wasnt-62c617f66ad4
- Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar, 6 Aug 2024, Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, https://arxiv.org/abs/2408.03314 (Original test time compute paper.)
- Maxwell Zeff, January 7, 2025, Nvidia CEO says his AI chips are improving faster than Moore’s Law, https://techcrunch.com/2025/01/07/nvidia-ceo-says-his-ai-chips-are-improving-faster-than-moores-law/
- Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
- Sebastian Raschka, PhD, Jan 15, 2025, Noteworthy AI Research Papers of 2024 (Part Two). Six influential AI papers from July to December, https://magazine.sebastianraschka.com/p/ai-research-papers-2024-part-2 (Examines multimodal LLama3 models and the different multimodal architectures.)
- G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
- Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo, 24 Feb 2025, How Do Large Language Monkeys Get Their Power (Laws)? https://arxiv.org/abs/2502.17578 (More attempts at reasoning on a problem means more accuracy.)
What is Test Time Compute?
Test time compute means using additional computation at the LLM inference stage, rather than during pre-training or fine-tuning. The model weights stay constant during inference, but reasoning can still be improved through advanced prompting strategies and multi-step inference algorithms.
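A common test-time-compute pattern is Best-of-N with a verifier: generate several candidate responses with the frozen model, score each with a separate verifier or reward model, and keep the best. A minimal sketch with hypothetical `generate` and `score` callables; real systems use a trained process- or outcome-reward model as the scorer:

```python
def best_of_n(generate, score, n: int):
    """Generate n candidate responses and return the highest-scoring one.

    `generate` stands in for one stochastic LLM call; `score` maps a
    candidate response to a float (higher is better). The model weights
    never change; only extra inference compute is spent."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)
```

For example, with a toy length-based scorer (a stand-in for a real reward model), `best_of_n(sample, len, 8)` would return the longest of eight sampled candidates. The quality of the verifier, not just the value of n, determines how much accuracy this buys.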
Research papers on test time compute:
- Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J.H. Liu, 22 Oct 2024 (v2), A Comparative Study on Reasoning Patterns of OpenAI's o1 Model, https://arxiv.org/abs/2410.13639
- Maxwell Zeff, November 20, 2024, Nvidia’s CEO defends his moat as AI labs change how they improve their AI models, https://techcrunch.com/2024/11/20/nvidias-ceo-defends-his-moat-as-ai-labs-change-how-they-improve-their-ai-models/
- mshumer, Nov 2024, Open Reasoning Engine, https://github.com/mshumer/OpenReasoningEngine
- Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, Jacob Andreas, 11 Nov 2024, The Surprising Effectiveness of Test-Time Training for Abstract Reasoning, https://arxiv.org/abs/2411.07279
- Noam Brown, Tuomas Sandholm, 16 Nov 2017 (v3), Safe and Nested Subgame Solving for Imperfect-Information Games, https://arxiv.org/abs/1705.02955 (An early pre-LLM paper on reasoning in multiple steps.)
- Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
- Xiangjue Dong, Maria Teleki, James Caverlee, 18 Dec 2024, A Survey on LLM Inference-Time Self-Improvement, https://arxiv.org/abs/2412.14352 https://github.com/dongxiangjue/Awesome-LLM-Self-Improvement (Broad survey of reasoning improvement methods from multi-step inference to RALM to decoding algorithms.)
- Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, Rui Wang, Zhaopeng Tu, Haitao Mi, Dong Yu, 30 Dec 2024, Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs, https://arxiv.org/abs/2412.21187
- Akash Bajwa, Jan 06, 2025, Test-Time Search: A Path To AGI: Stacking Scaling Laws And Reward Engineering, https://akashbajwa.substack.com/p/test-time-search-a-path-to-agi
- Cameron R. Wolfe, Ph.D., Jan 06, 2025, Scaling Laws for LLMs: From GPT-3 to o3, Understanding the current state of LLM scaling and the future of AI research, https://cameronrwolfe.substack.com/p/scaling-laws-for-llms-from-gpt-3
- Sunil Manghani, Dec 21, 2024, Train Less, Think More: Advancing LLMs Through Test-Time Compute, https://medium.com/electronic-life/train-less-think-more-advancing-llms-through-test-time-compute-a46832e973e9
- Duncan Anderson, Jan 2025, The wall that wasn’t: Benchmark results for the latest AI models suggest that any “scaling wall” has already been breached and we’re on the path to AGI. https://medium.com/barnacle-labs/the-wall-that-wasnt-62c617f66ad4
- Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar, 6 Aug 2024, Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters, https://arxiv.org/abs/2408.03314 (Original test time compute paper.)
- Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang, 5 Jan 2025, Test-time Computing: from System-1 Thinking to System-2 Thinking, https://arxiv.org/abs/2501.02497
- Edward Beeching, Lewis Tunstall, Sasha Rush Dec 16, 2024, Scaling Test Time Compute with Open Source Models, https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
- Maciej Besta, Julia Barth, Eric Schreiber, Ales Kubicek, Afonso Catarino, Robert Gerstenberger, Piotr Nyczyk, Patrick Iff, Yueling Li, Sam Houliston, Tomasz Sternal, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Łukasz Flis, Hannes Eberhard, Hubert Niewiadomski, Torsten Hoefler, 23 Jan 2025 (v3), Reasoning Language Models: A Blueprint, https://arxiv.org/abs/2501.11223 (Survey and blueprint for how to build a Large Reasoning Model.)
- Ziyu Guo, Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Peng Gao, Hongsheng Li, Pheng-Ann Heng, 23 Jan 2025, Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step, https://arxiv.org/abs/2501.13926 https://github.com/ZiyuGuo99/Image-Generation-CoT
- G Wang, S Zhang, T Zhan, Z Shen, J Li, X Hu, X Sun, Jan 2025, Unlocking the Mysteries of OpenAI o1: A Survey of the Reasoning Abilities of Large Language Models, https://openreview.net/pdf?id=J0ADLa2rNp
- Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candès, Tatsunori Hashimoto, 3 Feb 2025 (v2), s1: Simple test-time scaling, https://arxiv.org/abs/2501.19393 https://github.com/simplescaling/s1 (Method of "budget forcing" that allows either shortening or lengthening multi-step reasoning sequences.)
- Sebastian Raschka, PhD, Feb 05, 2025, Understanding Reasoning LLMs: Methods and Strategies for Building and Refining Reasoning Models https://magazine.sebastianraschka.com/p/understanding-reasoning-llms
- Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein, 7 Feb 2025, Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach, https://arxiv.org/abs/2502.05171
- Dacheng Li, Shiyi Cao, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica, 20 Feb 2025, S*: Test Time Scaling for Code Generation, https://arxiv.org/abs/2502.14382 https://github.com/NovaSky-AI/SkyThought
- Ben Dickson, February 20, 2025, How test-time scaling unlocks hidden reasoning abilities in small language models (and allows them to outperform LLMs), https://venturebeat.com/ai/how-test-time-scaling-unlocks-hidden-reasoning-abilities-in-small-language-models-and-allows-them-to-outperform-llms/
- Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji, 18 Feb 2025, Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights, https://arxiv.org/abs/2502.12521
- Marthe Ballon, Andres Algaba, Vincent Ginis, 21 Feb 2025, The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer, https://arxiv.org/abs/2502.15631
- Maxwell Zeff, February 24, 2025, Anthropic launches a new AI model that ‘thinks’ as long as you want, https://techcrunch.com/2025/02/24/anthropic-launches-a-new-ai-model-that-thinks-as-long-as-you-want/
- Kif Leswing, Feb 26 2025, Nvidia CEO Huang says AI has to do ’100 times more’ computation now than when ChatGPT was released, https://www.cnbc.com/2025/02/26/nvidia-ceo-huang-says-next-generation-ai-will-need-more-compute.html (The thesis that AI reasoning will need 100 times more compute, regardless of whether it is a single-step "long answers" model thinking out loud, or a multi-step test time compute model.)
- Wenkai Yang, Shuming Ma, Yankai Lin, Furu Wei, 25 Feb 2025, Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning, https://arxiv.org/abs/2502.18080 (Trying to generate the "shortest correct response" by examining the lengths needed for CoT.)
- Juntai Cao, Xiang Zhang, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini, 27 Feb 2025, Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing, https://arxiv.org/abs/2502.20592 (Test time compute applied to the multi-document summarization use case.)
- Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H.S. Torr, Salman Khan, Fahad Shahbaz Khan, 28 Feb 2025, LLM Post-Training: A Deep Dive into Reasoning Large Language Models, https://arxiv.org/abs/2502.21321 https://github.com/mbzuai-oryx/Awesome-LLM-Post-training
- Supreeth Koundinya, March 10, 2025, Manus is a Wrapper of Anthropic’s Claude, and It’s Okay, https://analyticsindiamag.com/ai-features/manus-is-a-wrapper-of-anthropics-claude-and-its-okay/ (“Manus didn’t just slap an API on a model. They built an autonomous system that can execute deep research, deep thinking, and multi-step tasks in a way that no other AI have.”)
- Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi, 20 Feb 2025 (v2), Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification https://arxiv.org/abs/2502.01839 (Wrapping a single model with a Best-of-N approach that self-selects the best answer can significantly improve reasoning rates.)
- Dibyanayan Bandyopadhyay, Soham Bhattacharjee, Asif Ekbal, 13 Mar 2025, Thinking Machines: A Survey of LLM based Reasoning Strategies, https://arxiv.org/abs/2503.10814
Research on Scaling Laws
Research on the traditional scaling laws of model size and training data:
- Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann, 2024, Navigating Scaling Laws: Compute Optimality in Adaptive Model Training https://openreview.net/pdf?id=3KxPo62PYn (Evaluates some model properties, such as width, on vision Transformers from the point of view of the scaling laws.)
- Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. CoRR, abs/2001.08361, 2020. https://arxiv.org/abs/2001.08361
- Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre. Training compute-optimal large language models. CoRR, abs/2203.15556, 2022. doi: 10.48550/arXiv.2203.15556. https://arxiv.org/abs/2203.15556
- Aidan Clark, Diego de las Casas, Aurelia Guy, Arthur Mensch, Michela Paganini, Jordan Hoffmann, Bogdan Damoc, Blake Hechtman, Trevor Cai, Sebastian Borgeaud, George van den Driessche, Eliza Rutherford, Tom Hennigan, Matthew Johnson, Katie Millican, Albin Cassirer, Chris Jones, Elena Buchatskaya, David Budden, Laurent Sifre, Simon Osindero, Oriol Vinyals, Jack Rae, Erich Elsen, Koray Kavukcuoglu, Karen Simonyan, 9 Feb 2022 (v2), Unified Scaling Laws for Routed Language Models, https://arxiv.org/abs/2202.01169
- Benj Edwards, 16 July, 2024, Microsoft CTO Kevin Scott thinks LLM “scaling laws” will hold despite criticism, https://arstechnica.com/information-technology/2024/07/microsoft-cto-defies-critics-ai-progress-not-slowing-down-its-just-warming-up/
- Nandu Anilal, July 16, 2024, Infrastructure after AI Scaling: Why AI scaling won't last forever (and what comes next) https://nandu.substack.com/p/infrastructure-after-ai-scaling
- Chaofan Tao, Qian Liu, Longxu Dou, Niklas Muennighoff, Zhongwei Wan, Ping Luo, Min Lin, Ngai Wong, 18 Jul 2024, Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies, https://arxiv.org/abs/2407.13623
- 18 Apr 2024 (v2), The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, https://arxiv.org/abs/2312.00678
- Tiernan Ray, July 24, 2024, 3 ways Meta's Llama 3.1 is an advance for Gen AI, https://www.zdnet.com/article/3-ways-metas-llama-3-1-is-an-advance-for-gen-ai/
- Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini, 31 Jul 2024, Large Language Monkeys: Scaling Inference Compute with Repeated Sampling, https://arxiv.org/abs/2407.21787 (Generating multiple answers by repeated inference queries, and then using a verifier to choose the best one, which is shown to greatly increase overall accuracy.)
- Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang, 1 Aug 2024, An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models, https://arxiv.org/abs/2408.00724
- Pablo Villalobos, Anson Ho, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Marius Hobbhahn, Jun 06, 2024, Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data, Epoch AI, https://epochai.org/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data
- Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla, 9 Mar 2024, Algorithmic progress in language models, https://arxiv.org/abs/2403.05812
- Nathan Lambert, Sep 05, 2024, OpenAI’s Strawberry, LM self-talk, inference scaling laws, and spending more on inference, Whether or not scaling works, we should spend more on inference, https://www.interconnects.ai/p/openai-strawberry-and-inference-scaling-laws
- Ethan Mollick, Sep 16, 2024, Scaling: The State of Play in AI, https://www.oneusefulthing.org/p/scaling-the-state-of-play-in-ai
- Chuhan Wu, Ruiming Tang, 17 September 2024, Towards a Universal Scaling Law of LLM Training and Inference, DOI: 10.14293/PR2199.001074.v1, https://www.scienceopen.com/document_file/b3ff92f8-76a6-42ca-94d2-48693442bf98/ScienceOpenPreprint/Unified_law_arxiv.pdf
- Elias Frantar, September, 2024, Compressing Large Neural Networks Algorithms, Systems and Scaling Laws, Ph.D. Thesis, Graduate School, Institute of Science and Technology, Austria, https://research-explorer.ista.ac.at/download/17485/17880/frantar_thesis_final.pdf
- Akash Bajwa, Oct 07, 2024, Inference Time Scaling Laws: AI Megacycle Of System 1 And System 2 Applications, https://akashbajwa.substack.com/p/inference-time-scaling-laws
- Tanay Jaipuria, Oct 29, 2024, OpenAI's o-1 and inference-time scaling laws, https://www.tanayj.com/p/openais-o-1-and-inference-time-scaling
- Tanishq Kumar, Zachary Ankner, Benjamin F. Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, Cengiz Pehlevan, Christopher Ré, Aditi Raghunathan, 7 Nov 2024, Scaling Laws for Precision, https://arxiv.org/abs/2411.04330
- Krystal Hu and Anna Tong, November 15, 2024, OpenAI and others seek new path to smarter AI as current methods hit limitations, https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/
- Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, 12 Nov 2024, Circuit Complexity Bounds for RoPE-based Transformer Architecture, https://arxiv.org/abs/2411.07602
- Xu Ouyang, Tao Ge, Thomas Hartvigsen, Zhisong Zhang, Haitao Mi, Dong Yu, 27 Nov 2024 (v2), Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens, https://arxiv.org/abs/2411.17691
- Gary Marcus, Nov 25, 2024, A new AI scaling law shell game? Scaling laws ain’t what they used to be, https://garymarcus.substack.com/p/a-new-ai-scaling-law-shell-game
- Gary Grossman, Edelman, December 1, 2024, The end of AI scaling may not be nigh: Here’s what’s next, https://venturebeat.com/ai/the-end-of-ai-scaling-may-not-be-nigh-heres-whats-next/
- Maxwell Zeff, November 20, 2024, Current AI scaling laws are showing diminishing returns, forcing AI labs to change course, https://techcrunch.com/2024/11/20/ai-scaling-laws-are-showing-diminishing-returns-forcing-ai-labs-to-change-course/ ("at least 10 to 20x gains in model performance ...intelligent prompting, UX decisions, and passing context at the right time into the models...")
- Akash Bajwa, Jan 06, 2025, Test-Time Search: A Path To AGI: Stacking Scaling Laws And Reward Engineering, https://akashbajwa.substack.com/p/test-time-search-a-path-to-agi
- Cameron R. Wolfe, Ph.D., Jan 06, 2025, Scaling Laws for LLMs: From GPT-3 to o3, Understanding the current state of LLM scaling and the future of AI research, https://cameronrwolfe.substack.com/p/scaling-laws-for-llms-from-gpt-3
- Maxwell Zeff, January 7, 2025, Nvidia CEO says his AI chips are improving faster than Moore’s Law, https://techcrunch.com/2025/01/07/nvidia-ceo-says-his-ai-chips-are-improving-faster-than-moores-law/
- Chien-Ping Lu, 8 Jan 2025 (v3), The Race to Efficiency: A New Perspective on AI Scaling Laws, https://arxiv.org/abs/2501.02156
- Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn, 8 Jan 2025, Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, https://arxiv.org/abs/2501.04682
- Andrea Matarazzo, Riccardo Torlone, 3 Jan 2025, A Survey on Large Language Models with some Insights on their Capabilities and Limitations, https://arxiv.org/abs/2501.04040 (Broad survey with many LLM topics covered from history to architectures to optimizations.)
- Sebastian Raschka, PhD, Jan 15, 2025, Noteworthy AI Research Papers of 2024 (Part Two). Six influential AI papers from July to December, https://magazine.sebastianraschka.com/p/ai-research-papers-2024-part-2 (Examines multimodal LLama3 models and the different multimodal architectures.)
- Tong Xiao, Jingbo Zhu, 16 Jan 2025, Foundations of Large Language Models, https://arxiv.org/abs/2501.09223 (Huge 230 page paper on many topics such as training, prompting, alignment, and long context.)
- Samira Abnar, Harshay Shah, Dan Busbridge, Alaaeldin Mohamed Elnouby Ali, Josh Susskind, Vimal Thilak, 21 Jan 2025, Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models, https://arxiv.org/abs/2501.12370
- Mohit Sewak, Ph.D., January 29, 2025, Achieving General Intelligence (AGI) and Super Intelligence (ASI): Pathways, Uncertainties, and Ethical Concerns, https://towardsai.net/p/l/achieving-general-intelligence-agi-and-super-intelligence-asi-pathways-uncertainties-and-ethical-concerns
- Da Yu, Edith Cohen, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Chiyuan Zhang, 3 Feb 2025, Scaling Embedding Layers in Language Models, https://arxiv.org/abs/2502.01637 (Using n-gram multi-token embedding layers, because embeddings are cheap to compute, rather than increasing vocabulary size.)
- Sam Altman, Feb 10, 2025, Three Observations, https://blog.samaltman.com/three-observations (Talks about scaling laws, inference costs reducing, and AGI. One of them: "The cost to use a given level of AI falls about 10x every 12 months, and lower prices lead to much more use. ")
- Ahmed El-Kishky, Alexander Wei, Andre Saraiva, Borys Minaev, Daniel Selsam, David Dohan, Francis Song, Hunter Lightman, Ignasi Clavera, Jakub Pachocki, Jerry Tworek, Lorenz Kuhn, Lukasz Kaiser, Mark Chen, Max Schwarzer, Mostafa Rohaninejad, Nat McAleese, o3 contributors, Oleg Mürk, Rhythm Garg, Rui Shu, Szymon Sidor, Vineet Kosaraju, Wenda Zhou, 3 Feb 2025, Competitive Programming with Large Reasoning Models, https://arxiv.org/abs/2502.06807 (OpenAI's paper on o3 that has similar conclusions to what DeepSeek showed about Reinforcement Learning for reasoning models, namely that "scaling general-purpose reinforcement learning" still works.)
- Alberto Romero, Feb 19, 2025, Grok 3: Another Win For The Bitter Lesson. Congratulations to the xAI team—and the advocates of the scaling laws, https://www.thealgorithmicbridge.com/p/grok-3-another-win-for-the-bitter
- Jeremy Kahn, February 26, 2025, The $19.6 billion pivot: How OpenAI’s 2-year struggle to launch GPT-5 revealed that its core AI strategy has stopped working, https://fortune.com/2025/02/25/what-happened-gpt-5-openai-orion-pivot-scaling-pre-training-llm-agi-reasoning/
- Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo, 24 Feb 2025, How Do Large Language Monkeys Get Their Power (Laws)? https://arxiv.org/abs/2502.17578 (More attempts at reasoning on a problem means more accuracy.)
- METR, 19 March 2025 Measuring AI Ability to Complete Long Tasks, https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ (A unique analysis of a new "scaling law" that measures how good AI is against humans, in terms of the length of the tasks in minutes.)
More AI Research
Read more about: