Aussie AI
Small Language Models
-
Last Updated 28 November, 2024
-
by David Spuler, Ph.D.
What are Small Language Models?
Small Language Models (SLMs) are like LLMs, but smaller in terms of the total number of weights and parameters. This makes them cheaper to run, with faster inference and lower latency.
Great progress has been made in training these smaller models so that, despite having fewer weights, they still offer a surprising degree of intelligence, albeit artificial. Small models are particularly useful for on-device inference, such as on AI phones and AI PCs.
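As a simple illustration of running a small model locally, here is a minimal sketch using the Hugging Face transformers library in Python. The model identifier is only an assumed example; any small checkpoint (e.g., a Phi, Gemma, or MobileLLM variant) could be substituted, and half precision is used on GPU merely to reduce memory.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Assumed example checkpoint; substitute any small model you have access to.
model_id = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # Half precision on GPU to reduce memory; full precision on CPU.
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

prompt = "Explain why small language models are useful on phones."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate up to 100 new tokens and print the decoded text.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Because SLMs have far fewer parameters, the same script can run on a laptop or phone-class device where a full-size LLM would not fit in memory.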
Research on SLMs
Research papers on small language models:
- J Cañete, F Bravo-Marquez, 2024, Speedy Gonzales: A Collection of Fast Task-Specific Models for Spanish, https://felipebravom.com/publications/starsem2024.pdf (Optimizing small models on CPU and GPU for the Spanish language, mostly using distillation.)
- Yash Bhaskar, Feb 22, 2024, Gemma vs. Mistal: Comparison of Smaller AI-Language Models, Cubed, https://blog.cubed.run/gemma-vs-mistal-comparison-of-smaller-ai-language-models-a9482f87b0f2
- Benj Edwards, 24 April, 2024, Microsoft’s Phi-3 shows the surprising power of small, locally run AI language models, https://arstechnica.com/information-technology/2024/04/microsofts-phi-3-shows-the-surprising-power-of-small-locally-run-ai-language-models/
- Busayo Awobade, Mardiyyah Oduwole, Steven Kolawole, 6 Apr 2024, What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models, https://arxiv.org/abs/2404.04759 (General article showing that the big three model compression techniques work not just for compressing big LLMs, but also for making small models even smaller.)
- Stan Gibson, 03 Jun 2024, Getting infrastructure right for generative AI, CIO, https://www.cio.com/article/2128440/getting-infrastructure-right-for-generative-ai.html
- Robert Wolfe, Isaac Slaughter, Bin Han, Bingbing Wen, Yiwei Yang, Lucas Rosenblatt, Bernease Herman, Eva Brown, Zening Qu, Nic Weber, and Bill Howe. 2024. Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings. In ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT ’24), June 3–6, 2024, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 18 pages. https://doi.org/10.1145/3630106.3658966 https://arxiv.org/pdf/2405.16820
- Qingyuan Wang, Barry Cardiff, Antoine Frappé, Benoit Larras, Deepu John, 26 Mar 2024, Tiny Models are the Computational Saver for Large Models, https://arxiv.org/abs/2403.17726v1 (Choose tiny or small models after an initial layer of the larger model, combining early exit with easy-hard queries for multi-model inference.)
- Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain, 2024, MatFormer: Nested Transformer for Elastic Inference https://openreview.net/pdf?id=93BaEweoRg (A method of training one large model, and then extracting many smaller sub-models from it using FFNs with a subset of parameters; done statically this resembles a form of model compression, while elastic inference done dynamically is a type of adaptive inference.)
- Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li, June 2023, Textbooks Are All You Need, Microsoft Research, https://www.microsoft.com/en-us/research/publication/textbooks-are-all-you-need/
- Mojan Javaheripi, Sébastien Bubeck, December 12, 2023, Phi-2: The surprising power of small language models, Microsoft Research, https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/
- Grant Gross, 13 Jun 2024, IT leaders go small for purpose-built AI, https://www.cio.com/article/2139985/it-leaders-go-small-for-purpose-built-ai.html
- Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, Yossi Adi, 31 Mar 2024, The Larger the Better? Improved LLM Code-Generation via Budget Reallocation, https://arxiv.org/abs/2404.00725v1
- Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki Asano, Babak Ehteshami Bejnordi, 26 Feb 2024, Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding, https://arxiv.org/abs/2402.16844 (Using a large model to train parallel decoding for a small language model.)
- Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakal, Rao M. Anwer, Michael Felsberg, Tim Baldwin, Eric P. Xing, Fahad Shahbaz Khan, 26 Feb 2024, MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT, https://arxiv.org/abs/2402.16840 Code: https://github.com/mbzuai-oryx/MobiLlama
- Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe, 12 Jan 2024, The Unreasonable Effectiveness of Easy Training Data for Hard Tasks, https://arxiv.org/abs/2401.06751
- Chang, Xiangyu; Miraj Ahmed, Sk; Krishnamurthy, Srikanth V.; Guler, Basak; Swami, Ananthram; Oymak, Samet; Roy-Chowdhury, Amit K., Jan 2024, Plug-and-Play Transformer Modules for Test-Time Adaptation, https://arxiv.org/abs/2401.04130 https://ui.adsabs.harvard.edu/abs/2024arXiv240104130C/abstract
- Chia-Hsuan Lee, Hao Cheng, Mari Ostendorf, Nov 2023, OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking, https://arxiv.org/abs/2311.09758
- Chang Liu, Chongyang Tao, Jianxin Liang, Jiazhan Feng, Tao Shen, Quzhe Huang, Dongyan Zhao, 2023, Length-Adaptive Distillation: Customizing Small Language Model for Dynamic Token Pruning, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4452–4463, December 6-10, 2023, https://aclanthology.org/2023.findings-emnlp.294.pdf (Explores combining static model compression via knowledge distillation with dynamic adaptive inference via token pruning. This creates a modified distillation algorithm that prepares the model for token pruning during training.)
- Ignacio de Gregorio, June 2024, My Thoughts on Apple Intelligence: Leveling the Stakes & Betraying the Essence, https://readmedium.com/en/my-thoughts-on-apple-intelligence-16a793359cb5
- Jie Wu, Yufeng Zhu, Lei Shen, Xuqing Lu, 14 Jun 2024, GEB-1.3B: Open Lightweight Large Language Model, https://arxiv.org/abs/2406.09900 Code: https://huggingface.co/GEB-AGI/geb-1.3b
- Lucas Mearian, 05 Jun 2024, Can Intel’s new chips compete with Nvidia in the AI universe? https://www.computerworld.com/article/2138358/can-intels-new-chips-compete-with-nvidia-in-the-ai-universe.html
- Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou, 18 Jun 2024, Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding, https://arxiv.org/abs/2406.12295 Code: https://github.com/TsinghuaC3I/FS-GEN
- Apple, June 2024, Introducing Apple’s On-Device and Server Foundation Models, https://machinelearning.apple.com/research/introducing-apple-foundation-models (Apple's on-device models feature optimizations including small models, grouped query attention, 2-bit/4-bit quantization including activation quantization, shared embedding/unembedding tensors, small-ish vocabulary size of 49k, an undisclosed efficient KV cache optimization for neural engines, and layer-specific 16-bit LoRA/QLoRA adapters of size "10s of megabytes" for fine-tuned specialized model versions, also sometimes in 2-bit/4-bit, claiming speed rates of 0.6ms/token in prefill, and 30 tokens per second in decoding.)
- Ignacio de Gregorio, June 2024, How Does Apple Intelligence Really Work? Deep dive into Apple’s newest bet, https://medium.com/@ignacio.de.gregorio.noblejas/how-does-apple-intelligence-really-work-5f79b368c86d
- Piotr Skalski, June 20, 2024, Florence-2: Open Source Vision Foundation Model by Microsoft, https://blog.roboflow.com/florence-2/
- Tom Taulli, February 17, 2024, 3 Most Common Problems with Small Language Models: Small language models are rising in popularity, but they have problems too. Here's how to address them, https://aibusiness.com/nlp/3-most-common-problems-with-small-language-models
- Zixu Hao, Huiqiang Jiang, Shiqi Jiang, Ju Ren, Ting Cao, June 2024, Hybrid SLM and LLM for Edge-Cloud Collaborative Inference, EdgeFM ’24, June 3–7, 2024, Minato-ku, Tokyo, Japan, https://dl.acm.org/doi/pdf/10.1145/3662006.3662067 (Small model on edge devices with large model in the cloud, performing collaborative inference.)
- Franklin Huang, May 17, 2024, Machine Learning Systems with Reduced Memory Requirements, Master of Science, Electrical Engineering and Computer Sciences, University of California, Berkeley, Technical Report No. UCB/EECS-2024-120 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-120.html https://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-120.pdf Code: https://github.com/hongyihuang/spec-mcts/blob/main/triton (Broad paper that examines many different optimizations to reduce memory costs, including quantization, kernel fusion, sparsity, MatMul optimizations, KV cache compression, and various other methods.)
- Clement Farabet, Tris Warkentin, Jun 27, 2024 Gemma 2 is now available to researchers and developers, https://blog.google/technology/developers/google-gemma-2/
- CNBC, July 4, 2024, For China’s AI players, 2024 is a ‘year of small models,’ analyst says, https://www.cnbc.com/video/2024/07/04/for-chinas-ai-players-2024-is-a-year-of-small-models-analyst-says.html
- Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra, 27 Jun 2024 (v2), MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases, Meta Research, https://arxiv.org/abs/2402.14905 Code: https://github.com/facebookresearch/MobileLLM
- Hayden Field, July 18, 2024, OpenAI debuts mini version of its most powerful model yet, https://www.cnbc.com/2024/07/18/openai-4o-mini-model-announced.html
- David Linthicum, Aug 02, 2024, Small language models and open source are transforming AI, https://www.infoworld.com/article/3480593/small-language-models-and-open-source-are-transforming-ai.html
- Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun, 3 Aug 2024, MiniCPM-V: A GPT-4V Level MLLM on Your Phone, https://arxiv.org/abs/2408.01800 Code: https://github.com/OpenBMB/MiniCPM-V
- Level Up Coding, Aug 2024, Google open-sources the most powerful small model on the edge: 2B parameters surpass GPT-3.5-Turbo, and Apple 15Pro runs fast, https://levelup.gitconnected.com/google-open-sources-the-most-powerful-small-model-on-the-edge-2b-parameters-surpass-gpt-3-5-turbo-c0b13f96997c
- Carl Franzen, August 20, 2024, Microsoft releases powerful new Phi-3.5 models, beating Google, OpenAI and more, https://venturebeat.com/ai/microsoft-releases-powerful-new-phi-3-5-models-beating-google-openai-and-more/
- Louie Peters, Aug 27, 2024, Two Paths to Small LMs? Synthetic Data (Phi 3.5) vs Pruning & Distillation (Llama-3.1-Minitron), https://newsletter.towardsai.net/p/114-two-paths-to-small-lms-synthetic
- Thierry Moreau, Aug 22, 2024, In Defense of the Small Language Model, https://octo.ai/blog/in-defense-of-the-small-language-model/
- Paul DelSignore, Aug 15, 2024, Why You Need To Know About Small Language Models: The Future of AI Efficiency and Precision, https://generativeai.pub/why-you-need-to-know-about-small-language-models-d4c0a4c292a0
- Kari Briski, August 21, 2024, Lightweight Champ: NVIDIA Releases Small Language Model With State-of-the-Art Accuracy, https://blogs.nvidia.com/blog/mistral-nemo-minitron-8b-small-language-model/
- Chandra Irugalbandara, Ashish Mahendra, Roland Daynauth, Tharuka Kasthuri Arachchige, Jayanaka Dantanarayana, Krisztian Flautner, Lingjia Tang, Yiping Kang, Jason Mars, 16 Apr 2024 (v3), Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production, https://arxiv.org/abs/2312.14972
- Taryn Plumb, August 27, 2024, Google drops ‘stronger’ and ‘significantly improved’ experimental Gemini models, https://venturebeat.com/ai/google-drops-stronger-and-significantly-improved-experimental-gemini-models/
- Alvaro Cintas, Aug 27, 2024, How to run Phi-3.5 in your phone, https://university.therundown.ai/c/daily-tutorials/how-to-run-phi-3-5-in-your-phone-4d5d917a-09b0-40c0-a0b4-fb63d9a65d9c
- Asif Razzaq, September 5, 2024, Yi-Coder Released by 01.AI: A Powerful Small-Scale Code LLM Series, Delivering Exceptional Performance in Code Generation, Editing, and Long-Context Comprehension, https://www.marktechpost.com/2024/09/05/yi-coder-released-by-01-ai-a-powerful-small-scale-code-llm-series-delivering-exceptional-performance-in-code-generation-editing-and-long-context-comprehension/
- Lihu Chen, Gaël Varoquaux, 10 Sep 2024, What is the Role of Small Models in the LLM Era: A Survey, https://arxiv.org/abs/2409.06857 https://github.com/tigerchen52/role_of_small_models
- James Thomason, April 12, 2024, Why small language models are the next big thing in AI, https://venturebeat.com/ai/why-small-language-models-are-the-next-big-thing-in-ai/
- Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi, 29 Aug 2024, Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling, https://arxiv.org/abs/2408.16737
- Cobus Greyling, Sep 27, 2024, Small Language Model (SLM) Efficiency, Performance & Potential, https://cobusgreyling.medium.com/small-language-model-slm-efficiency-performance-potential-ed59c4d48ce9
- Shrenik Bhansali, Alwin Jin, Tyler Lizzo, Larry Heck, 23 Oct 2024, LEGO: Language Model Building Blocks, https://arxiv.org/abs/2410.18287 (Extract small models out of large models.)
- Jacob Robbins, October 26, 2024, Do small language models hold the key to enterprise AI adoption? https://pitchbook.com/news/articles/small-language-models-ai-enterprise-software
- Ankit Singh Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, Vladimir Feinberg, Seungyeon Kim, Hrayr Harutyunyan, Nikunj Saunshi, Zachary Nado, Rakesh Shivanna, Sashank J. Reddi, Aditya Krishna Menon, Rohan Anil, Sanjiv Kumar, 24 Oct 2024, A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs, https://arxiv.org/abs/2410.18779
- Michael Nuñez, October 28, 2024, Moondream raises $4.5M to prove that smaller AI models can still pack a punch, https://venturebeat.com/ai/moondream-raises-4-5m-to-prove-that-smaller-ai-models-can-still-pack-a-punch/
- Saleh Ashkboos, Iman Mirzadeh, Keivan Alizadeh, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar, Fartash Faghri, 25 Oct 2024, Computational Bottlenecks of Training Small-scale Large Language Models, https://arxiv.org/abs/2410.19456
- Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
- Andres Marafioti, Merve Noyan, Miquel Farré, Elie Bakouch, Pedro Cuenca, November 26, 2024, SmolVLM - small yet mighty Vision Language Model, https://huggingface.co/blog/smolvlm