Aussie AI
Small Language Models
-
Last Updated 28 November, 2024
-
by David Spuler, Ph.D.
What are Small Language Models?
Small Language Models (SLMs) are like LLMs, but smaller in terms of the total number of weights and parameters. This makes them cheaper to run, with faster inference and lower latency.
Great progress has been made in training these smaller models so that, despite having fewer weights, they still offer a surprising degree of intelligence, albeit artificial. Small models are particularly useful for on-device inference, such as on AI phones and AI PCs.
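As a simple illustration of running a small model locally, here is a minimal sketch using the Hugging Face transformers library in Python. The model identifier is only an assumed example; any small checkpoint (e.g., a Phi, Gemma, or MobileLLM variant) could be substituted, and half precision is used on GPU merely to reduce memory.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Assumed example checkpoint; substitute any small model you have access to.
model_id = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # Half precision on GPU to reduce memory; full precision on CPU.
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

prompt = "Explain why small language models are useful on phones."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate up to 100 new tokens and print the decoded text.
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Because SLMs have far fewer parameters, the same script can run on a laptop or phone-class device where a full-size LLM would not fit in memory.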
Research on SLMs
Research papers on small language models:
- J Cañete, F Bravo-Marquez, 2024, Speedy Gonzales: A Collection of Fast Task-Specific Models for Spanish, https://felipebravom.com/publications/starsem2024.pdf (Optimizing small models on CPU and GPU for the Spanish language, mostly using distillation.)
- Yash Bhaskar, Feb 22, 2024, Gemma vs. Mistal: Comparison of Smaller AI-Language Models, Cubed, https://blog.cubed.run/gemma-vs-mistal-comparison-of-smaller-ai-language-models-a9482f87b0f2
- Benj Edwards, 24 April, 2024, Microsoft’s Phi-3 shows the surprising power of small, locally run AI language models, https://arstechnica.com/information-technology/2024/04/microsofts-phi-3-shows-the-surprising-power-of-small-locally-run-ai-language-models/
- Busayo Awobade, Mardiyyah Oduwole, Steven Kolawole, 6 Apr 2024, What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models, https://arxiv.org/abs/2404.04759 (General article showing that the big three model compression techniques work not just for compressing big LLMs, but also for making small models even smaller.)
- Stan Gibson, 03 Jun 2024, Getting infrastructure right for generative AI, CIO, https://www.cio.com/article/2128440/getting-infrastructure-right-for-generative-ai.html
- Robert Wolfe, Isaac Slaughter, Bin Han, Bingbing Wen, Yiwei Yang, Lucas Rosenblatt, Bernease Herman, Eva Brown, Zening Qu, Nic Weber, and Bill Howe. 2024. Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings. In ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT ’24), June 3–6, 2024, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 18 pages. https://doi.org/10.1145/3630106.3658966 https://arxiv.org/pdf/2405.16820
- Qingyuan Wang, Barry Cardiff, Antoine Frappé, Benoit Larras, Deepu John, 26 Mar 2024, Tiny Models are the Computational Saver for Large Models, https://arxiv.org/abs/2403.17726v1 (Choose tiny or small models after an initial layer of the larger model, combining early exit with easy-hard queries for multi-model inference.)
- Devvrit, Sneha Kudugunta, Aditya Kusupati, Tim Dettmers, Kaifeng Chen, Inderjit Dhillon, Yulia Tsvetkov, Hannaneh Hajishirzi, Sham Kakade, Ali Farhadi, Prateek Jain, 2024, MatFormer: Nested Transformer for Elastic Inference https://openreview.net/pdf?id=93BaEweoRg (A method of training one large model, and then extracting many smaller sub-models from it using FFNs with a subset of parameters; done statically this resembles a form of model compression, while elastic inference done dynamically is a type of adaptive inference.)
- Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li, June 2023, Textbooks Are All You Need, Microsoft Research, https://www.microsoft.com/en-us/research/publication/textbooks-are-all-you-need/
- Mojan Javaheripi, Sébastien Bubeck, December 12, 2023, Phi-2: The surprising power of small language models, Microsoft Research, https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/
- Grant Gross, 13 Jun 2024, IT leaders go small for purpose-built AI, https://www.cio.com/article/2139985/it-leaders-go-small-for-purpose-built-ai.html
- Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, Yossi Adi, 31 Mar 2024, The Larger the Better? Improved LLM Code-Generation via Budget Reallocation, https://arxiv.org/abs/2404.00725v1
- Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki Asano, Babak Ehteshami Bejnordi, 26 Feb 2024, Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding, https://arxiv.org/abs/2402.16844 (Using a large model to train parallel decoding for a small language model.)
- Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakal, Rao M. Anwer, Michael Felsberg, Tim Baldwin, Eric P. Xing, Fahad Shahbaz Khan, 26 Feb 2024, MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT, https://arxiv.org/abs/2402.16840 Code: https://github.com/mbzuai-oryx/MobiLlama
- Peter Hase, Mohit Bansal, Peter Clark, Sarah Wiegreffe, 12 Jan 2024, The Unreasonable Effectiveness of Easy Training Data for Hard Tasks, https://arxiv.org/abs/2401.06751
- Chang, Xiangyu; Miraj Ahmed, Sk; Krishnamurthy, Srikanth V.; Guler, Basak; Swami, Ananthram; Oymak, Samet; Roy-Chowdhury, Amit K., Jan 2024, Plug-and-Play Transformer Modules for Test-Time Adaptation, https://arxiv.org/abs/2401.04130 https://ui.adsabs.harvard.edu/abs/2024arXiv240104130C/abstract
- Chia-Hsuan Lee, Hao Cheng, Mari Ostendorf, Nov 2023, OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking, https://arxiv.org/abs/2311.09758
- Chang Liu, Chongyang Tao, Jianxin Liang, Jiazhan Feng, Tao Shen, Quzhe Huang, Dongyan Zhao, 2023, Length-Adaptive Distillation: Customizing Small Language Model for Dynamic Token Pruning, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 4452–4463, December 6-10, 2023, https://aclanthology.org/2023.findings-emnlp.294.pdf (Explores combining static model compression via knowledge distillation with dynamic adaptive inference via token pruning. This creates a modified distillation algorithm that prepares the model for token pruning during training.)
- Ignacio de Gregorio, June 2024, My Thoughts on Apple Intelligence: Leveling the Stakes & Betraying the Essence, https://readmedium.com/en/my-thoughts-on-apple-intelligence-16a793359cb5
- Jie Wu, Yufeng Zhu, Lei Shen, Xuqing Lu, 14 Jun 2024, GEB-1.3B: Open Lightweight Large Language Model, https://arxiv.org/abs/2406.09900 Code: https://huggingface.co/GEB-AGI/geb-1.3b
- Lucas Mearian, 05 Jun 2024, Can Intel’s new chips compete with Nvidia in the AI universe? https://www.computerworld.com/article/2138358/can-intels-new-chips-compete-with-nvidia-in-the-ai-universe.html
- Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou, 18 Jun 2024, Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding, https://arxiv.org/abs/2406.12295 Code: https://github.com/TsinghuaC3I/FS-GEN
- Apple, June 2024, Introducing Apple’s On-Device and Server Foundation Models, https://machinelearning.apple.com/research/introducing-apple-foundation-models (Apple's on-device models feature optimizations including small models, grouped query attention, 2-bit/4-bit quantization including activation quantization, shared embedding/unembedding tensors, small-ish vocabulary size of 49k, an undisclosed efficient KV cache optimization for neural engines, and layer-specific 16-bit LoRA/QLoRA adapters of size "10s of megabytes" for fine-tuned specialized model versions, also sometimes in 2-bit/4-bit, claiming speed rates of 0.6ms/token in prefill, and 30 tokens per second in decoding.)
- Ignacio de Gregorio, June 2024, How Does Apple Intelligence Really Work? Deep dive into Apple’s newest bet, https://medium.com/@ignacio.de.gregorio.noblejas/how-does-apple-intelligence-really-work-5f79b368c86d
- Piotr Skalski, June 20, 2024, Florence-2: Open Source Vision Foundation Model by Microsoft, https://blog.roboflow.com/florence-2/
- Tom Taulli, February 17, 2024, 3 Most Common Problems with Small Language Models: Small language models are rising in popularity, but they have problems too. Here's how to address them, https://aibusiness.com/nlp/3-most-common-problems-with-small-language-models
- Zixu Hao, Huiqiang Jiang, Shiqi Jiang, Ju Ren, Ting Cao, June 2024, Hybrid SLM and LLM for Edge-Cloud Collaborative Inference, EdgeFM ’24, June 3–7, 2024, Minato-ku, Tokyo, Japan, https://dl.acm.org/doi/pdf/10.1145/3662006.3662067 (Small model on edge devices with large model in the cloud, performing collaborative inference.)
- Franklin Huang, May 17, 2024, Machine Learning Systems with Reduced Memory Requirements, Master of Science, Electrical Engineering and Computer Sciences, University of California, Berkeley, Technical Report No. UCB/EECS-2024-120 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-120.html https://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-120.pdf Code: https://github.com/hongyihuang/spec-mcts/blob/main/triton (Broad paper that examines many different optimizations to reduce memory costs, including quantization, kernel fusion, sparsity, MatMul optimizations, KV cache compression, and various other methods.)
- Clement Farabet, Tris Warkentin, Jun 27, 2024 Gemma 2 is now available to researchers and developers, https://blog.google/technology/developers/google-gemma-2/
- CNBC, July 4, 2024, For China’s AI players, 2024 is a ‘year of small models,’ analyst says, https://www.cnbc.com/video/2024/07/04/for-chinas-ai-players-2024-is-a-year-of-small-models-analyst-says.html
- Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra, 27 Jun 2024 (v2), MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases, Meta Research, https://arxiv.org/abs/2402.14905 Code: https://github.com/facebookresearch/MobileLLM
- Hayden Field, July 18, 2024, OpenAI debuts mini version of its most powerful model yet, https://www.cnbc.com/2024/07/18/openai-4o-mini-model-announced.html
- David Linthicum, Aug 02, 2024, Small language models and open source are transforming AI, https://www.infoworld.com/article/3480593/small-language-models-and-open-source-are-transforming-ai.html
- Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun, 3 Aug 2024, MiniCPM-V: A GPT-4V Level MLLM on Your Phone, https://arxiv.org/abs/2408.01800 Code: https://github.com/OpenBMB/MiniCPM-V
- Level Up Coding, Aug 2024, Google open-sources the most powerful small model on the edge: 2B parameters surpass GPT-3.5-Turbo, and Apple 15Pro runs fast, https://levelup.gitconnected.com/google-open-sources-the-most-powerful-small-model-on-the-edge-2b-parameters-surpass-gpt-3-5-turbo-c0b13f96997c
- Carl Franzen, August 20, 2024, Microsoft releases powerful new Phi-3.5 models, beating Google, OpenAI and more, https://venturebeat.com/ai/microsoft-releases-powerful-new-phi-3-5-models-beating-google-openai-and-more/
- Louie Peters, Aug 27, 2024, Two Paths to Small LMs? Synthetic Data (Phi 3.5) vs Pruning & Distillation (Llama-3.1-Minitron), https://newsletter.towardsai.net/p/114-two-paths-to-small-lms-synthetic
- Thierry Moreau, Aug 22, 2024, In Defense of the Small Language Model, https://octo.ai/blog/in-defense-of-the-small-language-model/
- Paul DelSignore, Aug 15, 2024, Why You Need To Know About Small Language Models: The Future of AI Efficiency and Precision, https://generativeai.pub/why-you-need-to-know-about-small-language-models-d4c0a4c292a0
- Kari Briski, August 21, 2024, Lightweight Champ: NVIDIA Releases Small Language Model With State-of-the-Art Accuracy, https://blogs.nvidia.com/blog/mistral-nemo-minitron-8b-small-language-model/
- Chandra Irugalbandara, Ashish Mahendra, Roland Daynauth, Tharuka Kasthuri Arachchige, Jayanaka Dantanarayana, Krisztian Flautner, Lingjia Tang, Yiping Kang, Jason Mars, 16 Apr 2024 (v3), Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production, https://arxiv.org/abs/2312.14972
- Taryn Plumb, August 27, 2024, Google drops ‘stronger’ and ‘significantly improved’ experimental Gemini models, https://venturebeat.com/ai/google-drops-stronger-and-significantly-improved-experimental-gemini-models/
- Alvaro Cintas, Aug 27, 2024, How to run Phi-3.5 in your phone, https://university.therundown.ai/c/daily-tutorials/how-to-run-phi-3-5-in-your-phone-4d5d917a-09b0-40c0-a0b4-fb63d9a65d9c
- Asif Razzaq, September 5, 2024, Yi-Coder Released by 01.AI: A Powerful Small-Scale Code LLM Series, Delivering Exceptional Performance in Code Generation, Editing, and Long-Context Comprehension, https://www.marktechpost.com/2024/09/05/yi-coder-released-by-01-ai-a-powerful-small-scale-code-llm-series-delivering-exceptional-performance-in-code-generation-editing-and-long-context-comprehension/
- Lihu Chen, Gaël Varoquaux, 10 Sep 2024, What is the Role of Small Models in the LLM Era: A Survey, https://arxiv.org/abs/2409.06857 https://github.com/tigerchen52/role_of_small_models
- James Thomason, April 12, 2024, Why small language models are the next big thing in AI, https://venturebeat.com/ai/why-small-language-models-are-the-next-big-thing-in-ai/
- Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi, 29 Aug 2024, Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling, https://arxiv.org/abs/2408.16737
- Cobus Greyling, Sep 27, 2024, Small Language Model (SLM) Efficiency, Performance & Potential, https://cobusgreyling.medium.com/small-language-model-slm-efficiency-performance-potential-ed59c4d48ce9
- Shrenik Bhansali, Alwin Jin, Tyler Lizzo, Larry Heck, 23 Oct 2024, LEGO: Language Model Building Blocks, https://arxiv.org/abs/2410.18287 (Extract small models out of large models.)
- Jacob Robbins, October 26, 2024, Do small language models hold the key to enterprise AI adoption? https://pitchbook.com/news/articles/small-language-models-ai-enterprise-software
- Ankit Singh Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, Vladimir Feinberg, Seungyeon Kim, Hrayr Harutyunyan, Nikunj Saunshi, Zachary Nado, Rakesh Shivanna, Sashank J. Reddi, Aditya Krishna Menon, Rohan Anil, Sanjiv Kumar, 24 Oct 2024, A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs, https://arxiv.org/abs/2410.18779
- Michael Nuñez, October 28, 2024, Moondream raises $4.5M to prove that smaller AI models can still pack a punch, https://venturebeat.com/ai/moondream-raises-4-5m-to-prove-that-smaller-ai-models-can-still-pack-a-punch/
- Saleh Ashkboos, Iman Mirzadeh, Keivan Alizadeh, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar, Fartash Faghri, 25 Oct 2024, Computational Bottlenecks of Training Small-scale Large Language Models, https://arxiv.org/abs/2410.19456
- Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
- Andres Marafioti, Merve Noyan, Miquel Farré, Elie Bakouch, Pedro Cuenca, November 26, 2024, SmolVLM - small yet mighty Vision Language Model, https://huggingface.co/blog/smolvlm