Aussie AI
Open Source Models
- Last Updated 12 December, 2024
- by David Spuler, Ph.D.
There are many different AI models that have been open-sourced. In many cases, both the code for the inference algorithm and the model's weights are available. Some licenses have only minimal restrictions (e.g. MIT License, Apache License 2.0), whereas other model licenses restrict usage to research or non-commercial activities.
Research Papers on Open Source Models
- Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample, Meta AI, Feb 2023, LLaMA: Open and Efficient Foundation Language Models, https://arxiv.org/abs/2302.13971 (Meta's Llama version 1, research-licensed, not fully open-sourced.)
- Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom, Meta AI, July 2023, Llama 2: Open Foundation and Fine-Tuned Chat Models, https://arxiv.org/abs/2307.09288 (Llama version 2, open-sourced including for commercial use, with a non-standard model-specific license.)
- MosaicML NLP Team, "Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs", May 2023, Mosaic ML Blog, https://www.mosaicml.com/blog/mpt-7b
- Georgi Gerganov, June 2023, Llama.cpp project, https://github.com/ggerganov/llama.cpp/
- Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Hesslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme, "Falcon-40B: an open large language model with state-of-the-art performance", 2023, Hugging Face repository. https://huggingface.co/tiiuae/falcon-40b
- Guilherme Penedo and Quentin Malartic and Daniel Hesslow and Ruxandra Cojocaru and Alessandro Cappelli and Hamza Alobeidli and Baptiste Pannier and Ebtesam Almazrouei and Julien Launay, "The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only", June 2023, arXiv article https://arxiv.org/abs/2306.01116
- Tasmia Ansari, UC Berkeley Releases Open LLaMA, an Open-Source Alternative to Meta’s LLaMA, May 2023, Analytics India Magazine https://analyticsindiamag.com/uc-berkeley-release-an-open-source-alternative-to-metas-llama/
- Together Computer, "OpenChatKit: An Open Toolkit and Base Model for Dialogue-style Applications", March 2023, GitHub repository https://github.com/togethercomputer/OpenChatKit
- BigScience, "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model", June 2023, arXiv paper 2211.05100 https://arxiv.org/pdf/2211.05100.pdf
- Nolan Dey, Gurpreet Gosal, Zhiming (Charles) Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, Joel Hestness, "Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster", April 2023, arXiv 2304.03208 https://arxiv.org/abs/2304.03208
- Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric P. Xing and Hao Zhang and Joseph E. Gonzalez and Ion Stoica, "Judging LLM-as-a-judge with MT-Bench and Chatbot Arena", 2023, arXiv paper 2306.05685 https://arxiv.org/abs/2306.05685
- Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Yu, Joey Gonzalez, Hao Zhang, and Ion Stoica. June 20th, 2023, vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention, https://arxiv.org/pdf/2309.06180.pdf
- Jeon, Byungsoo, May 2024, Automated and Portable Machine Learning Systems, Ph.D. Thesis, Carnegie Mellon University, https://doi.org/10.1184/R1/25746708.v1 https://kilthub.cmu.edu/articles/thesis/Automated_and_Portable_Machine_Learning_Systems/25746708/1 PDF: https://kilthub.cmu.edu/ndownloader/files/46074087 Code: https://github.com/cmu-catalyst/collage (Portability layer to integrate the various kernels and low-level backends more easily. Also covers pipeline parallelism in graph models, and KV cache parallelism similar to FlashDecode.)
- Maria Korolov, 15 May 2024, 10 things to watch out for with open source gen AI, CIO, https://www.cio.com/article/2104280/10-things-to-watch-out-for-with-open-source-gen-ai.html
- JH Jones, May 2024, A Quantitative Comparison of Pre-Trained Model Registries to Traditional Software Package Registries, Masters Thesis, Electrical and Computer Engineering, Purdue University, https://hammer.purdue.edu/articles/thesis/A_Quantitative_Comparison_of_Pre-Trained_Model_Registries_to_Traditional_Software_Package_Registries/25686447/1 PDF: https://hammer.purdue.edu/ndownloader/files/46096152
- Tomasz Tunguz, Apr 24, 2024, A Shift in LLM Marketing: The Rise of the B2B Model, https://tomtunguz.com/snowflake-arctic-model/
- Nathan Lambert, Apr 18, 2024, Llama 3: Scaling open LLMs to AGI, https://www.interconnects.ai/p/llama-3-and-scaling-open-llms
- John Loeffler, April 19, 2024, Meta rolls out new Meta AI website, and it might just bury Microsoft and Google's AI dreams, Tech Radar, https://www.techradar.com/computing/meta-rolls-out-new-meta-ai-website-and-it-might-just-bury-microsoft-and-googles-ai-dreams
- Robert Wolfe, Isaac Slaughter, Bin Han, Bingbing Wen, Yiwei Yang, Lucas Rosenblatt, Bernease Herman, Eva Brown, Zening Qu, Nic Weber, and Bill Howe. 2024. Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings. In ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT ’24), June 3–6, 2024, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 18 pages. https://doi.org/10.1145/3630106.3658966 https://arxiv.org/pdf/2405.16820
- Michael Nuñez, February 6, 2024, Meet ‘Smaug-72B’: The new king of open-source AI, Venture Beat, https://venturebeat.com/ai/meet-smaug-72b-the-new-king-of-open-source-ai/
- Sharon Machlis, March 28, 2024, 5 easy ways to run an LLM locally, InfoWorld, https://www.infoworld.com/article/3705035/5-easy-ways-to-run-an-llm-locally.html
- Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Cojocaru, Mérouane Debbah, Étienne Goffinet, Daniel Hesslow, Julien Launay, Quentin Malartic, Daniele Mazzotta, Badreddine Noune, Baptiste Pannier, Guilherme Penedo, 29 Nov 2023, The Falcon Series of Open Language Models, https://arxiv.org/abs/2311.16867
- Ankit Patel, June 14, 2024, NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models, https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm-training/
- David Spuler, March 2024, Chapter 5. Design Choices & Architectures, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Intel, Apr 25, 2024, Deployment of Llama3 on Your AI PC with OpenVINO™, https://medium.com/openvino-toolkit/deployment-of-llama3-on-your-ai-pc-with-openvino-b58e961501d6
- Bin Xiao, Burak Kantarci, Jiawen Kang, Dusit Niyato, Mohsen Guizani, 18 Jun 2024 (v2), Efficient Prompting for LLM-based Generative Internet of Things, https://arxiv.org/abs/2406.10382
- Elizabeth Gibney, 19 June 2024, Not all ‘open source’ AI models are actually open: here’s a ranking, Nature, https://www.nature.com/articles/d41586-024-02012-5
- Liesenfeld, A., Dingemanse, M., 2024, Rethinking open source generative AI: open washing and the EU AI Act, In FAccT '24: Proc. 2024 ACM Conf. on Fairness, Accountability, and Transparency 1774–1787 (ACM, 2024). https://dl.acm.org/doi/10.1145/3630106.3659005
- William Gallagher, Jun 19, 2024, Apple researchers add 20 more open-source models to improve text and image AI, https://appleinsider.com/articles/24/06/19/apple-researchers-add-20-more-open-source-models-to-improve-text-and-image-ai
- Piotr Skalski, June 20, 2024, Florence-2: Open Source Vision Foundation Model by Microsoft, https://blog.roboflow.com/florence-2/
- Waleed Kadous, August 23, 2023, Llama 2 is about as factually accurate as GPT-4 for summaries and is 30X cheaper, https://www.anyscale.com/blog/llama-2-is-about-as-factually-accurate-as-gpt-4-for-summaries-and-is-30x-cheaper Code: https://github.com/anyscale/factuality-eval
- Ben Wodecki, November 16, 2023, Generative AI Projects More Than Triple on GitHub in 2023, https://aibusiness.com/nlp/gen-ai-projects-soar-more-than-triple-on-github
- Valentina Alto, 2024, Chapter 3: Choosing an LLM for Your Application, Building LLM-Powered Applications: Create intelligent apps and agents with large language models, Packt Publishing, https://www.amazon.com/Building-LLM-Apps-Intelligent-Language/dp/1835462316/
- Clement Farabet, Tris Warkentin, Jun 27, 2024, Gemma 2 is now available to researchers and developers, https://blog.google/technology/developers/google-gemma-2/
- Meta, July 23, 2024, Introducing Llama 3.1: Our most capable models to date, https://ai.meta.com/blog/meta-llama-3-1/
- Mark Zuckerberg, July 23, 2024, Open Source AI Is the Path Forward, https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/
- Vince Lam, Mar 12, 2024, 50+ Open-Source Options for Running LLMs Locally, https://medium.com/thedeephub/50-open-source-options-for-running-llms-locally-db1ec6f5a54f
- Michael Nuñez, July 18, 2024, Groq’s open-source Llama AI model tops leaderboard, outperforming GPT-4o and Claude in function calling, https://venturebeat.com/ai/groq-open-source-llama-ai-model-tops-leaderboard-outperforming-gpt-4o-and-claude-in-function-calling/
- Washington Post, 2024, Meta releases open-source AI model it says rivals OpenAI, Google tech, https://www.washingtonpost.com/technology/2024/07/23/meta-new-ai-llama-open/
- AIM, 2024, Mistral AI Unveils Mistral Large 2, Beats Llama 3.1 on Code and Math, https://analyticsindiamag.com/ai-news-updates/mistral-ai-unveils-mistral-large-2-beats-llama-3-1-on-code-and-math/
- David Linthicum, Aug 02, 2024, Small language models and open source are transforming AI, https://www.infoworld.com/article/3480593/small-language-models-and-open-source-are-transforming-ai.html
- Level Up Coding, Aug 2024, Google open-sources the most powerful small model on the edge: 2B parameters surpass GPT-3.5-Turbo, and Apple 15Pro runs fast, https://levelup.gitconnected.com/google-open-sources-the-most-powerful-small-model-on-the-edge-2b-parameters-surpass-gpt-3-5-turbo-c0b13f96997c
- Michael Nuñez, August 26, 2024, Aleph Alpha unveils EU-compliant AI: A new era for transparent machine learning, https://venturebeat.com/ai/aleph-alpha-unveils-eu-compliant-ai-a-new-era-for-transparent-machine-learning/
- Shubham Sharma, August 29, 2024, Meta leads open-source AI boom, Llama downloads surge 10x year-over-year, https://venturebeat.com/ai/meta-leads-open-source-ai-boom-llama-downloads-surge-10x-year-over-year/
- Chandra Irugalbandara, Ashish Mahendra, Roland Daynauth, Tharuka Kasthuri Arachchige, Jayanaka Dantanarayana, Krisztian Flautner, Lingjia Tang, Yiping Kang, Jason Mars, 16 Apr 2024 (v3), Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production, https://arxiv.org/abs/2312.14972
- Shrestha, Y.R., von Krogh, G. & Feuerriegel, S., 2023, Building open-source AI. Nat Comput Sci 3, 908–911 (2023). https://doi.org/10.1038/s43588-023-00540-0 https://www.nature.com/articles/s43588-023-00540-0
- Abhinand, Aug 20, 2024, Self-Hosting LLaMA 3.1 70B (or any ~70B LLM) Affordably, https://abhinand05.medium.com/self-hosting-llama-3-1-70b-or-any-70b-llm-affordably-2bd323d72f8d
- David Spuler, March 2024, Open Source Models, in Generative AI in C++, https://www.aussieai.com/book/ch5-open-source-models
- Carl Franzen, September 5, 2024, Meet the new, most powerful open source AI model in the world: HyperWrite’s Reflection 70B, https://venturebeat.com/ai/meet-the-new-most-powerful-open-source-ai-model-in-the-world-hyperwrites-reflection-70b/
- Asif Razzaq, September 5, 2024, Yi-Coder Released by 01.AI: A Powerful Small-Scale Code LLM Series, Delivering Exceptional Performance in Code Generation, Editing, and Long-Context Comprehension, https://www.marktechpost.com/2024/09/05/yi-coder-released-by-01-ai-a-powerful-small-scale-code-llm-series-delivering-exceptional-performance-in-code-generation-editing-and-long-context-comprehension/
- Michael Nuñez, September 16, 2024, SambaNova challenges OpenAI’s o1 model with Llama 3.1-powered demo on HuggingFace, https://venturebeat.com/ai/sambanova-challenges-openais-o1-model-with-llama-3-1-powered-demo-on-huggingface/
- Meta, August 29, 2024, With 10x growth since 2023, Llama is the leading engine of AI innovation, https://ai.meta.com/blog/llama-usage-doubled-may-through-july-2024/
- Michael Nuñez, October 1, 2024, Nvidia just dropped a bombshell: Its new AI model is open, massive, and ready to rival GPT-4, https://venturebeat.com/ai/nvidia-just-dropped-a-bombshell-its-new-ai-model-is-open-massive-and-ready-to-rival-gpt-4/
- Wenliang Dai, Nayeon Lee, Boxin Wang, Zhuoling Yang, Zihan Liu, Jon Barker, Tuomas Rintamaki, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping, 17 Sep 2024, NVLM: Open Frontier-Class Multimodal LLMs, NVIDIA, https://arxiv.org/abs/2409.11402 https://huggingface.co/nvidia/NVLM-D-72B https://nvlm-project.github.io/
- Sean Michael Kerner, October 20, 2024, IBM debuts open source Granite 3.0 LLMs for enterprise AI, https://venturebeat.com/ai/ibm-debuts-open-source-granite-3-0-llms-for-enterprise-ai/
- Meta, October 18, 2024, Sharing new research, models, and datasets from Meta FAIR, https://ai.meta.com/blog/fair-news-segment-anything-2-1-meta-spirit-lm-layer-skip-salsa-lingua/
- Matt Marshall, October 24, 2024, The enterprise verdict on AI models: Why open source will win, https://venturebeat.com/ai/the-enterprise-verdict-on-ai-models-why-open-source-will-win/
- Meta, October 24, 2024, Introducing quantized Llama models with increased speed and a reduced memory footprint, https://ai.meta.com/blog/meta-llama-quantized-lightweight-models/
- Xingwu Sun, Yanfeng Chen, Yiqing Huang, Ruobing Xie, Jiaqi Zhu, Kai Zhang, Shuaipeng Li, Zhen Yang, Jonny Han, Xiaobo Shu, Jiahao Bu, (and many more authors), 4 Nov 2024, Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent, https://arxiv.org/abs/2411.02265 https://github.com/Tencent/Hunyuan-Large https://huggingface.co/tencent/Tencent-Hunyuan-Large
- Robert Corwin, Nov 2024, Running Large Language Models Privately: A comparison of frameworks, models, and costs, https://towardsdatascience.com/running-large-language-models-privately-a-comparison-of-frameworks-models-and-costs-ac33cfe3a462
- Carl Franzen, October 31, 2024, Meta makes its MobileLLM open for researchers, posting full weights, https://venturebeat.com/ai/meta-makes-its-mobilellm-open-for-researchers-posting-full-weights/
- Jason Perlow, Nov. 6, 2024, The best open-source AI models: All your free-to-use options explained: Here are the best open-source and free-to-use AI models for text, images, and audio, organized by type, application, and licensing considerations. https://www.zdnet.com/article/the-best-open-source-ai-models-all-your-free-to-use-options-explained/
- Chris Wellons, November 10, 2024, Everything I've learned so far about running local LLMs, https://nullprogram.com/blog/2024/11/10/
- Tegan Jones, 6 November, 2024, Open source AI: What it is and why it matters for business. We now have a definition for ‘open source AI’ and that’s important for business owners, especially when big tech doesn’t adhere to it. https://www.smartcompany.com.au/artificial-intelligence/open-source-ai-what-it-is-and-why-it-matters-for-business/
- Qwen Team, November 28, 2024, QwQ: Reflect Deeply on the Boundaries of the Unknown, https://qwenlm.github.io/blog/qwq-32b-preview/
- Ai2, November 26, 2024, OLMo 2: The best fully open language model to date, https://allenai.org/blog/olmo2
- Kyle Wiggers, December 6, 2024, Meta unveils a new, more efficient Llama model, https://techcrunch.com/2024/12/06/meta-unveils-a-new-more-efficient-llama-model/
- Tiernan Ray, Dec. 10, 2024, How Cerebras boosted Meta's Llama to 'frontier model' performance: The company also demonstrates initial training of a one-trillion-parameter AI model on a single machine using conventional DDR5 memory chips. https://www.zdnet.com/article/how-cerebras-boosted-metas-llama-to-frontier-model-performance/
- Ben Dickson, December 10, 2024, OpenAI’s o1 model doesn’t show its thinking, giving open source an advantage, https://venturebeat.com/ai/heres-how-openai-o1-might-lose-ground-to-open-source-models/