Aussie AI

AI Milestone Research Papers

  • Last Updated 12 December, 2024
  • by David Spuler, Ph.D.

There are many AI research papers, but some had greater significance than others. This article examines some of the milestones in the history of AI and GPT, as created by various researchers.

Transformer Historical Research Milestones

Original 2017 Transformer Paper from Google: The Transformer architecture was the basis of GPT and, later, ChatGPT. Google open-sourced the code in 2017. A minimal sketch of the core attention computation appears after the citation below.

  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, Attention is all you need, 2017, arXiv preprint arXiv:1706.03762. https://arxiv.org/abs/1706.03762
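
The core computation in the Transformer is scaled dot-product attention: softmax(QK^T / sqrt(d)) V. As a rough single-head illustration only (not the paper's full multi-head, multi-layer architecture), a minimal NumPy sketch might look like this:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Single-head attention: softmax(Q K^T / sqrt(d)) V.
        Q, K, V are (sequence_length, d) arrays; returns a (sequence_length, d) array."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                    # pairwise query-key similarities
        scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)   # attention weights sum to 1 per query
        return weights @ V                               # weighted mixture of value vectors

    # Toy usage: 4 tokens with 8-dimensional embeddings.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)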

OpenAI's 2018 GPT-1 paper: The first Generative Pre-trained Transformer (GPT) version.

  • Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, Improving Language Understanding by Generative Pre-Training, OpenAI, June 2018, https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf

BERT (Bidirectional Encoder Representations from Transformers): An encoder-only Transformer from Google Research, released as a preprint in late 2018 and published in 2019:

  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Oct 2018, arXiv preprint arXiv:1810.04805, https://arxiv.org/abs/1810.04805

OpenAI's 2019 GPT-2 paper:

  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, Language Models are Unsupervised Multitask Learners, OpenAI, 2019, https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

OpenAI's 2020 GPT-3 Research Paper. The paper that introduced GPT-3, the highly successful model that later underpinned the ChatGPT craze.

  • Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei, Language Models are Few-Shot Learners, OpenAI, July 2020, https://arxiv.org/abs/2005.14165
  • Floridi, L. and Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30:681–694. https://link.springer.com/article/10.1007/s11023-020-09548-1 (An interesting follow-up paper for GPT-3.)

Google PaLM in 2022:

  • Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., et al. (2022). PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, https://arxiv.org/abs/2204.02311

OpenAI's 2023 GPT-4 Research Paper. Far fewer technical details were disclosed for GPT-4 than for GPT-3.

  • OpenAI, GPT-4 Technical Report, March 2023, arXiv preprint arXiv:2303.08774, https://arxiv.org/abs/2303.08774

Meta (Facebook) Llama 2023 research paper: Meta's first Llama model was released under a non-commercial, research-only license.

  • Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample, Meta AI, Feb 2023, LLaMA: Open and Efficient Foundation Language Models, https://arxiv.org/abs/2302.13971

Meta's Llama 2 (v2) 2023 research paper: Meta open-sourced Llama 2 from Facebook Research with a license that allows commercial usage.

  • Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom, Meta AI, July 2023, Llama 2: Open Foundation and Fine-Tuned Chat Models, https://arxiv.org/abs/2307.09288

No doubt many more milestones are still to come...

Specific AI Technical Research Milestones

Quantization optimizations: see the quantization research papers.
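
As background for those papers, the basic idea of simple post-training quantization is to map floating-point weights onto a small integer range via a scale factor, trading a little accuracy for a much smaller model. A minimal sketch of symmetric per-tensor int8 quantization (an illustration of the general idea, not any particular paper's method):

    import numpy as np

    def quantize_int8(weights):
        """Symmetric per-tensor quantization of float weights to int8."""
        scale = max(float(np.abs(weights).max()), 1e-12) / 127.0   # largest magnitude maps to 127
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        """Approximate reconstruction of the original float weights."""
        return q.astype(np.float32) * scale

    w = np.random.randn(1000).astype(np.float32)
    q, scale = quantize_int8(w)
    print("max reconstruction error:", np.abs(w - dequantize(q, scale)).max())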

Model pruning optimizations: see the many pruning research papers.
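
Again as background, the simplest pruning scheme is unstructured magnitude pruning: zero out the weights with the smallest absolute values, on the assumption that they contribute least to the output. A minimal one-shot sketch (real schemes usually prune gradually and fine-tune afterwards):

    import numpy as np

    def magnitude_prune(weights, sparsity=0.5):
        """Zero out the smallest-magnitude weights so roughly `sparsity` of them become zero."""
        k = int(weights.size * sparsity)                 # number of weights to remove
        if k == 0:
            return weights.copy()
        threshold = np.sort(np.abs(weights), axis=None)[k - 1]
        mask = np.abs(weights) > threshold               # keep only the larger-magnitude weights
        return weights * mask

    w = np.random.randn(4, 4)
    print(magnitude_prune(w, sparsity=0.75))             # roughly 12 of the 16 entries become zero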

InstructGPT 2022 paper: An important part of OpenAI's ChatGPT was how well it followed human instructions, a capability this paper developed by fine-tuning GPT-3 with reinforcement learning from human feedback (RLHF).

  • Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe, Training language models to follow instructions with human feedback, arXiv preprint arXiv:2203.02155 (2022), https://arxiv.org/abs/2203.02155

Tokenization with Byte-Pair Encoding (BPE): An important early research paper on subword tokenization; a toy sketch of the merge procedure follows the citation.

  • Rico Sennrich, Barry Haddow, Alexandra Birch, Neural Machine Translation of Rare Words with Subword Units, 2015, arXiv preprint arXiv:1508.07909, https://arxiv.org/abs/1508.07909
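
BPE builds a subword vocabulary by repeatedly merging the most frequent adjacent pair of symbols in a word-frequency table. A toy sketch of the training loop (a simplification for illustration; the paper's reference implementation handles edge cases more carefully):

    from collections import Counter

    def get_pair_counts(vocab):
        """Count adjacent symbol pairs across a vocabulary of space-separated symbols."""
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        return pairs

    def merge_pair(pair, vocab):
        """Replace every occurrence of the chosen pair with its merged symbol."""
        old, new = " ".join(pair), "".join(pair)
        return {word.replace(old, new): freq for word, freq in vocab.items()}

    # Toy corpus: words split into characters, with an end-of-word marker.
    vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
    for _ in range(5):
        pairs = get_pair_counts(vocab)
        best = max(pairs, key=pairs.get)                 # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        print("merged:", best)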

Shallow decoder architecture: An important Transformer optimization in which a deep encoder is paired with a shallow (few-layer) decoder to speed up inference.

Knowledge Distillation: The method of training a small "student" model to mimic the outputs of an already-trained large "teacher" model. The 2015 paper that coined the term "distillation" (a minimal sketch of the distillation loss follows the citation):

  • Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015. https://arxiv.org/abs/1503.02531
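
As a rough illustration of the idea (not the paper's exact experimental recipe), the student is trained against the teacher's temperature-softened output distribution, typically blending a KL-divergence term with the ordinary cross-entropy loss on the hard labels. A minimal NumPy sketch with hypothetical logits:

    import numpy as np

    def softmax(logits, temperature=1.0):
        z = logits / temperature
        z -= z.max(axis=-1, keepdims=True)               # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        """Blend of the soft-target KL loss (teacher vs. student at temperature T)
        and the ordinary cross-entropy against the hard labels."""
        p_teacher = softmax(teacher_logits, T)
        p_student = softmax(student_logits, T)
        kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
        ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels])
        return np.mean(alpha * (T * T) * kl + (1 - alpha) * ce)   # T^2 rescales the soft-target term

    # Hypothetical logits for a batch of 2 examples over 3 classes.
    teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.5, 1.0]])
    student = np.array([[2.5, 0.8, 0.3], [0.1, 2.0, 1.5]])
    print(distillation_loss(student, teacher, labels=np.array([0, 1])))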

Flash Attention: A fast and popular optimization of attention that computes exact attention block-by-block with an online softmax, so the full attention matrix is never written to slow GPU memory (a minimal sketch of the blockwise idea follows the citation):

  • Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré, FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness, 2022, arXiv preprint arXiv:2205.14135, https://arxiv.org/abs/2205.14135
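
The real speedup comes from a fused, IO-aware GPU kernel; the NumPy sketch below only illustrates the blockwise online-softmax recurrence, not the actual CUDA implementation:

    import numpy as np

    def tiled_attention(Q, K, V, block=64):
        """Exact single-head attention computed over key/value blocks with a running
        (online) softmax, so the full N x N score matrix is never stored."""
        n, d = Q.shape
        scale = 1.0 / np.sqrt(d)
        out = np.zeros((n, d))
        row_max = np.full(n, -np.inf)                    # running max of scores per query row
        row_sum = np.zeros(n)                            # running softmax denominator per row
        for start in range(0, K.shape[0], block):
            Kb, Vb = K[start:start+block], V[start:start+block]
            scores = (Q @ Kb.T) * scale                  # scores for this key block only
            new_max = np.maximum(row_max, scores.max(axis=1))
            correction = np.exp(row_max - new_max)       # rescale previous accumulators
            p = np.exp(scores - new_max[:, None])        # stabilized block probabilities
            row_sum = row_sum * correction + p.sum(axis=1)
            out = out * correction[:, None] + p @ Vb
            row_max = new_max
        return out / row_sum[:, None]

    # Check against straightforward full-matrix attention on toy data.
    rng = np.random.default_rng(1)
    Q, K, V = (rng.normal(size=(16, 8)) for _ in range(3))
    scores = Q @ K.T / np.sqrt(8)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    full = (weights / weights.sum(axis=1, keepdims=True)) @ V
    print(np.allclose(tiled_attention(Q, K, V, block=4), full))   # True: same result, computed blockwise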

Transformer Survey Papers

Some of the useful survey papers on optimization of Transformers and AI models include:

  • Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami, Full stack optimization of transformer inference: a survey, Feb 2023, arXiv:2302.14017, https://arxiv.org/abs/2302.14017
  • Full Stack Optimization of Transformer Inference: a Survey. Part 2 on Transformer Optimization, A Paper Overview, https://www.nebuly.com/blog/full-stack-optimization-of-transformer-inference-a-survey-part-2
  • Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. Efficient transformers: A survey (v2). arXiv preprint arXiv:2009.06732, 2022, https://arxiv.org/abs/2009.06732
  • Krishna Teja Chitty-Venkata, Sparsh Mittal, Murali Emani, Venkatram Vishwanath, Arun K. Somani, A Survey of Techniques for Optimizing Transformer Inference, 2023, arxiv.org July 2023, https://arxiv.org/abs/2307.07982
  • Ye Lin, Yanyang Li, Tong Xiao, Jingbo Zhu, Bag of Tricks for Optimizing Transformer Efficiency, Findings of the Association for Computational Linguistics: EMNLP 2021, November 2021, https://aclanthology.org/2021.findings-emnlp.357/
  • Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, and Kurt Keutzer. 2021. A Survey of Quantization Methods for Efficient Neural Network Inference. arXiv preprint arXiv:2103.13630, June 2021, https://arxiv.org/abs/2103.13630
  • Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, Yulin Wang, Dynamic Neural Networks: A Survey, Dec 2021, https://arxiv.org/abs/2102.04906
  • Canwen Xu, Julian McAuley, 2022, A Survey on Model Compression and Acceleration for Pretrained Language Models, https://arxiv.org/abs/2202.07105
  • Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li, A Survey on Green Deep Learning, Nov 2021, https://arxiv.org/abs/2111.05193 (Extensive survey paper.)
  • Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang, A Survey on Model Compression for Large Language Models, arXiv preprint arXiv:2308.07633, Aug 2023 https://arxiv.org/abs/2308.07633 (Recent 2023 survey paper on various model compression approaches.)

More Milestone Papers

Significant or "Good" Papers

Various papers judged to be worthwhile reading for various reasons, based on a totally inexact personal judgement.
