Aussie AI

Neural Architecture Search

  • Last Updated 7 December, 2024
  • by David Spuler, Ph.D.

Neural Architecture Search (NAS) is the very fancy way that AI researchers ask questions like these: how big should I make the model? How many weights? How many layers? What vocabulary?

Why NAS?

Choosing these numbers is actually a very hard problem. In the early days, these choices were made either randomly or by trial-and-error, which is expensive when you're talking about GPUs. If you go too large, the model is over-parameterized and unnecessarily expensive. Go too small, and the model won't be very accurate, or might not even work at all. Hence, a large body of "NAS" research has developed on systematic ways to find optimal model sizes along the various dimensions.

The biggest number is how many billions of weights the model should use, but this is actually dependent on a number of other numeric sizes. These weights are called "parameters" and the various other sizes are called "hyper-parameters" of the model, so NAS is also sometimes called "Hyper-Parameter Optimization" (HPO). The sizes and dimensions of models that NAS aims to determine include (see the search sketch after this list):

  • Number of layers
  • Embedding size
  • Vocabulary size
  • Number of attention heads
  • Context size
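
For illustration, here is a minimal sketch of the simplest possible NAS: a random search over a hypothetical search space covering the hyperparameters listed above. The candidate values and the train_and_evaluate() callback are illustrative assumptions, not a specific published NAS method; real NAS systems typically use evolutionary search, reinforcement learning, or differentiable relaxations rather than pure random sampling.

    import random

    # Hypothetical NAS search space over the hyperparameters listed above.
    # The candidate values are illustrative, not recommendations.
    SEARCH_SPACE = {
        "num_layers": [6, 12, 24, 32],
        "embedding_size": [512, 768, 1024, 2048],
        "vocab_size": [32000, 50000, 100000],
        "num_attention_heads": [8, 12, 16, 32],
        "context_size": [2048, 4096, 8192],
    }

    def sample_architecture(space):
        """Pick one candidate value for each hyperparameter at random."""
        return {name: random.choice(choices) for name, choices in space.items()}

    def random_search(trials, train_and_evaluate):
        """Simplest form of NAS: random search over the space.

        train_and_evaluate(config) is an assumed callback that trains
        (or cheaply proxy-trains) a model with the given hyperparameters
        and returns an accuracy score.
        """
        best_config, best_score = None, float("-inf")
        for _ in range(trials):
            config = sample_architecture(SEARCH_SPACE)
            score = train_and_evaluate(config)
            if score > best_score:
                best_config, best_score = config, score
        return best_config, best_score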

NAS versus Model Compression

There are some parallels between neural architecture search and model compression, especially structural pruning. NAS aims to select the model hyperparameters before or during training, whereas model compression comes in afterwards and changes the model. Some types of pruning produce outcomes very similar to NAS.

As an example, any type of layer pruning is very similar to NAS choosing the number of layers. If you train your model with a layer count chosen via NAS, and then subsequently prune away some of those layers, the end result is the same as if NAS had chosen a smaller number of layers in the first place. Of course, that's only true for static layer pruning, whereas dynamic layer pruning such as early exiting has other runtime effects. A toy sketch of this equivalence is shown below.
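
Here is a toy Python sketch of that equivalence, treating a model as a plain list of layer objects. The Layer class and forward() loop are stand-ins for real transformer code, not an actual implementation.

    # Toy model: a stack of layers represented as a Python list.
    class Layer:
        def __init__(self, idx):
            self.idx = idx
        def __call__(self, x):
            return x + 1  # placeholder for the real layer computation

    def build_model(num_layers):
        # NAS chooses num_layers before/during training...
        return [Layer(i) for i in range(num_layers)]

    def static_layer_prune(model, layers_to_remove):
        # ...whereas static layer pruning removes layers from a trained model.
        return model[: len(model) - layers_to_remove]

    def forward(model, x):
        for layer in model:
            x = layer(x)
        return x

    # A 24-layer model statically pruned down to 18 layers runs the same
    # depth of computation as an 18-layer model chosen directly by NAS.
    pruned = static_layer_prune(build_model(24), 6)
    direct = build_model(18)
    assert len(pruned) == len(direct)
    assert forward(pruned, 0) == forward(direct, 0)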

Survey Papers on NAS

Some of the review and survey papers on NAS:

General Research Papers on NAS

Some of the research papers on NAS:

This is not the full list of papers, which I can say with reasonable certainty, given that one survey paper stated that over 1,000 papers have been written on NAS since 2021. If this is your chosen dissertation topic, you'd better start writing that lit review section early!

NAS and Dynamic Inference Optimization

Dynamic NAS is not yet a mainstream application of NAS. Traditionally, NAS has been applied to finding a model architecture without regard to dynamic inference techniques. An emerging area of research is to include the hyperparameters of dynamic inference optimizations, such as early-exit settings, as part of the search space for an optimal model, as sketched in the example below.
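
As a minimal sketch of the idea, the search space below mixes static architecture sizes with hyperparameters of an assumed early-exit scheme. The names, candidate values, and the accuracy_fn/latency_fn evaluation hooks are all illustrative assumptions, not a published dynamic NAS method.

    # Hypothetical joint search space: static architecture hyperparameters
    # plus dynamic inference hyperparameters (early exit), searched together.
    DYNAMIC_SEARCH_SPACE = {
        # Static architecture sizes.
        "num_layers": [12, 24, 32],
        "num_attention_heads": [8, 16, 32],
        # Dynamic inference settings searched jointly.
        "early_exit_threshold": [0.7, 0.8, 0.9, 0.95],
        "exit_layer_candidates": [(4, 8), (6, 12), (8, 16)],
    }

    def objective(config, accuracy_fn, latency_fn, latency_weight=0.1):
        """Score a candidate on accuracy minus a latency penalty, since
        early exiting trades accuracy for runtime speed. accuracy_fn and
        latency_fn are assumed evaluation hooks supplied by the caller."""
        return accuracy_fn(config) - latency_weight * latency_fn(config)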

Research papers on "dynamic NAS" include:

  • Matteo Gambella, Manuel Roveri, "EDANAS: Adaptive Neural Architecture Search for Early Exit Neural Networks", 2023 International Joint Conference on Neural Networks (IJCNN), pp.1-8, 2023. https://ieeexplore.ieee.org/document/10191876 (NAS applied to early-exit dynamic inference.)
  • Chakkrit Termritthikun, Yeshi Jamtsho, Jirarat Ieamsaard, Paisarn Muneesawang, Ivan Lee, 2021, EEEA-Net: An Early Exit Evolutionary Neural Architecture Search, Engineering Applications of Artificial Intelligence Volume 104, September 2021, 104397, https://www.sciencedirect.com/science/article/abs/pii/S0952197621002451, https://arxiv.org/abs/2108.06156, Code: https://github.com/chakkritte/EEEA-Net (A 2021 paper on NAS applied to early-exit.)
  • KT Chitty-Venkata, Y Bian, M Emani, V Vishwanath, Jan 2023, Differentiable Neural Architecture, Mixed Precision and Accelerator Co-search, IEEE Access, DOI:10.1109/ACCESS.2023.3320133, PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10266308
  • Linnan Wang, Chenhan Yu, Satish Salian, Slawomir Kierat, Szymon Migacz, Alex Fit Florea, 2022, GPUNet: Searching the Deployable Convolution Neural Networks for GPUs, https://arxiv.org/abs/2205.00841 (A general NAS system that could be applied statically or dynamically.)
  • Matteo Gambella, Jary Pomponi, Simone Scardapane, Manuel Roveri, 24 Jan 2024, NACHOS: Neural Architecture Search for Hardware Constrained Early Exit Neural Networks, https://arxiv.org/abs/2401.13330
  • David Spuler, March 2024, Chapter 56. Neural Architecture Search, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
  • Angie Boggust, Venkatesh Sivaraman, Yannick Assogba, Donghao Ren, Dominik Moritz, Fred Hohman, 6 Aug 2024, Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments, https://arxiv.org/abs/2408.03274
  • Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi, 9 Aug 2024 (v2), A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommendations, IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2024.3447085, https://arxiv.org/abs/2308.06767 https://ieeexplore.ieee.org/abstract/document/10643325
  • Akhiad Bercovich, Tomer Ronen, Talor Abramovich, Nir Ailon, Nave Assaf, Mohammad Dabbah, Ido Galil, Amnon Geifman, Yonatan Geifman, Izhak Golan, Netanel Haber, Ehud Karpas, Itay Levy, Shahar Mor, Zach Moshe, Najeeb Nabwani, Omri Puny, Ran Rubin, Itamar Schen, Ido Shahaf, Oren Tropp, Omer Ullman Argov, Ran Zilberstein, Ran El-Yaniv, 28 Nov 2024, Puzzle: Distillation-Based NAS for Inference-Optimized LLMs, NVIDIA Research, https://arxiv.org/abs/2411.19146 (This is dynamic NAS on a vast scale in a search space of size 10^138, because the optimization is applied with low granularity to each block in attention and FFN subcomponents of each layer.)
  • Armin W. Thomas, Rom Parnichkun, Alexander Amini, Stefano Massaroli and Michael Poli, Liquid AI, December 2, 2024, Automated Architecture Synthesis via Targeted Evolution, https://arxiv.org/abs/2411.17800 https://www.liquid.ai/research/automated-architecture-synthesis-via-targeted-evolution

More AI Research

Read more about: