Aussie AI

Neural Architecture Search

  • Last Updated 25 April, 2026
  • by David Spuler, Ph.D.

What is NAS?

Neural Architecture Search (NAS) is the rather fancy way that AI researchers ask questions like these: how big should I make the model? How many weights? How many layers? What vocabulary size?

Why NAS?

Choosing these numbers is actually a very hard problem. In the early days, these choices were made either randomly or by trial-and-error, which is expensive when you're paying for GPUs. If you go too large, the model is over-parameterized and unnecessarily expensive. Go too small, and the model won't be very accurate, or might not work at all. Hence, a large body of research on "NAS" has developed around systematic ways to find optimal model sizes along the various dimensions.

The biggest number is how many billions of weights the model should use, but this actually depends on several other numeric sizes. The weights are called "parameters" and the various other sizes are called "hyper-parameters" of the model, so NAS is also sometimes called "Hyper-Parameter Optimization" (HPO). The sizes and dimensions that NAS aims to determine include:

  • Number of layers
  • Embedding size
  • Vocabulary size
  • Number of attention heads
  • Context size

Neural Architecture Search (NAS): Book Excerpts and Blog Articles

Free online book excerpts with full-text chapters and free PDF downloads, plus related articles from the Aussie AI blog:

NAS versus Model Compression

There are some parallels between neural architecture search and model compression, especially structural pruning. NAS aims to select the model hyperparameters before or during training, whereas model compression comes in afterwards and changes the model. Some types of pruning produce results very similar to NAS.

As an example, layer pruning is very similar to NAS choosing the number of layers. If you train a model with a layer count chosen via NAS, and then subsequently prune away some of those layers, the end result is the same as if NAS had chosen a smaller number of layers to begin with. Of course, that's only true for static layer pruning; dynamic layer pruning such as early exiting has other runtime effects.
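The equivalence can be shown with a toy sketch, where hypothetical helper functions stand in for the (much more expensive) real training and pruning steps. Only the final architecture shape is equivalent; the trained weights would of course differ:

```python
def nas_choose(num_layers):
    """Stand-in for NAS deciding the layer count before training."""
    return {"layers": list(range(num_layers))}

def layer_prune(model, keep):
    """Stand-in for static layer pruning: drop layers after training,
    keeping only the first `keep` of them."""
    return {"layers": model["layers"][:keep]}

small = nas_choose(12)                    # NAS chose 12 layers up front
pruned = layer_prune(nas_choose(24), 12)  # "trained" 24 layers, pruned to 12

# Both paths end at the same depth.
assert len(small["layers"]) == len(pruned["layers"])
```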

Survey Papers on NAS

Some of the review and survey papers on NAS:

General Research Papers on NAS

Some of the research papers on NAS:

I can say with reasonable certainty that this is not the full list of papers, given that one survey paper stated that over 1,000 papers have been written on NAS since 2021. If this is your chosen dissertation topic, you'd better start writing that lit review section early!

NAS and Dynamic Inference Optimization

Dynamic NAS is not yet a mainstream use of NAS. Traditionally, NAS has been applied to finding static models, without regard to dynamic inference approaches. An emerging area of research is to include the hyperparameters of dynamic inference optimizations as part of the search over the problem space for an optimal model.
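A minimal sketch of what such a combined search space might look like, mixing static architecture hyperparameters with a runtime one (an early-exit confidence threshold). All names and values here are illustrative assumptions, not from any published dynamic NAS method:

```python
import random

# Hypothetical dynamic NAS search space: static and runtime
# hyperparameters are searched together as one candidate.
dynamic_search_space = {
    "num_layers": [12, 24],                  # static: model depth
    "embedding_size": [1024, 2048],          # static: hidden width
    "early_exit_threshold": [0.7, 0.8, 0.9]  # dynamic: exit a layer early
                                             # when confidence exceeds this
}

def sample_candidate(space, rng=random):
    """Sample one combined static/dynamic candidate configuration."""
    return {name: rng.choice(choices) for name, choices in space.items()}

candidate = sample_candidate(dynamic_search_space)
```

The point is that the early-exit threshold now influences the search objective (e.g., average inference latency) alongside the static sizes, so the search can trade model depth against how aggressively the model exits early.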

Research papers on "dynamic NAS" include:

AI Books from Aussie AI



The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory:
  • Your brain is 50 times bigger than the best AI engines.
  • Truly intelligent AI will require more compute!
  • Another case of the bitter lesson?
  • Maybe it's the opposite of that: the sweetest lesson.

Get your copy from Amazon: The Sweetest Lesson



RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures:
  • Smarter RAG
  • Faster RAG
  • Cheaper RAG
  • Agentic RAG
  • RAG reasoning

Get your copy from Amazon: RAG Optimization



Generative AI Applications book:
  • Deciding on your AI project
  • Planning for success and safety
  • Designs and LLM architectures
  • Expediting development
  • Implementation and deployment

Get your copy from Amazon: Generative AI Applications



Generative AI in C++: Generative AI programming book:
  • Generative AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++



CUDA C++ Optimization book:
  • Faster CUDA C++ kernels
  • Optimization tools & techniques
  • Compute optimization
  • Memory optimization

Get your copy from Amazon: CUDA C++ Optimization



CUDA C++ Debugging book:
  • Debugging CUDA C++ kernels
  • Tools & techniques
  • Self-testing & reliability
  • Common GPU kernel bugs

Get your copy from Amazon: CUDA C++ Debugging

More AI Research

Read more about: