Aussie AI

NAS Versus Model Compression

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

There are some parallels between neural architecture search (NAS) and model compression, especially structured pruning. NAS aims to select the model hyperparameters before or during training, whereas model compression comes in afterwards and changes the already-trained model. Some types of structured pruning produce results very similar to NAS outcomes, such as:

  • Depth pruning (e.g. layer pruning)
  • Width pruning (e.g. head pruning)
  • Length pruning (e.g. token pruning, embedding pruning)
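
One way to picture the parallel is that each pruning dimension shrinks a hyperparameter that NAS could have chosen smaller in the first place. The following is a minimal sketch under that assumption; the `ModelConfig` struct, its field names, and the `prune_*` helpers are hypothetical and only illustrate the idea at the configuration level, not a real engine's API.

```cpp
#include <cassert>

// Hypothetical architecture hyperparameters of the kind a NAS run might select.
struct ModelConfig {
    int num_layers;  // depth: number of Transformer layers
    int num_heads;   // width: attention heads per layer
    int max_tokens;  // length: tokens processed per sequence
};

// Depth pruning (e.g. layer pruning): keep only `keep` layers.
inline ModelConfig prune_depth(ModelConfig c, int keep) {
    assert(keep > 0 && keep <= c.num_layers);
    c.num_layers = keep;
    return c;
}

// Width pruning (e.g. head pruning): keep only `keep` attention heads.
inline ModelConfig prune_width(ModelConfig c, int keep) {
    assert(keep > 0 && keep <= c.num_heads);
    c.num_heads = keep;
    return c;
}

// Length pruning (e.g. token pruning): keep only `keep` tokens.
inline ModelConfig prune_length(ModelConfig c, int keep) {
    assert(keep > 0 && keep <= c.max_tokens);
    c.max_tokens = keep;
    return c;
}
```

For instance, calling `prune_depth` on a 12-layer configuration with `keep` set to 8 yields the same configuration that a search would report if it had settled on 8 layers directly. The trained weights would generally differ between the two routes; only the architecture matches.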

As an example, any type of layer pruning is very similar to NAS choosing the number of layers. If you train your model with the number of layers chosen via NAS, and then subsequently prune away some of those layers, the resulting architecture is the same as if NAS had chosen a smaller number of layers. Of course, that's only true for static layer pruning, whereas dynamic layer pruning, such as early exiting, decides at runtime how many layers to execute and so has other runtime effects.
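
The difference between static and dynamic layer pruning can be sketched with a toy layer stack. This is an illustrative example only: the `Layer` struct (a single scale-and-shift on one float), the function names, and the threshold-based exit test are all hypothetical stand-ins, not how a real Transformer layer or early-exit criterion works.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy "layer": a scale-and-shift on one float. Purely illustrative.
struct Layer {
    float scale;
    float shift;
    float forward(float x) const { return scale * x + shift; }
};

using Model = std::vector<Layer>;

// Static layer pruning: permanently drop every layer after the first `keep`.
// The result has the same layer count as a model sized at `keep` layers.
inline Model prune_layers_static(const Model& m, std::size_t keep) {
    assert(keep <= m.size());
    return Model(m.begin(), m.begin() + static_cast<std::ptrdiff_t>(keep));
}

// Ordinary inference: every input passes through every remaining layer.
inline float run_all(const Model& m, float x) {
    for (const Layer& layer : m) x = layer.forward(x);
    return x;
}

// Dynamic layer pruning (early exit): the model keeps all of its layers,
// but each input may stop early. `layers_used` reports the depth actually
// executed for this input. The exit test (x >= threshold) is a hypothetical
// placeholder for a real confidence check.
inline float run_early_exit(const Model& m, float x, float threshold,
                            std::size_t& layers_used) {
    layers_used = 0;
    for (const Layer& layer : m) {
        x = layer.forward(x);
        ++layers_used;
        if (x >= threshold) break;
    }
    return x;
}
```

With static pruning, the layer count is fixed once and is the same for every input, which is why it resembles a NAS decision. With early exit, `layers_used` can differ from one input to the next, so the full-depth model must still be stored and the cost per query varies at runtime.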

 
