Aussie AI
NAS Versus Model Compression
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
There are some parallels between neural architecture search (NAS) and model compression, especially structured pruning. NAS aims to select the model's hyperparameters before or during training, whereas model compression is applied afterwards to modify the trained model. Some types of structured pruning are very similar to NAS outcomes (see the sketch after this list), such as:
- Depth pruning (e.g. layer pruning)
- Width pruning (e.g. head pruning)
- Length pruning (e.g. token pruning, embedding pruning)
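These three pruning dimensions map onto the same structural hyperparameters that NAS searches over. Below is a minimal sketch, with hypothetical field names that are illustrative rather than taken from the book's code, showing how the NAS-chosen configuration and the post-compression configuration differ only in the size of those dimensions:

```cpp
// Minimal sketch (hypothetical names, not the book's code): the structural
// hyperparameters that NAS searches over are the same ones that structured
// pruning later reduces.
#include <cstdio>

struct ModelConfig {
    int num_layers;      // depth  -- reduced by layer pruning
    int num_heads;       // width  -- reduced by head pruning
    int embedding_size;  // length -- reduced by embedding pruning
};

int main() {
    ModelConfig nas_choice  = { 24, 16, 1024 };  // chosen by NAS before/during training
    ModelConfig after_prune = { 18, 12,  768 };  // same fields, shrunk by compression afterwards
    printf("Layers %d -> %d, heads %d -> %d, embedding %d -> %d\n",
           nas_choice.num_layers, after_prune.num_layers,
           nas_choice.num_heads, after_prune.num_heads,
           nas_choice.embedding_size, after_prune.embedding_size);
    return 0;
}
```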
As an example, any type of layer pruning is very similar to NAS choosing the number of layers. If you train a model with a layer count chosen by NAS, and then subsequently prune away some of those layers, the end result is the same as if NAS had chosen a smaller number of layers in the first place. Of course, that equivalence only holds for static layer pruning, whereas dynamic layer pruning such as early exiting has other runtime effects.
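To illustrate the distinction, here is a minimal sketch (hypothetical types and function names, not the book's API): static pruning discards layers once, leaving an inference loop identical to that of a smaller model, whereas early exit decides how many layers to run per input at runtime.

```cpp
// Minimal sketch (hypothetical types and names) of why static layer pruning
// collapses to "NAS picked fewer layers": once the pruned layers are removed,
// the inference loop is indistinguishable from that of a smaller model.
#include <vector>
#include <functional>

using Tensor = std::vector<float>;
using Layer  = std::function<Tensor(const Tensor&)>;

// Static pruning: drop layers once, before deployment.
std::vector<Layer> prune_layers(const std::vector<Layer>& layers,
                                const std::vector<bool>& keep) {
    std::vector<Layer> kept;
    for (size_t i = 0; i < layers.size(); ++i)
        if (keep[i]) kept.push_back(layers[i]);
    return kept;  // equivalent to a model built with kept.size() layers
}

// Dynamic layer pruning (early exit): the layer count varies per input at runtime.
Tensor run_with_early_exit(const std::vector<Layer>& layers, Tensor x,
                           const std::function<bool(const Tensor&)>& confident) {
    for (const Layer& layer : layers) {
        x = layer(x);
        if (confident(x)) break;  // exit early; the remaining layers are skipped
    }
    return x;
}
```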