Aussie AI

What is Depth Pruning?

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

What is Depth Pruning?

Depth pruning is the removal of layers from a Transformer to adjust the “depth” to which computation proceeds. Transformers have a stack of layers in their encoder and/or decoder, which can be “deep” with many layers, or “shallow” with only a few. Layers can be statically pruned from the model file, or skipped at runtime via early exit.
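As a concrete picture, depth is simply the number of layers that the inference loop executes. The following is a minimal C++ sketch, with hypothetical Tensor and Layer types and a placeholder apply_layer function (none of these names are from a real engine): a statically depth-pruned model ships with a shorter layer vector, so the same loop naturally runs fewer iterations.

#include <cstddef>
#include <vector>

// Hypothetical placeholder types: one layer's weights, and activations.
struct Layer  { /* attention and FFN weights would live here */ };
struct Tensor { std::vector<float> data; };

// Stand-in for a full Transformer layer (attention + FFN).
Tensor apply_layer(const Layer&, Tensor x) {
    for (float& v : x.data) v *= 1.01f;  // placeholder computation
    return x;
}

// Execute only the first 'depth' layers of the stack. A statically
// depth-pruned model simply ships with a smaller layers.size().
Tensor run_stack(const std::vector<Layer>& layers, Tensor x, std::size_t depth) {
    for (std::size_t i = 0; i < depth && i < layers.size(); ++i)
        x = apply_layer(layers[i], x);
    return x;
}

int main() {
    std::vector<Layer> layers(12);           // a 12-layer stack
    Tensor x{std::vector<float>(8, 1.0f)};   // dummy activations
    x = run_stack(layers, x, 6);             // depth-pruned: run 6 of 12 layers
    return 0;
}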

The most common type of depth pruning is layer pruning, of which the dynamic form is called early exit inference. However, there are other types of depth pruning in non-Transformer architectures, such as cascades in DNNs/CNNs.

Like all types of pruning, depth pruning can be performed statically or dynamically. The two main forms of dynamic depth pruning are “early exit” and layer skipping, both types of dynamic layer pruning. Static depth pruning is a form of model compression, such as static layer pruning or layer fusion, where entire layers of weights are removed from the model file.
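To make the static/dynamic distinction concrete, here is a hypothetical early-exit loop in C++. All types are invented placeholders, and the confidence function is a stand-in for a real per-layer “exit head” classifier (e.g., thresholding the maximum softmax probability): inference stops as soon as the confidence estimate crosses a threshold, so easy inputs use fewer layers.

#include <algorithm>
#include <vector>

// Hypothetical placeholder types, as before.
struct Layer  { };
struct Tensor { std::vector<float> data; };

Tensor apply_layer(const Layer&, Tensor x) {
    for (float& v : x.data) v *= 1.01f;  // placeholder layer computation
    return x;
}

// Placeholder confidence estimate. A real system would run a small
// per-layer "exit head" and, e.g., threshold the max softmax probability.
float confidence(const Tensor& x) {
    return x.data.empty() ? 0.0f
                          : *std::max_element(x.data.begin(), x.data.end());
}

// Early exit: stop as soon as the model is confident enough,
// skipping all remaining layers for this input.
Tensor run_with_early_exit(const std::vector<Layer>& layers, Tensor x,
                           float threshold) {
    for (const Layer& layer : layers) {
        x = apply_layer(layer, x);
        if (confidence(x) >= threshold)
            break;  // dynamic depth pruning: exit the stack early
    }
    return x;
}

int main() {
    std::vector<Layer> layers(12);
    Tensor x{std::vector<float>(8, 1.0f)};
    x = run_with_early_exit(layers, x, 1.05f);  // exits after ~5 layers here
    return 0;
}

The trade-off is the extra cost of the confidence check after each layer, which must be cheap relative to a full layer for early exit to pay off.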

Types of Depth Pruning. Various subtypes of depth pruning include:

  • Static layer pruning
  • Early exit (dynamic layer pruning)
  • Layer skipping (see the sketch after this list)
  • Layer fusion
  • Layer reordering
  • Cascades (in DNNs/CNNs)
  • Shallow decoder Transformer architecture
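
Of these subtypes, layer skipping is the easiest to sketch in code. Below is a hypothetical C++ loop (all names and types invented for illustration) where a per-layer boolean mask chooses which layers run at inference time; the model file itself keeps all of its layers.

#include <cstddef>
#include <vector>

// Hypothetical placeholder types, as before.
struct Layer  { };
struct Tensor { std::vector<float> data; };

Tensor apply_layer(const Layer&, Tensor x) {
    for (float& v : x.data) v *= 1.01f;  // placeholder layer computation
    return x;
}

// Layer skipping: a per-layer mask chooses which layers run at
// runtime; the model file is unchanged.
Tensor run_with_skipping(const std::vector<Layer>& layers, Tensor x,
                         const std::vector<bool>& run_mask) {
    for (std::size_t i = 0; i < layers.size() && i < run_mask.size(); ++i) {
        if (run_mask[i])
            x = apply_layer(layers[i], x);  // false in the mask = skip layer
    }
    return x;
}

int main() {
    std::vector<Layer> layers(12);
    std::vector<bool> mask(layers.size());
    for (std::size_t i = 0; i < mask.size(); ++i)
        mask[i] = (i % 2 == 0);              // skip every second layer
    Tensor x{std::vector<float>(8, 1.0f)};
    x = run_with_skipping(layers, x, mask);  // runs 6 of the 12 layers
    return 0;
}

Static layer pruning achieves the same effect offline by deleting the masked-out layers from the model file, so that the loop itself shrinks.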

There are multiple dimensions along which a model can be pruned. Depth pruning is orthogonal to pruning in the other model dimensions: width pruning and length pruning. As such, depth pruning can be combined with these other types of pruning, as in dual pruning and triple pruning (generally called “multi-dimensional pruning”).

 
