Aussie AI
Dynamic Structured Pruning
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Dynamic pruning refers to pruning network weights, links, or entire layers at runtime during inference. This differs from “static pruning,” which is done offline, either during training or as a post-training optimization, to create a modified model. The types of dynamic pruning may include:
- Dynamic depth pruning: Skipping inference of entire layers of the model via an “early exit” from the inference loop (see the first sketch after this list). See also depth pruning, layer pruning, layer skipping, layer fusion, and shallow decoders.
- Dynamic width pruning: Dynamically reducing the “width” of the model based on the input (see the second sketch after this list). See also width pruning, attention head pruning, channel pruning, and filter pruning.
- Dynamic length pruning: Adapting internal dimensions related to tokens, embeddings, etc., to the input (see the third sketch after this list). See also length pruning, token pruning, embeddings pruning, and autoregressive algorithms.
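For concreteness, below is a minimal C++ sketch of dynamic depth pruning via early exit. The toy Layer type and the should_exit_early() confidence test are illustrative assumptions, not code from this book; a real criterion would examine the intermediate activations or logits of the actual model.

    // Minimal sketch of dynamic depth pruning via early exit.
    // Layer and should_exit_early() are illustrative stand-ins.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Layer {
        // Toy "layer": scales every activation (stand-in for a real Transformer layer).
        float scale = 0.9f;
        void forward(std::vector<float>& activations) const {
            for (float& a : activations) a *= scale;
        }
    };

    // Hypothetical exit criterion: exit once the largest activation magnitude
    // falls below a threshold (a stand-in for a real confidence measure).
    static bool should_exit_early(const std::vector<float>& activations, float threshold) {
        float max_abs = 0.0f;
        for (float a : activations) max_abs = std::max(max_abs, std::fabs(a));
        return max_abs < threshold;
    }

    int main() {
        std::vector<Layer> layers(12);       // e.g., a 12-layer stack
        std::vector<float> activations = { 1.0f, -0.5f, 0.25f };
        const float exit_threshold = 0.4f;

        int layers_run = 0;
        for (const Layer& layer : layers) {
            layer.forward(activations);
            ++layers_run;
            if (should_exit_early(activations, exit_threshold)) {
                break;                       // dynamic depth pruning: skip the remaining layers
            }
        }
        std::printf("Ran %d of %zu layers\n", layers_run, layers.size());
        return 0;
    }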
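In the same spirit, here is a minimal sketch of dynamic width pruning, skipping attention heads whose importance score for the current input falls below a threshold. Both run_head() and head_importance() are hypothetical stand-ins for a real per-head computation and scoring method.

    // Minimal sketch of dynamic width pruning: skipping attention heads per input.
    // The per-head importance score and threshold are illustrative, not from this book.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Toy per-head computation (stand-in for a full attention head).
    static float run_head(const std::vector<float>& input, int head) {
        float sum = 0.0f;
        for (float x : input) sum += x * static_cast<float>(head + 1);
        return sum;
    }

    // Illustrative importance score for a head on this particular input.
    static float head_importance(const std::vector<float>& input, int head) {
        float sum = 0.0f;
        for (float x : input) sum += std::fabs(x);
        return sum / static_cast<float>(input.size() * (head + 1));
    }

    int main() {
        const int num_heads = 8;
        const float importance_threshold = 0.1f;
        std::vector<float> input = { 0.6f, -0.2f, 0.9f, 0.1f };

        int heads_run = 0;
        float output = 0.0f;
        for (int h = 0; h < num_heads; ++h) {
            // The decision logic costs a little per head; skipping a head saves much more.
            if (head_importance(input, h) < importance_threshold) {
                continue;                    // dynamic width pruning: skip this head
            }
            output += run_head(input, h);
            ++heads_run;
        }
        std::printf("Ran %d of %d heads, output=%f\n", heads_run, num_heads, output);
        return 0;
    }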
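Finally, a minimal sketch of dynamic length pruning as token pruning: tokens whose importance score is too low are dropped so that later layers process a shorter sequence. The Token struct and its embedding_norm score are illustrative assumptions only.

    // Minimal sketch of dynamic length pruning via token pruning.
    // The importance signal (embedding_norm) is an illustrative stand-in.
    #include <cstdio>
    #include <vector>

    struct Token {
        int id;
        float embedding_norm;   // stand-in for a real importance signal (e.g., attention mass)
    };

    // Keep only tokens whose score reaches the threshold.
    static std::vector<Token> prune_tokens(const std::vector<Token>& tokens, float threshold) {
        std::vector<Token> kept;
        for (const Token& t : tokens) {
            if (t.embedding_norm >= threshold) kept.push_back(t);
        }
        return kept;
    }

    int main() {
        std::vector<Token> tokens = {
            { 101, 0.9f }, { 2023, 0.2f }, { 2003, 0.05f }, { 1037, 0.7f }, { 102, 0.8f }
        };
        const float keep_threshold = 0.3f;

        std::vector<Token> pruned = prune_tokens(tokens, keep_threshold);
        std::printf("Kept %zu of %zu tokens\n", pruned.size(), tokens.size());
        // Later layers now run over the shorter sequence in "pruned".
        return 0;
    }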
Note that all types of dynamic pruning incur some extra inference cost from the calculations that decide whether or not to prune. The hope is that the benefit of pruning will exceed the cost of the decision logic. For example, evaluating an “early exit” criterion for layer pruning requires extra computation at each layer, which is hopefully recouped by skipping layers often enough.
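As a rough back-of-the-envelope illustration of that trade-off, the snippet below compares running every layer against an early-exit run that pays a per-layer decision cost but stops partway through the stack. All cost numbers are arbitrary illustrative units, not measurements.

    // Rough cost model of the early-exit trade-off; all numbers are illustrative.
    #include <cstdio>

    int main() {
        const double layer_cost = 1000.0;    // cost of running one layer (arbitrary units)
        const double decision_cost = 50.0;   // extra cost of the exit check at each layer
        const int num_layers = 12;
        const double exit_after = 8.0;       // average layer at which the model exits

        // Baseline: run every layer with no decision logic.
        double baseline = num_layers * layer_cost;

        // Early exit: pay the decision cost at each executed layer, skip the rest.
        double with_exit = exit_after * (layer_cost + decision_cost);

        std::printf("baseline=%.0f, early exit=%.0f, saving=%.0f\n",
                    baseline, with_exit, baseline - with_exit);
        return 0;
    }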