Aussie AI

Types of Structured Pruning

Book Excerpt from "Generative AI in C++"

by David Spuler, Ph.D.

Pick a structure, any structure. Open up the standard vanilla Transformer research paper (Vaswani et al., 2017) and find the diagram of the architecture. Close your eyes, and poke your finger somewhere in that diagram. Open your eyes again. I can show you research papers on pruning whatever structure you're pointing at, and sometimes hundreds of them (e.g., early exit).

There's an odd thing, though: none of these types of structured pruning has gone mainstream. The vast majority of pruning capabilities in open-source frameworks are simply for training-based unstructured pruning, such as magnitude pruning or movement pruning. I find this surprising, since several of the structured pruning techniques show significant efficiency gains with only a modest loss of model accuracy.
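To make the contrast with structured pruning concrete, here is a minimal sketch of the core step in magnitude pruning: zeroing any weight whose absolute value falls below a cutoff. The function name `magnitude_prune` and the fixed `threshold` parameter are illustrative assumptions, not taken from any framework, and this is a one-shot pass over a flat weight array rather than the training-based variants mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unstructured magnitude pruning (illustrative sketch):
// zero every weight whose absolute value is below `threshold`.
// Returns the number of weights newly set to zero.
std::size_t magnitude_prune(std::vector<float>& weights, float threshold)
{
    assert(threshold >= 0.0f);  // a negative cutoff makes no sense
    std::size_t pruned = 0;
    for (float& w : weights) {
        // Skip weights that are already zero so the count is accurate.
        if (w != 0.0f && std::fabs(w) < threshold) {
            w = 0.0f;
            ++pruned;
        }
    }
    return pruned;
}
```

Note that the zeros land wherever the small weights happen to be, which is what makes this "unstructured": the matrix keeps its shape, so any speedup depends on sparse storage or sparse kernels.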

The main types of structured pruning with a substantial body of research papers are:

Layer pruning
Early exit (i.e., dynamic layer pruning)
Attention head pruning
Channel pruning
Filter pruning
Token pruning
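The first two items in the list above can be sketched in a few lines. This is a toy illustration under stated assumptions: the `Layer` type, the `confident` callback, and both function names are hypothetical, and a real engine would use an actual exit criterion (such as a confidence score on the logits) rather than an arbitrary predicate.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<float>;
using Layer = std::function<Vec(const Vec&)>;  // hypothetical layer type

// Static layer pruning: the layers after the first `keep` are never run.
Vec run_layer_pruned(const std::vector<Layer>& layers, Vec x, std::size_t keep)
{
    assert(keep <= layers.size());
    for (std::size_t i = 0; i < keep; ++i) {
        x = layers[i](x);
    }
    return x;
}

// Early exit (dynamic layer pruning): check an exit test after each layer
// and skip the remaining layers once it passes.
Vec run_early_exit(const std::vector<Layer>& layers, Vec x,
                   const std::function<bool(const Vec&)>& confident,
                   std::size_t& layers_run)
{
    layers_run = 0;
    for (const Layer& layer : layers) {
        x = layer(x);
        ++layers_run;
        if (confident(x)) break;  // exit decided at runtime, per input
    }
    return x;
}
```

The difference is when the decision is made: layer pruning fixes the depth ahead of time for every input, whereas early exit chooses the depth at runtime, so easy inputs can leave sooner than hard ones.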

Some of the less commonly pruned Transformer components include:

Bias pruning
Embeddings pruning
FFN pruning
Normalization pruning
Softmax pruning
Positional encoding pruning

Did I miss any?

There are also some other notable techniques with the same goal of reducing the total number of weights, with some similarity to pruning:

Parameter sharing and layer fusion
Low-rank matrices

Smaller matrices have fewer weights, so another technique is to cut weights by using smaller matrices. Advanced matrix algebra can be used to factorize the large matrices into smaller “low-rank” matrices, with fewer rows and columns (hence, fewer weights). This idea applied to tensors is called “tensor decomposition.”
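The weight savings from low-rank factorization are simple arithmetic. As a sketch, assume an m x n weight matrix W is approximated by the product of an m x r matrix A and an r x n matrix B, with rank r much smaller than m and n. The names, the row-major layout, and the helper functions below are illustrative assumptions; how A and B are obtained (e.g., via a matrix decomposition) is outside this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of weights in a dense m x n matrix.
std::size_t dense_weight_count(std::size_t m, std::size_t n)
{
    return m * n;
}

// Number of weights when W (m x n) is replaced by A (m x r) and B (r x n).
std::size_t low_rank_weight_count(std::size_t m, std::size_t n, std::size_t r)
{
    return r * (m + n);
}

// Compute y = A * (B * x) without ever forming the full m x n product.
// A and B are stored row-major in flat vectors.
std::vector<float> low_rank_matvec(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   const std::vector<float>& x,
                                   std::size_t m, std::size_t n, std::size_t r)
{
    assert(A.size() == m * r && B.size() == r * n && x.size() == n);
    std::vector<float> t(r, 0.0f);  // t = B * x, length r
    for (std::size_t i = 0; i < r; ++i)
        for (std::size_t j = 0; j < n; ++j)
            t[i] += B[i * n + j] * x[j];
    std::vector<float> y(m, 0.0f);  // y = A * t, length m
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t k = 0; k < r; ++k)
            y[i] += A[i * r + k] * t[k];
    return y;
}
```

For example, with m = n = 4096 and r = 64, the dense matrix holds 16,777,216 weights while the two factors hold 64 x (4096 + 4096) = 524,288, a 32x reduction. The multiply count per matrix-vector product shrinks by the same ratio, at the cost of whatever accuracy is lost in the approximation.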

