Aussie AI

Attention Head Pruning

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Attention head pruning is a type of model width pruning in which the less important attention heads are removed. Research has shown that some attention heads contribute far more to model accuracy than others, leaving redundancy that can be pruned away with little loss.

Head pruning is a form of “width pruning” of a model (see Chapter 48). The pruning can be done statically, as a type of model compression, or dynamically at inference time, depending on the user's inputs.

