Aussie AI

What is Length Pruning?

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Length pruning refers to pruning along the model dimension that corresponds to the user's input sequence and its propagation through the model. The terminology in the research on length pruning and token pruning is confusing, but I have attempted to categorize the main types of pruning along the “lengthwise” model dimension as follows:

  • Token pruning
  • Embeddings pruning
  • Prompt compression

Other non-pruning AI model optimization techniques that operate on the same “lengthwise” dimension include:

  • Long context window optimizations
  • Length generalization
  • Input padding removal
  • Batching of multiple prompt queries (without padding)
  • Attention linearization optimizations
  • Non-autoregressive optimizations

Length pruning is structured weight pruning on one of the three axes of pruning. The other two axes are width pruning (e.g. attention head pruning) and depth pruning (e.g. layer pruning and early exit). All three types of pruning are mostly orthogonal to each other and can be combined into triple pruning.


Buy: Generative AI in C++: Coding Transformers and LLMs
