Aussie AI
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
What is Structured Pruning?
Structured pruning is the removal of whole “structures” in a model. For example, “layer pruning” removes whole layers, and “attention head pruning” removes attention heads. This is different from unstructured pruning, which removes the smaller weights wherever they occur in the model, but many of the goals are the same:
- Smaller model (model compression)
- Reduced memory usage
- Faster inference
Structured pruning differs from unstructured pruning (e.g. magnitude pruning) in that we don't care about the values of the weights. All of the weights in a pruned structure are removed, regardless of their magnitude.
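To make the contrast concrete, here is a minimal sketch in C++. It assumes a toy weight matrix stored as a vector of rows, where each row stands in for one prunable structure (say, a neuron). The `Matrix` alias and the function names `magnitude_prune` and `prune_row` are hypothetical, chosen only for illustration, and are not taken from any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy layout: one row of weights per structure (e.g. a neuron).
using Matrix = std::vector<std::vector<float>>;

// Unstructured (magnitude) pruning: zero every weight whose absolute value
// is below the threshold. The matrix keeps its shape and only gets sparser.
// Returns the number of weights that were zeroed.
std::size_t magnitude_prune(Matrix& w, float threshold) {
    std::size_t zeroed = 0;
    for (auto& row : w) {
        for (float& x : row) {
            if (x != 0.0f && std::fabs(x) < threshold) {
                x = 0.0f;
                ++zeroed;
            }
        }
    }
    return zeroed;
}

// Structured pruning: remove one whole row, whatever its weight values are.
// The matrix actually shrinks by one row.
void prune_row(Matrix& w, std::size_t row) {
    assert(row < w.size());
    w.erase(w.begin() + static_cast<std::ptrdiff_t>(row));
}
```

With a 3x2 matrix, `magnitude_prune` leaves all three rows in place and only zeroes the small entries, whereas `prune_row` leaves a 2x2 matrix, discarding large and small weights in the removed row alike. A real engine would store weights in contiguous tensors rather than nested vectors; the nested layout here only keeps the sketch short.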
That being said, we might analyze the weights to decide which structure to prune, in some types of structured pruning algorithms. So, the value of the weights may be considered in the pruning decision, but once we've decided to prune a particular structure from the model, then all of its weights are gone.
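As a rough illustration of a weight-based pruning decision, the sketch below scores each structure by the sum of squares of its weights and removes the lowest-scoring one. The scoring rule is only one plausible choice, assumed here for the example, and the names `row_score`, `weakest_row`, and `prune_weakest_row` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy layout: one row of weights per structure.
using Matrix = std::vector<std::vector<float>>;

// Score a structure by the sum of squares of its weights (squared L2 norm).
float row_score(const std::vector<float>& row) {
    float sum = 0.0f;
    for (float x : row) sum += x * x;
    return sum;
}

// Return the index of the lowest-scoring row (first one wins on ties).
std::size_t weakest_row(const Matrix& w) {
    assert(!w.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < w.size(); ++i) {
        if (row_score(w[i]) < row_score(w[best])) best = i;
    }
    return best;
}

// The weights inform the decision, but once a row is chosen,
// every weight in it is removed. Returns the index that was pruned.
std::size_t prune_weakest_row(Matrix& w) {
    std::size_t victim = weakest_row(w);
    w.erase(w.begin() + static_cast<std::ptrdiff_t>(victim));
    return victim;
}
```

For the rows `{0.9, 0.01}`, `{0.1, -0.1}`, and `{0.02, -0.5}`, the middle row has the lowest score and is removed in full, even though the smallest individual weight (0.01) sits in the first row and survives.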
However, most of the research papers use more sophisticated decision making than simply inspecting the weights. The number of zero or tiny weights in a structure is a fixed, static metric that isn't very useful. For static structural pruning, it is more powerful to instrument test runs of inference, so as to determine which of the structures contribute most to the inference results, and then prune any structures that aren't pulling their weight. For dynamic structural pruning, there are various algorithms to decide which structures to skip for a particular user query.
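One way the instrumentation idea for static pruning might look is sketched below. It assumes the inference engine can report one output-magnitude value per structure (for example, per attention head) on each test run. The `UsageStats` class and the "fraction of the busiest structure" cutoff are assumptions made for this example, not a method from the research literature.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical collector of per-structure usage over instrumented test runs.
class UsageStats {
public:
    explicit UsageStats(std::size_t num_structures)
        : total_(num_structures, 0.0), samples_(0) {}

    // Call once per test run with one output magnitude per structure.
    void record(const std::vector<float>& magnitudes) {
        assert(magnitudes.size() == total_.size());
        for (std::size_t i = 0; i < total_.size(); ++i) {
            total_[i] += std::fabs(magnitudes[i]);
        }
        ++samples_;
    }

    // Indices of structures whose accumulated magnitude is below
    // `fraction` of the most-used structure's accumulated magnitude.
    std::vector<std::size_t> prune_candidates(double fraction) const {
        std::vector<std::size_t> out;
        if (samples_ == 0) return out;  // nothing measured yet
        double max_total = 0.0;
        for (double t : total_) max_total = std::max(max_total, t);
        for (std::size_t i = 0; i < total_.size(); ++i) {
            if (total_[i] < fraction * max_total) out.push_back(i);
        }
        return out;
    }

private:
    std::vector<double> total_;  // summed absolute magnitudes per structure
    std::size_t samples_;        // number of recorded runs
};
```

After recording a representative set of test queries, `prune_candidates(0.1)` would list the structures that contributed less than a tenth as much as the busiest one. The same statistics could in principle feed a dynamic scheme that skips structures per query, but the decision rule there depends on the algorithm chosen.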