Aussie AI
Vector-Level Pruning
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Vector-Level Pruning
There is an intermediate type of pruning between low-level unstructured magnitude pruning of scattered individual weights and high-level structured pruning of whole components. This technique, still research-only so far, works at the vector level: the idea is to skip individual vector dot product computations.
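A minimal sketch of the idea, not taken from the book's code: a matrix-vector multiply where each output element is one vector dot product, and a per-row flag (here a hypothetical prune_row mask) lets the whole dot product be skipped.

#include <cstddef>
#include <vector>

// Matrix-vector multiply with vector-level pruning: any row flagged in
// prune_row has its entire dot product skipped and its output left at zero.
void matvec_row_pruned(const std::vector<std::vector<float>>& W, // [n][d] weight rows
                       const std::vector<float>& x,              // [d] input vector
                       const std::vector<bool>& prune_row,       // [n] true = skip this row
                       std::vector<float>& y)                    // [n] output vector
{
    const std::size_t n = W.size();
    const std::size_t d = x.size();
    y.assign(n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        if (prune_row[i]) continue;   // vector-level pruning: skip one whole dot product
        float sum = 0.0f;
        for (std::size_t j = 0; j < d; ++j)
            sum += W[i][j] * x[j];
        y[i] = sum;
    }
}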
It is also possible to do the pruning at an even lower level of “sub-vector pruning,” where we look at the vector sub-segments that are sent to the GPU in parallel. If the model dimension is 4096, we might not send all 4096 vector elements to the GPU at once; instead, they are split into sub-vectors. If we can skip an entire sub-vector computation often enough, that's a win.
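As an illustration only, here is one way a single dot product could skip sub-vector chunks. The chunk size of 512 and the per-chunk all-zero flags are assumptions for the sketch, not fixed by the text.

#include <cstddef>
#include <vector>

// Dot product with sub-vector pruning: the d elements are processed in
// fixed-size chunks, and any chunk flagged as all-zero is skipped entirely.
float dot_subvector_pruned(const float* w, const float* x, std::size_t d,
                           const std::vector<bool>& chunk_zero, // one flag per chunk
                           std::size_t chunk = 512)
{
    float sum = 0.0f;
    const std::size_t nchunks = (d + chunk - 1) / chunk;
    for (std::size_t c = 0; c < nchunks; ++c) {
        if (chunk_zero[c]) continue;              // skip this whole sub-vector
        const std::size_t start = c * chunk;
        const std::size_t end = (start + chunk < d) ? start + chunk : d;
        for (std::size_t j = start; j < end; ++j)
            sum += w[j] * x[j];
    }
    return sum;
}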
Static vector pruning. Obviously, we can prune an entire vector or sub-vector if unstructured magnitude pruning happens to zero out all of its elements, and at high sparsity levels of 80% or 90% this will occur occasionally. This type of “static vector pruning” optimization can be detected offline by analyzing the weights, or handled as an optimized node in an ML compiler.
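A sketch of the offline analysis pass, assuming unstructured magnitude pruning has already zeroed individual weights: it scans each row and builds the kind of per-row skip mask used in the earlier example. The function name and the eps tolerance are illustrative choices.

#include <cmath>
#include <cstddef>
#include <vector>

// Offline static vector pruning: mark every weight row that is entirely
// zero (or below eps), since its dot product will always be zero.
std::vector<bool> build_static_skip_mask(const std::vector<std::vector<float>>& W,
                                         float eps = 0.0f)
{
    std::vector<bool> skip(W.size(), false);
    for (std::size_t i = 0; i < W.size(); ++i) {
        bool all_zero = true;
        for (float w : W[i]) {
            if (std::fabs(w) > eps) { all_zero = false; break; }
        }
        skip[i] = all_zero;   // all-zero row => safe to prune its dot product
    }
    return skip;
}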
Dynamic vector pruning. Researchers have gone beyond this simplistic method with dynamic vector pruning. There are various ways to determine at runtime which vectors are having no effect, and prune them from current or future computations. This optimization involves detecting vectors whose dot products are zero or near-zero. Also possible is “negative skipping,” where we prune vectors that often produce negative dot product values, if those results would then be zeroed by the RELU activation function anyway. These ideas are promising and much research opportunity remains here. See Chapter 50 for papers on zero skipping and negative skipping.
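Purely as a hypothetical sketch of dynamic negative skipping: since RELU zeroes negative dot products anyway, rows that keep producing negative results could be marked for skipping in later computations. The streak counter and its threshold of 3 are invented heuristics for illustration, not a method from the research literature.

#include <cstddef>
#include <vector>

// Hypothetical dynamic pruner: tracks consecutive negative dot products
// per row and temporarily skips rows that RELU would zero out anyway.
struct DynamicRowPruner {
    std::vector<int> neg_streak;   // consecutive negative results per row
    std::vector<bool> skip;        // rows currently pruned
    int threshold = 3;             // illustrative cutoff, not a tuned value

    explicit DynamicRowPruner(std::size_t nrows)
        : neg_streak(nrows, 0), skip(nrows, false) {}

    // Record one row's dot product result (before RELU) and update the mask.
    void observe(std::size_t row, float dot) {
        if (dot < 0.0f) {
            if (++neg_streak[row] >= threshold)
                skip[row] = true;      // RELU would output zero anyway
        } else {
            neg_streak[row] = 0;       // row is active again; stop skipping it
            skip[row] = false;
        }
    }
};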