Optimizing Normalization
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Although not as much of a bottleneck as MatMul or VecDot operations, the normalization algorithms are still quite expensive. They are many-to-many vector operations that operate on a whole vector of values (activations or logits). The details differ between normalization algorithms, but they usually require statistics such as the minimum, maximum, or mean, all of which need a scan over every vector element. Then every element of the vector must be scaled, which processes each element a second time.
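For example, a basic LayerNorm-style normalization makes one pass to gather statistics and a second pass to rescale the elements. Here is a minimal sketch in C++ (simplified: no learned scale or bias parameters, and the function name and epsilon default are illustrative):

    #include <cmath>
    #include <cstddef>

    // Minimal sketch of a LayerNorm-style normalization (no learned scale/bias).
    // Pass 1 gathers statistics; pass 2 rescales every element.
    void normalize_layernorm_basic(float v[], std::size_t n, float epsilon = 1e-5f)
    {
        // Pass 1: sum and sum-of-squares in one scan of the vector.
        float sum = 0.0f, sum_sq = 0.0f;
        for (std::size_t i = 0; i < n; i++) {
            sum += v[i];
            sum_sq += v[i] * v[i];
        }
        float mean = sum / (float)n;
        float variance = sum_sq / (float)n - mean * mean;  // E[x^2] - E[x]^2

        // Pass 2: shift and scale every element (a second full scan).
        float denom = 1.0f / std::sqrt(variance + epsilon);
        for (std::size_t i = 0; i < n; i++) {
            v[i] = (v[i] - mean) * denom;
        }
    }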
Research has suggested various ways to speed up the normalization component. They used to ask statisticians how to double the average function in C++, but that was too mean. Examples of normalization speed improvements include:
- Vectorization (hardware-acceleration with GPU or CPU intrinsics)
- Code optimizations (i.e. just a small matter of coding)
- Normalization alternatives (e.g. MinMax is faster than BatchNorm; see the sketch after this list)
- Normalization approximations
- Integer-only normalization
- Removing normalization (“norm pruning”)
- Placement of normalization blocks (i.e. “pre-norm” vs “post-norm”)
- Fused normalization (i.e., kernel fusion with a prior operation)
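To illustrate the "normalization alternatives" point, here is a minimal sketch of MinMax normalization, which rescales the vector into the range [0,1]. Compared to a BatchNorm- or LayerNorm-style computation there is no squaring, no variance, and no square root, just comparisons in the first pass and a multiply in the second (the function name is illustrative):

    #include <cstddef>

    // Minimal sketch of MinMax normalization: rescale all elements into [0,1].
    void normalize_minmax(float v[], std::size_t n)
    {
        if (n == 0) return;
        // Pass 1: find the minimum and maximum (comparisons only).
        float vmin = v[0], vmax = v[0];
        for (std::size_t i = 1; i < n; i++) {
            if (v[i] < vmin) vmin = v[i];
            if (v[i] > vmax) vmax = v[i];
        }
        float range = vmax - vmin;
        if (range == 0.0f) return;  // constant vector; leave it unchanged
        // Pass 2: one reciprocal, then a subtract and multiply per element.
        float scale = 1.0f / range;
        for (std::size_t i = 0; i < n; i++) {
            v[i] = (v[i] - vmin) * scale;
        }
    }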
Vectorization is at the top of the list and can be key to improving normalization performance. Normalization functions are not usually as significant as MatMul in terms of time cost, but they can still be worth optimizing. A typical normalization requires multiple scans over all the elements of the output vectors, and this is done multiple times per token throughout each inference phase.
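As a sketch of what vectorization can look like, here is the scaling pass of the earlier LayerNorm-style example rewritten with x86 AVX intrinsics, processing 8 float values per iteration (this assumes AVX support and a vector length that is a multiple of 8; handling of leftover elements and the statistics pass are omitted for brevity):

    #include <immintrin.h>   // x86 AVX intrinsics
    #include <cstddef>

    // Sketch: vectorized scaling pass, 8 floats per loop iteration.
    // Assumes n is a multiple of 8.
    void scale_pass_avx(float v[], std::size_t n, float mean, float denom)
    {
        __m256 vmean = _mm256_set1_ps(mean);
        __m256 vdenom = _mm256_set1_ps(denom);
        for (std::size_t i = 0; i < n; i += 8) {
            __m256 x = _mm256_loadu_ps(&v[i]);                    // load 8 elements
            x = _mm256_mul_ps(_mm256_sub_ps(x, vmean), vdenom);   // (x - mean) * denom
            _mm256_storeu_ps(&v[i], x);                           // store 8 elements
        }
    }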