Layer Normalization
Book Excerpt from "Generative AI in C++" by David Spuler, Ph.D.
Layer normalization, or “LayerNorm,” was introduced in 2016 as an improvement on BatchNorm (Ba et al., 2016). The idea is to compute the normalization statistics across all the activations of a layer for each individual input, rather than across a batch of inputs, so the scaling no longer depends on the batch size or on which inputs happen to be grouped together. Since then, LayerNorm has largely superseded BatchNorm and is widely used in Transformers.
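To make this concrete, here is a minimal C++ sketch of the core normalization step applied to one vector of activations. The function name layer_norm and the epsilon value are illustrative choices, not code from the book; the point is simply that the mean and variance are computed across the layer's own elements, independent of any batch.

// Minimal sketch of layer normalization over one vector of activations.
// Statistics are computed across the layer's elements for a single input,
// so the result does not depend on the batch size.
#include <cmath>
#include <cstdio>
#include <vector>

void layer_norm(std::vector<float>& v, float epsilon = 1e-5f) {
    // Mean across the layer
    float sum = 0.0f;
    for (float x : v) sum += x;
    const float mean = sum / v.size();

    // Variance across the layer
    float sq = 0.0f;
    for (float x : v) sq += (x - mean) * (x - mean);
    const float variance = sq / v.size();

    // Normalize to zero mean and unit variance
    const float denom = std::sqrt(variance + epsilon);  // epsilon avoids divide-by-zero
    for (float& x : v) x = (x - mean) / denom;
}

int main() {
    std::vector<float> activations = { 1.0f, 2.0f, 3.0f, 4.0f };
    layer_norm(activations);
    for (float x : activations) printf("%f ", x);
    printf("\n");
    return 0;
}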
The default LayerNorm algorithm has two learnable parameters, a “bias” (offset) and a “gain” (scale), the same as for BatchNorm. However, it was found that LayerNorm worked as well or better without these parameters, a variant originally called “LayerNorm-simple” (Xu et al., 2019).
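The sketch below, again with hypothetical names rather than the book's own code, shows how the gain and bias could be applied elementwise after the normalization step above; leaving them out (passing empty vectors here) corresponds to the LayerNorm-simple variant.

// Sketch of the affine step of full LayerNorm: a learnable per-element
// gain (scale) and bias (offset) applied after normalization.
// Empty gain/bias vectors give the "LayerNorm-simple" variant.
#include <cassert>
#include <cstddef>
#include <vector>

void layer_norm_affine(std::vector<float>& v,
                       const std::vector<float>& gain,
                       const std::vector<float>& bias) {
    // v is assumed to already be normalized to zero mean and unit variance
    if (gain.empty() && bias.empty()) return;  // LayerNorm-simple: no gain or bias
    assert(gain.size() == v.size() && bias.size() == v.size());
    for (size_t i = 0; i < v.size(); ++i) {
        v[i] = v[i] * gain[i] + bias[i];  // elementwise scale and shift
    }
}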
LayerNorm has been an innate part of most Transformer architectures since it was introduced, and it has continually been shown to be important. For example, it helps the attention mechanism spread its attention more evenly across all keys in the forward pass, and it smooths the gradients in the backward pass for faster training. However, researchers are still working to understand exactly why it works so well, and new research papers on LayerNorm continue to appear.