Layer Normalization
Book Excerpt from "Generative AI in C++" by David Spuler, Ph.D.
Layer normalization, or “LayerNorm,” was introduced in 2016 as an improvement on BatchNorm (Ba et al., 2016). The idea is to compute the normalization statistics across all the activations of a layer for each individual input, rather than across a batch of inputs, so the scaling no longer depends on the batch size or on which inputs happen to be grouped together. Since then, LayerNorm has largely superseded BatchNorm and is widely used in Transformers.
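To make this concrete, here is a minimal C++ sketch of the core normalization step applied to one vector of activations. The function name layer_norm and the epsilon value are illustrative choices, not code from the book; the point is simply that the mean and variance are computed across the layer's own elements, independent of any batch.

// Minimal sketch of layer normalization over one vector of activations.
// Statistics are computed across the layer's elements for a single input,
// so the result does not depend on the batch size.
#include <cmath>
#include <cstdio>
#include <vector>

void layer_norm(std::vector<float>& v, float epsilon = 1e-5f) {
    // Mean across the layer
    float sum = 0.0f;
    for (float x : v) sum += x;
    const float mean = sum / v.size();

    // Variance across the layer
    float sq = 0.0f;
    for (float x : v) sq += (x - mean) * (x - mean);
    const float variance = sq / v.size();

    // Normalize to zero mean and unit variance
    const float denom = std::sqrt(variance + epsilon);  // epsilon avoids divide-by-zero
    for (float& x : v) x = (x - mean) / denom;
}

int main() {
    std::vector<float> activations = { 1.0f, 2.0f, 3.0f, 4.0f };
    layer_norm(activations);
    for (float x : activations) printf("%f ", x);
    printf("\n");
    return 0;
}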
The default LayerNorm algorithm has two learnable parameters, a “bias” (offset) and a “gain” (scale), the same as for BatchNorm. However, it was found that LayerNorm worked as well or better without these parameters, a variant originally called “LayerNorm-simple” (Xu et al., 2019).
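The sketch below, again with hypothetical names rather than the book's own code, shows how the gain and bias could be applied elementwise after the normalization step above; leaving them out (passing empty vectors here) corresponds to the LayerNorm-simple variant.

// Sketch of the affine step of full LayerNorm: a learnable per-element
// gain (scale) and bias (offset) applied after normalization.
// Empty gain/bias vectors give the "LayerNorm-simple" variant.
#include <cassert>
#include <cstddef>
#include <vector>

void layer_norm_affine(std::vector<float>& v,
                       const std::vector<float>& gain,
                       const std::vector<float>& bias) {
    // v is assumed to already be normalized to zero mean and unit variance
    if (gain.empty() && bias.empty()) return;  // LayerNorm-simple: no gain or bias
    assert(gain.size() == v.size() && bias.size() == v.size());
    for (size_t i = 0; i < v.size(); ++i) {
        v[i] = v[i] * gain[i] + bias[i];  // elementwise scale and shift
    }
}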
LayerNorm has been an innate part of most Transformer architectures since it was introduced, and it has continually been shown to be important. For example, it helps the attention mechanism spread its attention more evenly across all keys in the forward pass, and it smooths the gradients in the backward pass for faster training. However, researchers are still working to understand exactly why it works so well, and new research papers on LayerNorm continue to appear.