Aussie AI

Why is Normalization Needed?

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

To simplify the issue considerably, we don't want the interim “activations” (numbers similar to logits) to become too large or too negative. Overflowing into Inf or NaN would be like gimbal lock: hard to reverse, and the bad values carry onwards through every later computation.

When a set of probabilities runs through a Transformer layer, the outputs are modified by the weights, which can be positive to increase the probability of outputting a token, or negative to decrease the probability (or zero if there's nothing to say). The results of dot products can therefore be numbers that are positive or negative, because of the various allowed weight values combined with the incoming probability vector.

If the input vectors contain any large positive or negative values, then these can get amplified by more weights in the dot product computation. Hence, if we allow this to happen repeatedly, across multiple Transformer layers, the magnitude of the numbers can increase exponentially. This hampers the gradient calculations in training and also increases the risk of some type of overflow (e.g. to +Inf, -Inf or NaN). Normalization is therefore used at each layer to “re-normalize” the numbers to a more reasonable range, thereby avoiding problems with overflow at the positive or negative ends.
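The growth described above can be seen in a toy experiment. This is only a minimal sketch under stated assumptions: `toy_layer` and `max_abs_after_layers` are hypothetical names, and the "layer" uses one identical weight for every connection rather than a real trained weight matrix. It is meant to show how repeated dot products with no normalization in between can push `float` values up to +Inf.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy "layer": every output element is the dot product of the
// input with a row of identical weights (a stand-in for a real weight matrix).
std::vector<float> toy_layer(const std::vector<float>& x, float weight) {
    float dot = 0.0f;
    for (float v : x) {
        dot += weight * v;
    }
    return std::vector<float>(x.size(), dot);
}

// Largest magnitude in the vector after `layers` toy layers,
// with no normalization applied between layers.
float max_abs_after_layers(std::vector<float> x, float weight, int layers) {
    assert(layers >= 0);
    for (int i = 0; i < layers; ++i) {
        x = toy_layer(x, weight);
    }
    float largest = 0.0f;
    for (float v : x) {
        largest = std::fmax(largest, std::fabs(v));
    }
    return largest;
}
```

With four inputs of 1.0 and a weight of 2.0, each toy layer multiplies every value by 8, so `max_abs_after_layers({1.0f, 1.0f, 1.0f, 1.0f}, 2.0f, 10)` is still finite (8 to the power 10), whereas the same call with 50 layers exceeds the range of a 32-bit `float` and returns +Inf.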

Overall, it works a lot better if each component and each layer is guaranteed that its inputs will be “reasonable” and in a “normalized” range of values (e.g. 0..1). Hence, Transformer layers typically have a normalization component that acts on the inputs prior to each layer, and at other points between the layer components.
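As an illustration of what such a normalization component might do, here is a minimal sketch in the style of layer normalization, which rescales a vector to zero mean and unit variance rather than strictly into 0..1. The function name `normalize_in_place` and the `epsilon` default are assumptions made for this example, and the learned scale and shift parameters that real layer normalization typically adds are omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of layer-norm-style normalization (no learned scale or shift).
// `epsilon` is an assumed small constant that guards against division by
// zero when all elements are equal.
void normalize_in_place(std::vector<float>& v, float epsilon = 1e-5f) {
    assert(!v.empty());
    const float n = static_cast<float>(v.size());

    // Mean of the elements.
    float mean = 0.0f;
    for (float x : v) {
        mean += x;
    }
    mean /= n;

    // Population variance of the elements.
    float variance = 0.0f;
    for (float x : v) {
        variance += (x - mean) * (x - mean);
    }
    variance /= n;

    // Shift to zero mean, then scale to (approximately) unit variance.
    const float inv_stddev = 1.0f / std::sqrt(variance + epsilon);
    for (float& x : v) {
        x = (x - mean) * inv_stddev;
    }
}
```

Applied to a vector such as `{1000.0f, -2000.0f, 3000.0f, 500.0f}`, this brings every element back to a small magnitude around zero, so the next layer starts from a reasonable range regardless of how large the incoming values were.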

 
