Norm Pruning

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Norm Pruning

Some research has suggested that the normalization layers in Transformers can be removed without a major loss in model accuracy, an optimization that has been called “norm pruning.” If possible, nothing could be faster than this!
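
To make concrete what would be pruned, here is a minimal sketch, not taken from the book's code, of a basic layer normalization over one vector of activations (scale and shift parameters omitted), next to the “pruned” version that simply passes the values through unchanged:

    // Illustrative sketch only: basic layer normalization vs. norm pruning.
    #include <cmath>
    #include <vector>

    // Normalize one vector of activations to zero mean and unit variance.
    void layer_norm(std::vector<float>& v, float epsilon = 1e-5f) {
        if (v.empty()) return;
        float mean = 0.0f;
        for (float x : v) mean += x;
        mean /= (float)v.size();
        float variance = 0.0f;
        for (float x : v) variance += (x - mean) * (x - mean);
        variance /= (float)v.size();
        float denom = std::sqrt(variance + epsilon);
        for (float& x : v) x = (x - mean) / denom;
    }

    // Norm pruning at its simplest: the normalization call becomes a no-op.
    void pruned_norm(std::vector<float>& /*v*/) {
        // Intentionally empty -- activations pass through unchanged.
    }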

The concern with this approach is two-fold. Firstly, whether the absence of normalization will give rise to outliers that distort the accuracy of the output. Secondly, there is the practical matter of ensuring the computations don't cause a floating-point overflow into +Inf, -Inf, or NaN. We could add some operations that suppress outliers and avoid those nasty floating-point oddities, but, umm, that's actually normalization, so we're just adding it back in again! Maybe it's faster to do a type of “mini-normalization” that fixes up some of these issues without fully normalizing every value. It's a little unclear, since these “norm pruning” approaches have so far only appeared in research papers.
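
As a rough sketch of what such a “mini-normalization” might look like, the loop below scrubs NaN values and clamps each activation into a fixed safe range, skipping the mean and variance computations of a full normalization. The function name and the clamping bound are arbitrary assumptions for illustration:

    #include <cmath>
    #include <vector>

    // Hypothetical "mini-normalization": remove floating-point oddities and
    // cap outliers, without a full mean/variance normalization pass.
    void mini_normalize(std::vector<float>& v, float bound = 100.0f) {
        for (float& x : v) {
            if (std::isnan(x)) x = 0.0f;      // scrub NaN
            else if (x > bound) x = bound;    // clamps +Inf and big positives
            else if (x < -bound) x = -bound;  // clamps -Inf and big negatives
        }
    }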

Even if we cannot remove them all, how often we need to normalize is an important design decision. It's not a cheap operation, so we shouldn't re-normalize after every Transformer component. However, typical Transformer architectures tend to use normalization blocks quite heavily, in one way or another.
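
For example, a simplified layer loop could apply normalization only every k-th layer, rather than after every component. The stub functions and the interval below are illustrative assumptions, not a recommendation:

    #include <vector>

    // Stub component functions, just for illustration.
    void attention_block(std::vector<float>&) { /* attention would go here */ }
    void ffn_block(std::vector<float>&)       { /* FFN would go here */ }
    void layer_norm_full(std::vector<float>&) { /* full normalization here */ }

    // Assumed design knob: normalize only every k-th layer.
    constexpr int kNormInterval = 2;   // 1 = normalize every layer

    void run_layers(std::vector<float>& activations, int num_layers) {
        for (int layer = 0; layer < num_layers; ++layer) {
            attention_block(activations);
            ffn_block(activations);
            if (layer % kNormInterval == 0) {
                layer_norm_full(activations);  // skipped ("pruned") otherwise
            }
        }
    }

Choosing that interval is an empirical speed-versus-accuracy trade-off: larger intervals save more time but risk the outlier problems described above.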

 
