Norm Pruning

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Norm Pruning

Some research has suggested that the normalization layers in Transformers can be removed without a major loss in model accuracy, an optimization that has been called “norm pruning.” If possible, nothing could be faster than this!
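
To make concrete what would be pruned, here is a minimal sketch, not taken from the book's code, of a basic layer normalization over one vector of activations (scale and shift parameters omitted), next to the “pruned” version that simply passes the values through unchanged:

    // Illustrative sketch only: basic layer normalization vs. norm pruning.
    #include <cmath>
    #include <vector>

    // Normalize one vector of activations to zero mean and unit variance.
    void layer_norm(std::vector<float>& v, float epsilon = 1e-5f) {
        if (v.empty()) return;
        float mean = 0.0f;
        for (float x : v) mean += x;
        mean /= (float)v.size();
        float variance = 0.0f;
        for (float x : v) variance += (x - mean) * (x - mean);
        variance /= (float)v.size();
        float denom = std::sqrt(variance + epsilon);
        for (float& x : v) x = (x - mean) / denom;
    }

    // Norm pruning at its simplest: the normalization call becomes a no-op.
    void pruned_norm(std::vector<float>& /*v*/) {
        // Intentionally empty -- activations pass through unchanged.
    }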

The concern with this approach is two-fold. Firstly, whether the absence of normalization will give rise to outliers that distort the accuracy of the output. Secondly, there is the practical matter of ensuring the computations don't cause a floating-point overflow into +Inf, -Inf, or NaN. We could add some operations that suppress outliers and avoid those nasty floating-point oddities, but, umm, that's actually normalization, so we're just adding it back in again! Maybe it's faster to do a type of “mini-normalization” that fixes up some of these issues without fully normalizing every value. It's a little unclear, since these “norm pruning” approaches have so far only appeared in research papers.
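
As a rough sketch of what such a “mini-normalization” might look like, the loop below scrubs NaN values and clamps each activation into a fixed safe range, skipping the mean and variance computations of a full normalization. The function name and the clamping bound are arbitrary assumptions for illustration:

    #include <cmath>
    #include <vector>

    // Hypothetical "mini-normalization": remove floating-point oddities and
    // cap outliers, without a full mean/variance normalization pass.
    void mini_normalize(std::vector<float>& v, float bound = 100.0f) {
        for (float& x : v) {
            if (std::isnan(x)) x = 0.0f;      // scrub NaN
            else if (x > bound) x = bound;    // clamps +Inf and big positives
            else if (x < -bound) x = -bound;  // clamps -Inf and big negatives
        }
    }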

Even if we cannot remove them all, how often we need to normalize is an important design decision. It's not a cheap operation, so we shouldn't re-normalize after every Transformer component. However, typical Transformer architectures tend to use normalization blocks quite heavily, in one way or another.
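
For example, a simplified layer loop could apply normalization only every k-th layer, rather than after every component. The stub functions and the interval below are illustrative assumptions, not a recommendation:

    #include <vector>

    // Stub component functions, just for illustration.
    void attention_block(std::vector<float>&) { /* attention would go here */ }
    void ffn_block(std::vector<float>&)       { /* FFN would go here */ }
    void layer_norm_full(std::vector<float>&) { /* full normalization here */ }

    // Assumed design knob: normalize only every k-th layer.
    constexpr int kNormInterval = 2;   // 1 = normalize every layer

    void run_layers(std::vector<float>& activations, int num_layers) {
        for (int layer = 0; layer < num_layers; ++layer) {
            attention_block(activations);
            ffn_block(activations);
            if (layer % kNormInterval == 0) {
                layer_norm_full(activations);  // skipped ("pruned") otherwise
            }
        }
    }

Choosing that interval is an empirical speed-versus-accuracy trade-off: larger intervals save more time but risk the outlier problems described above.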

 
