Softmax Normalization
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
The Softmax component is used as part of the attention head. It normalizes the raw output values into proper probabilities (i.e., non-negative values that are at most one), by scaling them using a “sum-of-exponentials” method: each value is exponentiated, and then divided by the sum of all the exponentials. This also ensures that the whole distribution sums to one, as probabilities should. See Chapter 25 for more about Softmax.
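As a rough illustration of the sum-of-exponentials method, here is a minimal C++ sketch of a numerically stable Softmax over a vector of values; the function name and use of std::vector are illustrative choices, not the book's own code:

    #include <cmath>
    #include <vector>

    // Minimal Softmax sketch (illustrative, not the book's implementation).
    // Subtracting the maximum value before exponentiating is a standard
    // trick to avoid floating-point overflow; it does not change the result.
    std::vector<float> softmax(const std::vector<float>& values)
    {
        if (values.empty()) return {};
        float maxval = values[0];
        for (float x : values) {
            if (x > maxval) maxval = x;
        }
        float sum = 0.0f;
        std::vector<float> probs(values.size());
        for (size_t i = 0; i < values.size(); i++) {
            probs[i] = std::exp(values[i] - maxval);  // exponentiate each value
            sum += probs[i];
        }
        for (size_t i = 0; i < values.size(); i++) {
            probs[i] /= sum;  // scale so the distribution sums to one
        }
        return probs;
    }

Every output value is non-negative (an exponential is always positive) and the division by the shared sum guarantees the outputs total exactly one.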
Some research papers use a different normalization in the attention heads. For example, the “Hardmax” function can be used instead of Softmax; it keeps only the single largest value, producing a one-hot output rather than a smooth range of probabilities. Another possibility is the “Sparsemax” function, which yields a sparse probability distribution in which many values are exactly zero. However, only Softmax has mainstream acceptance in Transformer architectures. A sketch of the Hardmax idea appears below.
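For comparison, here is a minimal C++ sketch of a Hardmax-style function, assuming the common definition of a one-hot vector with 1.0 at the position of the maximum value (again, the name and signature are illustrative):

    // Illustrative Hardmax sketch: all mass goes to the largest value.
    std::vector<float> hardmax(const std::vector<float>& values)
    {
        if (values.empty()) return {};
        size_t best = 0;
        for (size_t i = 1; i < values.size(); i++) {
            if (values[i] > values[best]) best = i;  // track the argmax
        }
        std::vector<float> out(values.size(), 0.0f);
        out[best] = 1.0f;  // one-hot output, not a smooth distribution
        return out;
    }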