Softmax Normalization
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
The Softmax component is used as part of the attention head. It normalizes the raw output values into proper probabilities (i.e., non-negative values that are at most one), by scaling them using a “sum-of-exponentials” method: each value is exponentiated, and then divided by the sum of all the exponentials. This also ensures that the whole distribution sums to one, as probabilities should. See Chapter 25 for more about Softmax.
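As a rough illustration of the sum-of-exponentials method, here is a minimal C++ sketch of a numerically stable Softmax over a vector of values; the function name and use of std::vector are illustrative choices, not the book's own code:

    #include <cmath>
    #include <vector>

    // Minimal Softmax sketch (illustrative, not the book's implementation).
    // Subtracting the maximum value before exponentiating is a standard
    // trick to avoid floating-point overflow; it does not change the result.
    std::vector<float> softmax(const std::vector<float>& values)
    {
        if (values.empty()) return {};
        float maxval = values[0];
        for (float x : values) {
            if (x > maxval) maxval = x;
        }
        float sum = 0.0f;
        std::vector<float> probs(values.size());
        for (size_t i = 0; i < values.size(); i++) {
            probs[i] = std::exp(values[i] - maxval);  // exponentiate each value
            sum += probs[i];
        }
        for (size_t i = 0; i < values.size(); i++) {
            probs[i] /= sum;  // scale so the distribution sums to one
        }
        return probs;
    }

Every output value is non-negative (an exponential is always positive) and the division by the shared sum guarantees the outputs total exactly one.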
Some research papers use a different normalization in the attention heads. For example, the “Hardmax” function can be used instead of Softmax; it keeps only the single largest value, producing a one-hot output rather than a smooth range of probabilities. Another possibility is the “Sparsemax” function, which yields a sparse probability distribution in which many values are exactly zero. However, only Softmax has mainstream acceptance in Transformer architectures. A sketch of the Hardmax idea appears below.
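For comparison, here is a minimal C++ sketch of a Hardmax-style function, assuming the common definition of a one-hot vector with 1.0 at the position of the maximum value (again, the name and signature are illustrative):

    // Illustrative Hardmax sketch: all mass goes to the largest value.
    std::vector<float> hardmax(const std::vector<float>& values)
    {
        if (values.empty()) return {};
        size_t best = 0;
        for (size_t i = 1; i < values.size(); i++) {
            if (values[i] > values[best]) best = i;  // track the argmax
        }
        std::vector<float> out(values.size(), 0.0f);
        out[best] = 1.0f;  // one-hot output, not a smooth distribution
        return out;
    }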