What is Softmax?
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Softmax is a relatively simple component of the Transformer. There are no tensors or matrix multiplications. All it does is operate on a vector of numbers, changing the values in place. It is a type of "normalization" (like BatchNorm or LayerNorm in the prior chapter), but Softmax is used for many different reasons in a Transformer.
The purpose of Softmax is to take a vector of calculated values and normalize them into probabilities in a new output vector. After Softmax, the output vector contains a new normalized set of values that all add up to 1, and they are intended to represent the probability of the token/word associated with each vector element.
The Softmax algorithm is basically:
- Exponentiate each vector element.
- Add up the exponentials.
- Divide every vector element by this sum.
In short: scale each element by the sum of the exponentials. A minimal code sketch of these three steps is shown below.
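As a concrete illustration, here is a minimal C++ sketch of the three steps above. The function name and in-place interface are illustrative, not from the book's code; note also that production implementations usually subtract the maximum element before exponentiating, to avoid floating-point overflow.

    #include <cmath>
    #include <vector>

    // Minimal softmax sketch, following the three steps above.
    // (Illustrative only; real code should guard against overflow
    // by subtracting the maximum element before exponentiating.)
    void softmax_basic(std::vector<float>& vec)
    {
        float sum = 0.0f;
        for (float& x : vec) {
            x = std::exp(x);   // Step 1: exponentiate each element
            sum += x;          // Step 2: add up the exponentials
        }
        for (float& x : vec) {
            x /= sum;          // Step 3: divide each element by the sum
        }
    }

For example, calling softmax_basic on the vector {1.0f, 2.0f, 3.0f} yields approximately {0.09, 0.24, 0.67}, which sums to 1.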
So, why do we need all the exponentials? The idea is that the input vector contains "logits," which are logarithms of probabilities, so we exponentiate each one to bring it out of the log-domain into the real-domain. Then, with the division step, we normalize them all so that they become probabilities that total exactly 1.
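Written as a formula, for an input vector x of n logits, the i-th output probability is:

    \mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}

The denominator is the same for every element, which is why all the outputs are guaranteed to sum to exactly 1.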