Aussie AI
Inputs, Outputs and Dimensions
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
To understand Softmax, let's examine its inputs and outputs in more detail. Overall, Softmax is a “vector-to-vector” algorithm: the input and output vectors have the same dimension, which for the final output logits is the vocabulary size.
Softmax is not an “element-wise” vector operation. The change to each element depends on all the elements of the vector, not just the current one. For example, Softmax exponentiates every element and then uses the sum of those exponentials as a scaling factor.
The input to Softmax is a vector of floating-point numbers called logits. These come out of the model's computation layers and are a rough representation of word probabilities. However, the input vector is a bit messy. Firstly, the values are in the “log-domain” rather than being real probabilities; in other words, they are the logarithms of the probabilities. Secondly, the values are not “normalized,” so there are numbers outside the range 0...1, including large numbers and negatives. Thirdly, the numbers don't nicely add up to 1, as disjoint probabilities should.
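This logits-to-probabilities transformation can be sketched in C++ as follows. This is a naive, unoptimized version for clarity only (the function name is mine, and production code typically subtracts the maximum logit before exponentiating to avoid floating-point overflow, a refinement omitted here):

```cpp
#include <cmath>
#include <vector>

// Naive Softmax sketch: exponentiate each logit to leave the
// log-domain, then divide by the sum so the outputs total 1.
std::vector<float> softmax_basic(const std::vector<float>& logits)
{
    std::vector<float> probs(logits.size());
    float total = 0.0f;
    for (size_t i = 0; i < logits.size(); i++) {
        probs[i] = expf(logits[i]);  // exp undoes the logarithm
        total += probs[i];
    }
    for (size_t i = 0; i < logits.size(); i++) {
        probs[i] /= total;  // normalize into the range 0...1
    }
    return probs;
}
```

Note how the second loop is what makes Softmax non-element-wise: every output element is divided by a total computed over the whole vector.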
The output of Softmax is a beautiful vector that's perfect in every way, with harps playing softly in the background. The log-domain has been fixed by exponentiation. All of the numbers are scaled into the range 0...1, and they add up to 1 in total, like a good probability distribution of disjoint events. The output vector from Softmax fills every statistician's heart with joy.
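Those output properties are easy to check in code. A minimal sketch of such a sanity check (the function name and tolerance are my own, not from the book): every element must lie in 0...1, and the elements must sum to 1 within floating-point tolerance.

```cpp
#include <cmath>
#include <vector>

// Verify the Softmax output properties: each element in [0,1],
// and the elements summing to 1 (within a small tolerance,
// since floating-point sums are rarely exact).
bool is_probability_distribution(const std::vector<float>& v,
                                 float eps = 1e-5f)
{
    float total = 0.0f;
    for (float x : v) {
        if (x < 0.0f || x > 1.0f) return false;  // out of range
        total += x;
    }
    return fabsf(total - 1.0f) < eps;  // must sum to 1
}
```

A raw logit vector will normally fail this check, while the vector coming out of Softmax should always pass it.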