Aussie AI
Softmax and Temperature
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
One important use of Softmax is in the decoding step. At the end of each decoder pass, the Softmax function normalizes the logits into a probability distribution, which the decoding algorithm then uses to choose an output token (in the simplest case, the token with the highest probability). As part of this method, the Softmax function is usually changed to a “scaled Softmax” that uses an extra parameter called the “temperature.”
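As a rough illustration of this step, here is a minimal sketch of an unscaled Softmax followed by the simplest decoding choice (picking the highest-probability token, often called greedy decoding). The function names `softmax` and `greedy_decode` are hypothetical names chosen for this example, and the use of `std::vector<float>` for the logits is an assumption rather than anything the text prescribes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: convert raw logits into probabilities that sum to 1.0.
std::vector<float> softmax(const std::vector<float>& logits)
{
    assert(!logits.empty());
    // Subtract the largest logit before exponentiating to avoid overflow.
    // This shifts every exponent equally, so the result is unchanged.
    const float maxlogit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - maxlogit);
        sum += probs[i];
    }
    for (float& p : probs) {
        p /= sum;  // Normalize so the probabilities sum to 1.0
    }
    return probs;
}

// Hypothetical helper: return the index (token id) with the highest probability.
std::size_t greedy_decode(const std::vector<float>& probs)
{
    assert(!probs.empty());
    return static_cast<std::size_t>(
        std::max_element(probs.begin(), probs.end()) - probs.begin());
}
```

For example, calling `softmax` on the logits `{1.0f, 2.0f, 3.0f}` yields three probabilities in increasing order, and `greedy_decode` on that result returns index 2. Other decoding algorithms would sample from the probabilities rather than always taking the maximum.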
What is the temperature? The temperature is a hyper-parameter that controls the level of randomness or unpredictability in the output. A higher temperature setting means that the decoder is more likely to output the lower-probability tokens (i.e., it has a fever and says silly stuff). If the temperature is low, the decoder will mostly output the highest-probability token, meaning it is much less random (like a cold-hearted robot).
What is the value of the temperature? The temperature is a positive floating-point number, which can be between 0 and 1 or greater than 1. A temperature of zero cannot be used, as it would cause a divide-by-zero error. If the temperature equals 1.0, it doesn't change the Softmax function at all (i.e., it continues harmlessly without scaling). Since the logits are multiplied by the reciprocal of the temperature before Softmax is applied, a larger temperature setting makes randomness higher (so it runs “hotter” and gets more “bubbly”). If the temperature is below 1.0, making it a fraction, the effect is to spread the logits further apart, which concentrates probability on the top tokens and reduces the randomness of the output. If the temperature is greater than 1.0, the logits contract towards each other, so their probabilities become more even and the decoder is more likely to choose any of them (although still with some randomness), thereby increasing output randomness.
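The scaling described above can be sketched as follows. Each logit z_i is multiplied by 1/T before the usual Softmax, giving p_i = exp(z_i / T) / sum_j exp(z_j / T). This is only an illustrative sketch: the function name `softmax_with_temperature` is hypothetical, and the `std::vector<float>` interface and the use of `assert` to reject a non-positive temperature are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: Softmax with the logits scaled by 1/temperature.
std::vector<float> softmax_with_temperature(const std::vector<float>& logits,
                                            float temperature)
{
    assert(!logits.empty());
    assert(temperature > 0.0f);  // Zero would cause a divide-by-zero
    const float recip = 1.0f / temperature;  // Scale factor for every logit
    // Subtract the largest logit first for numerical stability;
    // the shift cancels out in the final ratio.
    const float maxlogit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - maxlogit) * recip);
        sum += probs[i];
    }
    for (float& p : probs) {
        p /= sum;  // Normalize so the probabilities sum to 1.0
    }
    return probs;
}
```

With the logits `{1.0f, 2.0f, 3.0f}`, the top token's probability is about 0.67 at a temperature of 1.0, rises to about 0.87 at a temperature of 0.5, and falls to about 0.51 at a temperature of 2.0, which matches the behavior described above.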