Top-k Decoding
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
Top-k decoding is a generalization of greedy decoding, where the output token is chosen from the k tokens with the highest probabilities. Greedy decoding is simply top-k decoding with k=1. Top-k decoding chooses the output token randomly from among those k candidates, so it is a stochastic algorithm that intentionally injects unpredictability to increase the variety and quality of the output.
Top-k is regarded as an improvement over greedy search, largely fixing the "neural text degeneration" problem of repetitive output. However, like greedy decoding, top-k decoding only considers a single token at a time, so it performs poorly when a pair of tokens would be a better output in the current situation. Hence, top-k decoding is not as accurate as beam search decoding, which uses more lookahead.
Example: Top-k Decoding Algorithm: The basic algorithm for top-50 decoding on a vocabulary of 50,000 tokens is as follows (a C++ sketch appears after the list):
- Softmax-normalize all the 50,000 logits into probabilities.
- Top-k sort the array of 50,000 probabilities.
- Choose the top k items (i.e. the top 50 probabilities).
- Randomly pick one of these 50 tokens to output.
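To make the first three steps concrete, here is a minimal C++ sketch, assuming the raw scores arrive in a logits vector; the TokenProb struct and the top_k_probs function name are illustrative, not from any particular library.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    struct TokenProb {
        int token;    // Token index in the vocabulary
        float prob;   // Softmax-normalized probability
    };

    // Steps 1-3: normalize the logits, then keep the k most probable tokens.
    // Assumes k <= logits.size().
    std::vector<TokenProb> top_k_probs(const std::vector<float>& logits, int k)
    {
        // Step 1: softmax-normalize all the logits into probabilities,
        // subtracting the maximum logit first for numerical stability.
        float maxlogit = *std::max_element(logits.begin(), logits.end());
        std::vector<TokenProb> probs(logits.size());
        float sum = 0.0f;
        for (size_t i = 0; i < logits.size(); i++) {
            probs[i].token = (int)i;
            probs[i].prob = std::exp(logits[i] - maxlogit);
            sum += probs[i].prob;
        }
        for (auto& tp : probs) tp.prob /= sum;

        // Steps 2-3: a top-k partial sort is cheaper than fully
        // sorting all 50,000 probabilities.
        std::partial_sort(probs.begin(), probs.begin() + k, probs.end(),
            [](const TokenProb& a, const TokenProb& b) { return a.prob > b.prob; });
        probs.resize(k);
        return probs;
    }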
In the last step, randomly choosing from the top 50 items doesn't give each token a uniform 1-in-50 chance. Instead, the choice is weighted by the probability distribution of those 50 tokens, which ensures that higher-probability tokens among the 50 are more likely to be output.
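A minimal sketch of this weighted random choice, reusing the TokenProb struct from the sketch above, can use std::discrete_distribution from the standard <random> header, which samples indices in proportion to the supplied weights:

    #include <random>
    #include <vector>

    // Step 4: pick one of the k tokens, weighted by probability,
    // so higher-probability tokens are chosen more often.
    int sample_top_k(const std::vector<TokenProb>& topk, std::mt19937& rng)
    {
        std::vector<float> weights(topk.size());
        for (size_t i = 0; i < topk.size(); i++)
            weights[i] = topk[i].prob;
        std::discrete_distribution<int> dist(weights.begin(), weights.end());
        return topk[dist(rng)].token;
    }

A caller would typically seed one std::mt19937 generator at startup, e.g. std::mt19937 rng(std::random_device{}());, and reuse it across all decoding steps.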