Aussie AI

Top-k Decoding

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Top-k decoding is a generalization of greedy decoding in which the output token is chosen randomly from the k tokens with the highest probabilities. Greedy decoding is simply top-k decoding with k=1. Because the choice among the top k tokens is random, top-k decoding is a stochastic algorithm that intentionally injects unpredictability in order to increase the variety and quality of the output.

Top-k decoding is regarded as an improvement over greedy search, largely fixing the "neural text degeneration" problem of repetitive output. However, like greedy decoding, top-k decoding considers only a single token at a time, so it can miss cases where a pair of two tokens would be a better output in the current context. Hence, top-k decoding isn't as accurate as beam search decoding, which performs more lookahead.

Example: Top-k Decoding Algorithm: As an example, the basic algorithm for top-50 decoding with a vocabulary of 50,000 tokens is:

  • Softmax-normalize all the 50,000 logits into probabilities.
  • Top-k sort the array of 50,000 probabilities.
  • Choose the top k items (i.e. the top 50 probabilities).
  • Randomly pick one of these 50 tokens to output.
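The first three steps above can be sketched in C++ as follows. This is a minimal illustration, not the book's own implementation: the function name `top_k_indices` and the use of `std::partial_sort` over an index array are assumptions, chosen because only the top k entries need full ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Sketch of steps 1-3: softmax-normalize the logits, then
// select the indices of the k highest-probability tokens.
std::vector<int> top_k_indices(const std::vector<float>& logits, int k) {
    // Softmax with max-subtraction for numerical stability.
    float maxv = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - maxv);
        sum += probs[i];
    }
    for (auto& p : probs) p /= sum;

    // Partial sort of token indices by descending probability:
    // only the first k positions need to be fully ordered.
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
        [&](int a, int b) { return probs[a] > probs[b]; });
    idx.resize(k);
    return idx;
}
```

For a real 50,000-entry vocabulary the same code applies unchanged; the partial sort keeps the cost well below a full sort when k is small (e.g. 50).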

In the last step, randomly choosing from the top 50 items doesn't mean each token is chosen with a uniform 1-in-50 probability. Instead, the choice is weighted according to the tokens' probability distribution. This ensures that the randomness follows the top 50 probabilities, so that higher-probability tokens among those 50 are more likely to be output.
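This probability-weighted final step maps directly onto the C++ standard library's `std::discrete_distribution`, which draws an index with probability proportional to its weight. A minimal sketch, with the function name `sample_top_k` as an assumption for illustration:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Final step: sample one of the top-k tokens weighted by its
// probability, rather than uniformly (1-in-k).
// topk_probs[i] is the probability of token topk_tokens[i].
int sample_top_k(const std::vector<int>& topk_tokens,
                 const std::vector<float>& topk_probs,
                 std::mt19937& rng) {
    // discrete_distribution picks index i with probability
    // proportional to topk_probs[i]; it renormalizes internally,
    // so the k probabilities need not sum to 1.
    std::discrete_distribution<int> dist(topk_probs.begin(),
                                         topk_probs.end());
    return topk_tokens[dist(rng)];
}
```

Because the distribution renormalizes its weights, the unnormalized softmax values restricted to the top k tokens can be passed in directly.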
