Aussie AI

Top-k Decoding

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Top-k decoding is a generalization of greedy decoding in which the output token is chosen randomly from the k tokens with the highest probabilities. Greedy decoding is simply top-k decoding with k=1. Because the choice among the top k tokens is random, top-k decoding is a stochastic algorithm that intentionally injects unpredictability in order to increase the variety and quality of the output.

Top-k decoding is regarded as an improvement over greedy search, largely fixing the "neural text degeneration" problem of repetitive output. However, like greedy decoding, top-k decoding considers only a single token at a time, so it can miss cases where a pair of two tokens would be a better output in the current context. Hence, top-k decoding isn't as accurate as beam search decoding, which performs more lookahead.

Example: Top-k Decoding Algorithm: As an example, the basic algorithm for top-50 decoding with a vocabulary of 50,000 tokens is:

  • Softmax-normalize all the 50,000 logits into probabilities.
  • Top-k sort the array of 50,000 probabilities.
  • Choose the top k items (i.e. the top 50 probabilities).
  • Randomly pick one of these 50 tokens to output.
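The first three steps above can be sketched in C++ as follows. This is a minimal illustration, not the book's own implementation: the function name `top_k_indices` and the use of `std::partial_sort` over an index array are assumptions, chosen because only the top k entries need full ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Sketch of steps 1-3: softmax-normalize the logits, then
// select the indices of the k highest-probability tokens.
std::vector<int> top_k_indices(const std::vector<float>& logits, int k) {
    // Softmax with max-subtraction for numerical stability.
    float maxv = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - maxv);
        sum += probs[i];
    }
    for (auto& p : probs) p /= sum;

    // Partial sort of token indices by descending probability:
    // only the first k positions need to be fully ordered.
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
        [&](int a, int b) { return probs[a] > probs[b]; });
    idx.resize(k);
    return idx;
}
```

For a real 50,000-entry vocabulary the same code applies unchanged; the partial sort keeps the cost well below a full sort when k is small (e.g. 50).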

In the last step, randomly choosing from the top 50 items doesn't mean each token is chosen with a uniform 1-in-50 probability. Instead, the choice is weighted according to the tokens' probability distribution. This ensures that the randomness follows the top 50 probabilities, so that higher-probability tokens among those 50 are more likely to be output.
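This probability-weighted final step maps directly onto the C++ standard library's `std::discrete_distribution`, which draws an index with probability proportional to its weight. A minimal sketch, with the function name `sample_top_k` as an assumption for illustration:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Final step: sample one of the top-k tokens weighted by its
// probability, rather than uniformly (1-in-k).
// topk_probs[i] is the probability of token topk_tokens[i].
int sample_top_k(const std::vector<int>& topk_tokens,
                 const std::vector<float>& topk_probs,
                 std::mt19937& rng) {
    // discrete_distribution picks index i with probability
    // proportional to topk_probs[i]; it renormalizes internally,
    // so the k probabilities need not sum to 1.
    std::discrete_distribution<int> dist(topk_probs.begin(),
                                         topk_probs.end());
    return topk_tokens[dist(rng)];
}
```

Because the distribution renormalizes its weights, the unnormalized softmax values restricted to the top k tokens can be passed in directly.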
