Aussie AI

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

What is Decoding?

The decoding algorithm is the method whereby the decoder emits tokens for the output message. At the end of each pass through the decoder layers, the output from the final layer is a vector of “logits,” with one score per token in the vocabulary. These scores are usually normalized (typically by Softmax) into predicted probabilities for the next token. The decoding algorithm then decides whether to output one token or multiple tokens, and which ones.

The decoding algorithm is a relatively simple piece of code. There are no tensors or matrix multiplications involved. The input is a vector of probabilities, and the output is a single token. In some advanced decoding algorithms, it is possible to output multiple tokens at once, but the basic method is to output only a single token, and then go around again to start predicting the next one.
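The basic method described above can be sketched in a few lines of C++. This is a minimal illustration of greedy decoding (one of the algorithms listed later in this section), not the book's own code: the names greedy_decode, generate_greedy, and NextScoresFn are hypothetical, and the model itself is represented only by a callback that maps the tokens so far to a vector of scores for the next token.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Greedy decoding: return the index (token ID) of the highest score.
// Works on logits or probabilities, since Softmax preserves their ordering.
int greedy_decode(const std::vector<float>& scores)
{
    if (scores.empty()) return -1;  // Hypothetical error value: no candidates
    auto best = std::max_element(scores.begin(), scores.end());
    return static_cast<int>(std::distance(scores.begin(), best));
}

// Hypothetical stand-in for the model: tokens so far -> scores for the next token.
using NextScoresFn = std::function<std::vector<float>(const std::vector<int>&)>;

// Emit one token, append it to the sequence, then go around again.
std::vector<int> generate_greedy(const NextScoresFn& model,
                                 std::vector<int> tokens,
                                 int eos_token,
                                 std::size_t max_new_tokens)
{
    for (std::size_t i = 0; i < max_new_tokens; ++i) {
        int next = greedy_decode(model(tokens));
        if (next < 0) break;             // Model returned no scores
        tokens.push_back(next);
        if (next == eos_token) break;    // Stop at the end-of-sequence token
    }
    return tokens;
}
```

Note that the decoding step itself is only a scan over a vector. All of the expensive work happens inside the model callback.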

Note that the output of the decoding algorithm is a sequence of token IDs, emitted one number at a time. To actually output the text from that, you need to “untokenize” or “decode” the tokens into printable text. Some special internal-use tokens might also need to be removed from the output, rather than shown to users. This is discussed in the tokenization chapter.
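As a rough sketch of that last step, the function below maps token IDs back to text with a simple lookup table. It assumes a hypothetical vocabulary stored as a vector of strings indexed by token ID, and a hypothetical set of special token IDs to hide. Real tokenizers are more involved (for example, byte-level merging), so treat this only as an illustration of the idea.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Convert token IDs back to text using a vocabulary table (ID -> token string).
// IDs in special_ids (e.g., hypothetical start/end markers) are not shown.
std::string untokenize(const std::vector<int>& tokens,
                       const std::vector<std::string>& vocab,
                       const std::unordered_set<int>& special_ids)
{
    std::string text;
    for (int id : tokens) {
        if (special_ids.count(id) != 0) continue;  // Internal-use token: skip
        if (id < 0 || static_cast<std::size_t>(id) >= vocab.size()) continue;  // Unknown ID: skip
        text += vocab[static_cast<std::size_t>(id)];
    }
    return text;
}
```

For example, with a toy vocabulary of "<s>", "</s>", "Hello", ",", and " world" (IDs 0 to 4) and IDs 0 and 1 marked as special, the token sequence 0, 2, 3, 4, 1 becomes the text "Hello, world".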

There are two main classes of decoding algorithm:

  • Autoregressive decoding
  • Parallel decoding (non-autoregressive)

There are several possible decoding algorithms:

  • Greedy decoding
  • Top-k sampling
  • Top-p sampling
  • Beam search decoding
  • Aggressive decoding

Multi-model decoding algorithms have also been examined where two or more AI engines assist in choosing words:

  • Speculative decoding
  • Supervised decoding (“big-little” architectures)
  • Ensemble decoding

The decoding algorithm may also be combined with other optimizations that improve the decoding process, such as:

  • Non-autoregressive decoding
  • Token pruning
  • Prompt compression

 
