Aussie AI

Consensus Decoding

Book Excerpt from "Generative AI in C++"

by David Spuler, Ph.D.


Consensus-based decoding runs the same query on multiple models and then applies a decision rule to choose which answer to use. The key to getting a smarter overall model is deciding which of the models to listen to the most. Should you listen to the loudest one or to the quiet achiever sitting in the corner? Various decision methods have been tried to choose the best output:

Majority decision
Maximum certainty (the output with the highest predicted probability)
Weighted averages (giving some models more votes)

Each option has pros and cons. For example, majority decision breaks down when every model produces a different answer, because then there is no majority. The decision algorithm can also look beyond each model's single output token. It can use each model's top-k candidate tokens together with their predicted probabilities.

In the basic consensus architecture, all of the models run to completion. Adding a smaller model therefore gives no speedup, because the ensemble still waits for its slowest member. A variation adds a time-based cut-off that excludes any model that has not finished by a deadline. This is faster on average, but it puts accuracy at risk. The larger, slower models are the ones most often cut off, so the ensemble can end up always following the smaller models.


• Buy: Generative AI in C++: Coding Transformers and LLMs

The new AI programming book by Aussie AI co-founders:

AI coding in C++
Transformer engine speedups
LLM models
Phone and desktop AI
Code examples
Research citations

Get your copy from Amazon: Generative AI in C++
