Aussie AI
Consensus Decoding
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Consensus-based decoding involves running the same query on multiple models and then deciding which model's answer to use. The key to having a smarter overall model is deciding which of the models to listen to the most. Should you listen to the loudest one or the quiet achiever sitting in the corner? Various decision methods have been tried to choose the best output:
- Majority decision
- Maximum certainty (highest probability calculated)
- Weighted averages (giving some engines more votes)
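The three decision rules above can be sketched for the simplest case, where each model contributes one chosen token and the probability it assigned to that token. This is a minimal illustration under stated assumptions: the `ModelVote` struct, the function names, and the tie-breaking rule (smallest token id wins a tie) are hypothetical choices, not the book's implementation.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical per-model output: the token it picked and its probability for that token.
struct ModelVote {
    int token;          // token id chosen by this model
    float probability;  // this model's predicted probability for its chosen token
};

// Majority decision: the token chosen by the most models wins.
// Ties go to the smallest token id, because std::map iterates in key order
// and only a strictly larger count replaces the current best.
int majority_decision(const std::vector<ModelVote>& votes) {
    std::map<int, int> counts;
    for (const auto& v : votes) counts[v.token]++;
    int best = -1, best_count = 0;
    for (const auto& [token, count] : counts) {
        if (count > best_count) { best = token; best_count = count; }
    }
    return best;  // -1 if there were no votes
}

// Maximum certainty: follow whichever single model is most confident.
int max_certainty_decision(const std::vector<ModelVote>& votes) {
    int best = -1;
    float best_prob = -1.0f;
    for (const auto& v : votes) {
        if (v.probability > best_prob) { best = v.token; best_prob = v.probability; }
    }
    return best;
}

// Weighted vote: weights[i] is how many "votes" model i gets
// (e.g., a larger model might be given a higher weight).
int weighted_decision(const std::vector<ModelVote>& votes,
                      const std::vector<float>& weights) {
    std::map<int, float> totals;
    for (std::size_t i = 0; i < votes.size() && i < weights.size(); ++i)
        totals[votes[i].token] += weights[i];
    int best = -1;
    float best_total = -1.0f;
    for (const auto& [token, total] : totals) {
        if (total > best_total) { best = token; best_total = total; }
    }
    return best;
}
```

With votes `{7, 0.40}`, `{7, 0.35}` and `{9, 0.90}`, majority decision picks token 7, maximum certainty picks token 9, and a weighted vote picks 9 if the third model has weight 3 but 7 if it has weight 1.5. The three rules can disagree on the same inputs, which is why the choice of rule matters.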
There are various pros and cons to the different options. For example, majority decision has a problem if every model comes up with a different answer. Note that the decision algorithm can consider not only the single output token from each model, but also each model's top-k tokens together with their predicted probabilities, where those are available.
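One plausible way to use the top-k vectors is to sum each token's predicted probability across all models and choose the token with the highest total. This is an assumed aggregation rule chosen for illustration, not necessarily the book's method; the `TopK` alias and `topk_consensus` function are hypothetical. It also shows how top-k information can break the "every model disagrees" deadlock that defeats a plain majority vote.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical: one model's top-k candidates as (token id, probability) pairs.
using TopK = std::vector<std::pair<int, float>>;

// Sum each token's probability over every model's top-k list and return the
// token with the largest total. A token absent from a model's top-k adds 0.
// Returns -1 if no candidates were supplied.
int topk_consensus(const std::vector<TopK>& per_model) {
    std::map<int, float> totals;
    for (const auto& topk : per_model)
        for (const auto& [token, prob] : topk)
            totals[token] += prob;
    int best = -1;
    float best_total = -1.0f;
    for (const auto& [token, total] : totals)
        if (total > best_total) { best = token; best_total = total; }
    return best;
}
```

For example, if three models rank tokens 1, 3 and 4 first (each at 0.5) but all rank token 2 second (each at 0.4), every top-1 answer differs, yet token 2 wins with a summed score of about 1.2.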
In the basic consensus architecture, all of the models run to completion, so including a smaller model gives no speedup. However, a variation is to add a time-based cut-off, where models that take too long to complete are excluded. This is faster on average, but it puts accuracy at risk, because the overall architecture can end up always following the smaller, faster models.
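The cut-off variation can be sketched as a majority vote restricted to models that finished within a deadline. In a real system the models would run concurrently and the deadline might be enforced with something like `std::future::wait_until`; here, as a simplifying assumption, each model's latency is passed in as data so the selection logic is deterministic. The `TimedResult` struct and `consensus_with_cutoff` function are hypothetical.

```cpp
#include <map>
#include <vector>

// Hypothetical record of one model's run: its answer token and its latency.
struct TimedResult {
    int token;          // token id this model produced
    double latency_ms;  // how long this model took to produce it
};

// Majority vote over only the models that finished within deadline_ms.
// Ties go to the smallest token id. Returns -1 if no model made the cut-off.
int consensus_with_cutoff(const std::vector<TimedResult>& results, double deadline_ms) {
    std::map<int, int> counts;
    for (const auto& r : results)
        if (r.latency_ms <= deadline_ms) counts[r.token]++;
    int best = -1, best_count = 0;
    for (const auto& [token, count] : counts)
        if (count > best_count) { best = token; best_count = count; }
    return best;
}
```

With two slow models answering token 5 (900 ms and 850 ms) and one fast model answering token 8 (100 ms), a 1000 ms deadline yields 5, but a 500 ms deadline yields 8 because only the fast model survives. That is the accuracy risk described above: a tight deadline means the result simply follows the smaller model.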
The new AI programming book by Aussie AI co-founders:
Get your copy from Amazon: Generative AI in C++