Aussie AI
Easy vs Hard Queries
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
One type of adaptive inference uses a heuristic to decide whether a user query is easy or hard. An “easy” query can be processed faster by a simpler method or a small model, whereas a “hard” query must be processed in full by a large model. Several multi-model ensemble architectures perform adaptive inference by choosing between two or more models in this way:
- Model selection algorithms
- Big-little architectures
- Mixture-of-Experts (MoE)
- Speculative decoding
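The simplest case is a big-little router: score the query with a cheap heuristic, then send it to exactly one model. The sketch below is illustrative only. The functions `small_model`, `large_model`, and `is_easy_query` are hypothetical stand-ins. The heuristic itself (a word-count limit plus a few "reasoning" keywords, matched as crude substrings) is an assumption made for the example, not a rule taken from the book.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two models; a real system would run inference here.
std::string small_model(const std::string& query) { return "small: " + query; }
std::string large_model(const std::string& query) { return "large: " + query; }

// Count whitespace-separated words in the query.
static std::size_t word_count(const std::string& query) {
    std::istringstream in(query);
    std::string word;
    std::size_t n = 0;
    while (in >> word) ++n;
    return n;
}

// Assumed heuristic for illustration: short queries that contain none of the
// "hard" keywords (crude substring match, case-insensitive) are treated as easy.
bool is_easy_query(const std::string& query, std::size_t max_words = 12) {
    static const std::vector<std::string> hard_keywords = {"explain", "why", "prove", "compare"};
    if (word_count(query) > max_words) return false;
    std::string lower;
    for (char c : query) {
        lower += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    for (const auto& kw : hard_keywords) {
        if (lower.find(kw) != std::string::npos) return false;
    }
    return true;
}

// Adaptive inference: easy queries go to the small model, hard ones to the large model.
std::string answer(const std::string& query) {
    return is_easy_query(query) ? small_model(query) : large_model(query);
}
```

In practice the heuristic could be anything cheap relative to the large model, such as a tiny classifier. The point of the design is that the decision step must cost far less than the inference it avoids.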
This differs from caching, because one of the models is always executed, whereas a cache hit skips inference entirely. The two ideas can be combined, however (e.g., check the cache first, then fall back to multi-model inference). These heuristic decision methods are discussed under multi-model methods in Chapter 54.
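A cache-first, then multi-model pipeline can be sketched as follows, assuming an exact-match cache keyed on the raw query string. `CachedRouter`, `small_model`, `large_model`, and the word-count-only `is_easy_query` are hypothetical names invented for this illustration. A production cache would more likely normalize queries or use semantic (embedding-based) matching.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical model stand-ins; real code would run inference here.
std::string small_model(const std::string& q) { return "small: " + q; }
std::string large_model(const std::string& q) { return "large: " + q; }

// Crude heuristic assumed for illustration: queries of at most max_words words are "easy".
bool is_easy_query(const std::string& q, std::size_t max_words = 12) {
    std::istringstream in(q);
    std::string w;
    std::size_t n = 0;
    while (in >> w) ++n;
    return n <= max_words;
}

class CachedRouter {
public:
    std::string answer(const std::string& query) {
        // Step 1: exact-match cache lookup; a hit skips model inference entirely.
        auto it = cache_.find(query);
        if (it != cache_.end()) {
            ++hits_;
            return it->second;
        }
        // Step 2: on a miss, exactly one of the models always runs (adaptive inference).
        std::string result = is_easy_query(query) ? small_model(query) : large_model(query);
        cache_.emplace(query, result);
        return result;
    }
    std::size_t hits() const { return hits_; }

private:
    std::unordered_map<std::string, std::string> cache_;  // query -> stored answer
    std::size_t hits_ = 0;                                // number of cache hits so far
};
```

Ordering the cache before the router means repeated queries cost only a hash lookup, while new queries still benefit from the easy/hard split.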