Easy vs Hard Queries

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

One type of adaptive inference uses a heuristic to decide whether a user query is easy or hard. An “easy” query can be processed faster by a simpler method or a small model, whereas a “hard” query must be processed fully by a large model. Various multi-model ensemble architectures perform adaptive inference by choosing between two or more models in this way (a minimal routing sketch appears after the list):

  • Model selection algorithms
  • Big-little architectures
  • Mixture-of-Experts (MoE)
  • Speculative decoding

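For illustration, here is a minimal C++ sketch of this kind of routing heuristic. The run_small_model and run_large_model functions are hypothetical stand-ins (stubbed out here), and a simple word-count threshold plays the role of the easy-vs-hard test; a real system would use a more sophisticated difficulty estimate.

// Minimal easy-vs-hard routing sketch (hypothetical model calls).
#include <iostream>
#include <sstream>
#include <string>

// Stand-ins for the real inference paths.
std::string run_small_model(const std::string& query) {
    return "[small model] " + query;   // fast, cheaper path
}
std::string run_large_model(const std::string& query) {
    return "[large model] " + query;   // slower, full-quality path
}

// Toy heuristic: treat short queries as "easy".
bool is_easy_query(const std::string& query, size_t max_words = 12) {
    std::istringstream iss(query);
    std::string word;
    size_t count = 0;
    while (iss >> word) ++count;
    return count <= max_words;
}

// Route each query to the small or large model.
std::string answer_query(const std::string& query) {
    return is_easy_query(query) ? run_small_model(query)
                                : run_large_model(query);
}

int main() {
    std::cout << answer_query("What is the capital of France?") << "\n";
    std::cout << answer_query("Explain, step by step, how speculative decoding "
                              "interacts with a big-little architecture when the "
                              "draft model disagrees with the verifier.") << "\n";
}
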
This is not the same as caching, since one of the models is always executed, but the two ideas can be combined (i.e., cache first, then multi-model). These heuristic decision methods are discussed under multi-model methods in Chapter 54.
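As a rough sketch of that combination, the wrapper below checks an exact-match cache before invoking any model; answer_query() is a stand-in for the router sketched above, and a production cache would more likely key on normalized or semantically matched queries.

// Cache-first sketch: only on a cache miss does any model run at all.
#include <iostream>
#include <string>
#include <unordered_map>

// Stand-in for the easy-vs-hard router from the previous sketch.
std::string answer_query(const std::string& query) {
    return "[routed model answer for] " + query;
}

std::string cached_answer(const std::string& query) {
    static std::unordered_map<std::string, std::string> cache;  // exact-match cache
    auto it = cache.find(query);
    if (it != cache.end()) {
        return it->second;                      // cache hit: no model is executed
    }
    std::string result = answer_query(query);   // cache miss: route to a model
    cache.emplace(query, result);
    return result;
}

int main() {
    std::cout << cached_answer("What is 2 + 2?") << "\n";  // miss: routed, then stored
    std::cout << cached_answer("What is 2 + 2?") << "\n";  // hit: served from cache
}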
