Easy vs Hard Queries

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

One type of adaptive inference uses a heuristic to decide whether a user query is easy or hard. An “easy” query can be processed faster by a simpler method or a small model, whereas a “hard” query must be processed fully by a large model. Various multi-model ensemble architectures perform adaptive inference by choosing between two or more models in this way (a minimal routing sketch appears after the list):

  • Model selection algorithms
  • Big-little architectures
  • Mixture-of-Experts (MoE)
  • Speculative decoding

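For illustration, here is a minimal C++ sketch of this kind of routing heuristic. The run_small_model and run_large_model functions are hypothetical stand-ins (stubbed out here), and a simple word-count threshold plays the role of the easy-vs-hard test; a real system would use a more sophisticated difficulty estimate.

// Minimal easy-vs-hard routing sketch (hypothetical model calls).
#include <iostream>
#include <sstream>
#include <string>

// Stand-ins for the real inference paths.
std::string run_small_model(const std::string& query) {
    return "[small model] " + query;   // fast, cheaper path
}
std::string run_large_model(const std::string& query) {
    return "[large model] " + query;   // slower, full-quality path
}

// Toy heuristic: treat short queries as "easy".
bool is_easy_query(const std::string& query, size_t max_words = 12) {
    std::istringstream iss(query);
    std::string word;
    size_t count = 0;
    while (iss >> word) ++count;
    return count <= max_words;
}

// Route each query to the small or large model.
std::string answer_query(const std::string& query) {
    return is_easy_query(query) ? run_small_model(query)
                                : run_large_model(query);
}

int main() {
    std::cout << answer_query("What is the capital of France?") << "\n";
    std::cout << answer_query("Explain, step by step, how speculative decoding "
                              "interacts with a big-little architecture when the "
                              "draft model disagrees with the verifier.") << "\n";
}
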
This is not the same as caching, since one of the models is always executed, but the two ideas can be combined (i.e., cache first, then multi-model). These heuristic decision methods are discussed under multi-model methods in Chapter 54.
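As a rough sketch of that combination, the wrapper below checks an exact-match cache before invoking any model; answer_query() is a stand-in for the router sketched above, and a production cache would more likely key on normalized or semantically matched queries.

// Cache-first sketch: only on a cache miss does any model run at all.
#include <iostream>
#include <string>
#include <unordered_map>

// Stand-in for the easy-vs-hard router from the previous sketch.
std::string answer_query(const std::string& query) {
    return "[routed model answer for] " + query;
}

std::string cached_answer(const std::string& query) {
    static std::unordered_map<std::string, std::string> cache;  // exact-match cache
    auto it = cache.find(query);
    if (it != cache.end()) {
        return it->second;                      // cache hit: no model is executed
    }
    std::string result = answer_query(query);   // cache miss: route to a model
    cache.emplace(query, result);
    return result;
}

int main() {
    std::cout << cached_answer("What is 2 + 2?") << "\n";  // miss: routed, then stored
    std::cout << cached_answer("What is 2 + 2?") << "\n";  // hit: served from cache
}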
