Aussie AI

What is Adaptive Inference?

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

What is Adaptive Inference?

The default execution of AI inference is a brute-force computation using all the weights. The same huge computation is repeated for every token, regardless of what's in the user's input string.

Adaptive inference tries to shake that up by adding dynamic choices to this simple algorithm, so that the model performs different computations for different user inputs. The method adds dynamic tests that change how the computation progresses, rather than computing everything by brute force.

The first thing to understand about adaptive inference is that it is not the default. Although AI engines produce different outputs for different prompts, the steps they go through are largely fixed. Each encoder or decoder runs through a fixed number of layers, with fixed sets of precomputed weights from the model file, and all of these weights are used in a brute-force computation. There's only a small amount of variability in the decoding algorithm, which adds some creativity to responses (e.g. randomly picking from the top-50 possible words).
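To make that one source of variability concrete, here is a minimal sketch of top-k sampling in C++. The function names `top_k_indices` and `sample_top_k` are hypothetical, chosen for illustration, and weighting the candidates by softmax probability is an assumption about how the random pick might be done rather than a description of any particular engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <random>
#include <vector>

// Return the indices of the k largest logits, highest first.
// (Hypothetical helper, not taken from any specific engine.)
std::vector<int> top_k_indices(const std::vector<float>& logits, int k) {
    assert(k > 0 && k <= static_cast<int>(logits.size()));
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, 2, ...
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return logits[a] > logits[b]; });
    idx.resize(k);
    return idx;
}

// Randomly pick one token id from the top-k candidates.
// Assumption: candidates are weighted by their softmax probability.
int sample_top_k(const std::vector<float>& logits, int k, std::mt19937& rng) {
    std::vector<int> cand = top_k_indices(logits, k);
    float max_logit = logits[cand[0]];  // subtract the max for numerical stability
    std::vector<double> weights;
    weights.reserve(cand.size());
    for (int id : cand) {
        weights.push_back(std::exp(logits[id] - max_logit));
    }
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return cand[dist(rng)];
}
```

With `k` set to 50 this corresponds to the "top-50" example above. Note that the randomness only affects which token is chosen at the end; the layers and weights that produced the logits are the same for every input.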

Although it's a huge amount of runtime computation, there's something about the whole inference algorithm that is inherently static. As I've said before, it's as if the code has no “if” statements, and always goes through a fixed sequence of steps. With adaptive inference methods, the AI engine modifies its inference algorithm to operate differently in ways that depend on the user's input prompt.
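The contrast can be sketched in a few lines of C++. The example below uses early exit from the layer loop as one illustrative form of adaptive inference; the text above does not single out this technique, so treat it as an assumption. The names `run_static` and `run_adaptive`, the `Layer` type, and the `confidence` callback are all hypothetical placeholders, not a real engine's API.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Vec = std::vector<float>;
using Layer = std::function<Vec(const Vec&)>;  // stand-in for a Transformer layer

// Static inference: every layer runs for every input.
// There is no data-dependent branch anywhere in the loop.
Vec run_static(const std::vector<Layer>& layers, Vec x, int& layers_run) {
    assert(!layers.empty());
    layers_run = 0;
    for (const Layer& layer : layers) {
        x = layer(x);
        ++layers_run;
    }
    return x;
}

// Adaptive inference (early-exit sketch): after each layer, a test on the
// current activations decides whether the remaining layers can be skipped.
Vec run_adaptive(const std::vector<Layer>& layers, Vec x,
                 const std::function<float(const Vec&)>& confidence,
                 float threshold, int& layers_run) {
    assert(!layers.empty());
    layers_run = 0;
    for (const Layer& layer : layers) {
        x = layer(x);
        ++layers_run;
        if (confidence(x) >= threshold) {
            break;  // the "if" statement that the static loop lacks
        }
    }
    return x;
}
```

In this sketch the number of layers executed by `run_adaptive` depends on the input, whereas `run_static` always executes all of them. How confidence is measured and what threshold is acceptable are design choices left open here.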

 

