Aussie AI

What is Adaptive Inference?

Book Excerpt from "Generative AI in C++"

by David Spuler, Ph.D.

What is Adaptive Inference?

The default execution of AI inference is a brute-force computation using all the weights. The same huge computation is repeated for every token, regardless of what's in the user input string.

Adaptive inference tries to shake that up by adding dynamic choices to this simple algorithm, so that the model uses different computations for different user inputs. The method adds dynamic tests that change how the computation progresses, rather than computing everything by brute force.

The first thing to understand about adaptive inference is that it is not the default. Although AI engines produce different outputs according to different prompts, the steps they go through are largely fixed. Each encoder or decoder runs through a fixed number of layers, with fixed sets of precomputed weights from the model file, where all of these weights are used in a brute-force computation. There's only a small amount of variability in the decoding algorithm to add some creativity to responses (e.g. randomly picking from the top-50 possible words).
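To make the "largely fixed" shape of default inference concrete, here is a minimal C++ sketch, not code from the book. The names ToyLayer, run_all_layers, and sample_top_k are hypothetical, and the "layer" is just an element-wise multiply standing in for a real Transformer layer. The point is only that the layer loop has no data-dependent branch, and that the one source of variability is the random pick among the top-k candidates. For brevity the pick here is uniform among the top k, whereas real decoders usually weight it by probability.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical toy "layer": one weight per vector element.
// This stands in for a real Transformer layer purely for illustration.
struct ToyLayer {
    std::vector<float> weights;
};

// Static inference: every layer and every weight is used, whatever the input.
// Returns the number of layers executed, which is always layers.size().
int run_all_layers(const std::vector<ToyLayer>& layers,
                   std::vector<float>& activations) {
    int layers_run = 0;
    for (const ToyLayer& layer : layers) {  // no data-dependent branch here
        assert(layer.weights.size() == activations.size());
        for (std::size_t i = 0; i < activations.size(); ++i) {
            activations[i] *= layer.weights[i];
        }
        ++layers_run;
    }
    return layers_run;
}

// Top-k sampling: choose at random among the k highest-scoring token ids.
// Assumption: a uniform choice among the top k, to keep the sketch short.
int sample_top_k(const std::vector<float>& logits, int k, std::mt19937& rng) {
    assert(k > 0 && static_cast<std::size_t>(k) <= logits.size());
    std::vector<int> ids(logits.size());
    std::iota(ids.begin(), ids.end(), 0);  // 0, 1, 2, ...
    // Move the k best-scoring ids to the front, in descending score order.
    std::partial_sort(ids.begin(), ids.begin() + k, ids.end(),
                      [&](int a, int b) { return logits[a] > logits[b]; });
    std::uniform_int_distribution<int> pick(0, k - 1);
    return ids[pick(rng)];
}
```

Calling run_all_layers with four layers always reports four layers executed, no matter what the activations contain; only sample_top_k varies between runs, and only within the top k candidates.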

Although it's a huge amount of runtime computation, there's something about the whole inference algorithm that is inherently static. As I've said before, it's as if the code has no “if” statements, and always goes through a fixed sequence of steps. With adaptive inference methods, the AI engine modifies its inference algorithm to operate differently in ways that depend on the user's input prompt.
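As a hedged illustration of what adding an "if" statement might look like, the sketch below stops running layers once a confidence test passes. This early-exit test is just one possible form of dynamic choice, picked here for illustration rather than taken from the excerpt. ToyLayer, confidence, and run_layers_adaptive are hypothetical names, the "layer" is a toy element-wise multiply, and using the largest activation as the confidence measure is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy "layer": one weight per vector element.
struct ToyLayer {
    std::vector<float> weights;
};

// Hypothetical confidence measure: the largest activation value.
// A real engine would use something more principled; this is an assumption.
float confidence(const std::vector<float>& activations) {
    float best = activations.empty() ? 0.0f : activations[0];
    for (float v : activations) {
        if (v > best) best = v;
    }
    return best;
}

// Adaptive inference sketch: the same layer loop as the static version,
// plus one dynamic test that can end the computation early.
// Returns how many layers actually ran, which now depends on the input.
int run_layers_adaptive(const std::vector<ToyLayer>& layers,
                        std::vector<float>& activations,
                        float exit_threshold) {
    int layers_run = 0;
    for (const ToyLayer& layer : layers) {
        assert(layer.weights.size() == activations.size());
        for (std::size_t i = 0; i < activations.size(); ++i) {
            activations[i] *= layer.weights[i];
        }
        ++layers_run;
        // The "if" statement that static inference lacks:
        // skip the remaining layers once the result looks confident enough.
        if (confidence(activations) >= exit_threshold) {
            break;
        }
    }
    return layers_run;
}
```

With four layers of weights {2.0, 0.5} and a threshold of 4.0, an input of {1.0, 1.0} exits after two layers, while an input of {0.1, 0.1} never reaches the threshold and runs all four. The amount of computation now depends on the input, which is the property the text describes.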

