AI Models are Static

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

An AI model is inherently static once it has been trained and fine-tuned, and this characteristic offers many opportunities for “offline” speedups. At the highest level there are the model compression optimizations (e.g., quantization and pruning) that create a smaller model file. In addition, several of the model's other meta-parameters have a significant impact on what the C++ compiler can do:

  • Internal model dimension — i.e. the “embedding size”
  • Context window size — maximum input token length
  • Number of layers — depth of the model

These are all constant for both training and inference. It is strongly recommended that you use these parameters to create a model-specific C++ engine, specialized for one particular model, rather than a generalized AI engine that can handle multiple model sizes. In simpler terms, declare all of these meta-parameters as “const” (or better, “constexpr”) in your code and turn the optimizer up to eleven.
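As a minimal sketch of this advice, the snippet below fixes the three meta-parameters as compile-time constants. The names (kEmbeddingSize, kContextWindow, kNumLayers) and the sizes are illustrative placeholders, not values from any particular model:

    #include <cstddef>

    // Illustrative model meta-parameters, fixed at compile time.
    // The values are placeholders, not taken from any specific model.
    constexpr std::size_t kEmbeddingSize = 1024;  // internal model dimension
    constexpr std::size_t kContextWindow = 2048;  // maximum input token length
    constexpr std::size_t kNumLayers     = 24;    // number of layers (depth)

    // With a compile-time trip count, the optimizer can unroll and
    // auto-vectorize this loop at -O2/-O3.
    void add_bias(float* vec, const float* bias) {
        for (std::size_t i = 0; i < kEmbeddingSize; ++i) {
            vec[i] += bias[i];
        }
    }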

Anywhere these numbers appear in the C++ kernels, the optimizer gets an opportunity to make smarter efficiency choices. These optimizations range from full auto-vectorization of loops into parallel execution, when the compiler can see that the loop bound is fixed, down to simpler arithmetic strength reductions, such as replacing a multiplication with a bitshift when a constant meta-parameter is a power of two (and they all should be).
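As a small illustration of the strength-reduction point, the hypothetical row_ptr helper below indexes into a weight matrix. Because the dimension is a compile-time power of two, the compiler can turn the multiplication into a left shift on its own:

    #include <cstddef>

    constexpr std::size_t kEmbeddingSize = 1024;  // 2^10, a power of two

    // Guard the power-of-two assumption at compile time.
    static_assert((kEmbeddingSize & (kEmbeddingSize - 1)) == 0,
                  "embedding size should be a power of two");

    // The optimizer can strength-reduce "row * kEmbeddingSize" to
    // "row << 10" automatically; no hand-written bitshift is needed.
    inline const float* row_ptr(const float* weights, std::size_t row) {
        return weights + row * kEmbeddingSize;
    }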
