AI Models are Static
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
An AI model is inherently static after it's been trained and fine-tuned, and this characteristic offers many opportunities for “offline” speedups. At the highest level there are the model compression optimizations (e.g. quantization, pruning) that create a smaller model file. In addition, some of the other model meta-parameters also have a significant impact on what the C++ compiler can do.
- Internal model dimension — i.e. the “embedding size”
- Context window size — maximum input token length
- Number of layers — depth of the model
These are all constant for both training and inference.
It is strongly recommended that you use these parameters to create a model-specific C++ engine that is specialized for this particular model, rather than a generalized AI engine that can handle multiple model sizes. In simpler terms, make all of these meta-parameters "const" in your code and turn the optimizer up to eleven.
Anywhere these numbers appear in the C++ kernels, the optimizer has an opportunity to make smarter efficiency choices. These optimizations range from full auto-vectorization of loops into parallel execution, when the compiler can see that they have a fixed length, down to simpler arithmetic strength reduction, such as replacing a multiplication with a bitshift when a constant meta-parameter is a power of two (and they should be).