AI Models are Static

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

An AI model is inherently static once it has been trained and fine-tuned, and this characteristic offers many opportunities for “offline” speedups. At the highest level there are the model compression optimizations (e.g., quantization and pruning) that create a smaller model file. In addition, several of the model's other meta-parameters have a significant impact on what the C++ compiler can do:

  • Internal model dimension — i.e. the “embedding size”
  • Context window size — maximum input token length
  • Number of layers — depth of the model

These are all constant for both training and inference. It is strongly recommended that you use these parameters to create a model-specific C++ engine, specialized for one particular model, rather than a generalized AI engine that can handle multiple model sizes. In simpler terms, declare all of these meta-parameters as “const” (or better, “constexpr”) in your code and turn the optimizer up to eleven.
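As a minimal sketch of this advice, the snippet below fixes the three meta-parameters as compile-time constants. The names (kEmbeddingSize, kContextWindow, kNumLayers) and the sizes are illustrative placeholders, not values from any particular model:

    #include <cstddef>

    // Illustrative model meta-parameters, fixed at compile time.
    // The values are placeholders, not taken from any specific model.
    constexpr std::size_t kEmbeddingSize = 1024;  // internal model dimension
    constexpr std::size_t kContextWindow = 2048;  // maximum input token length
    constexpr std::size_t kNumLayers     = 24;    // number of layers (depth)

    // With a compile-time trip count, the optimizer can unroll and
    // auto-vectorize this loop at -O2/-O3.
    void add_bias(float* vec, const float* bias) {
        for (std::size_t i = 0; i < kEmbeddingSize; ++i) {
            vec[i] += bias[i];
        }
    }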

Anywhere these numbers appear in the C++ kernels, the optimizer gets an opportunity to make smarter efficiency choices. These optimizations range from full auto-vectorization of loops into parallel execution, when the compiler can see that the loop bound is fixed, down to simpler arithmetic strength reductions, such as replacing a multiplication with a bitshift when a constant meta-parameter is a power of two (and they all should be).
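As a small illustration of the strength-reduction point, the hypothetical row_ptr helper below indexes into a weight matrix. Because the dimension is a compile-time power of two, the compiler can turn the multiplication into a left shift on its own:

    #include <cstddef>

    constexpr std::size_t kEmbeddingSize = 1024;  // 2^10, a power of two

    // Guard the power-of-two assumption at compile time.
    static_assert((kEmbeddingSize & (kEmbeddingSize - 1)) == 0,
                  "embedding size should be a power of two");

    // The optimizer can strength-reduce "row * kEmbeddingSize" to
    // "row << 10" automatically; no hand-written bitshift is needed.
    inline const float* row_ptr(const float* weights, std::size_t row) {
        return weights + row * kEmbeddingSize;
    }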
