Aussie AI
Arithmetic Optimizations
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
The biggest problem with AI engines is that they do too much arithmetic. There is no shortage of electrons being applied to multiplication in vector dot products and matrix multiplication. The methods of optimization to fix this bottleneck are basically:
- Do some other type of arithmetic instead.
- Do fewer multiplications.
Alternative forms of arithmetic include bitwise shifting or addition. The ways to do fewer multiplications tend to involve higher-level algorithmic changes to the model, such as pruning or quantization.
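As a minimal sketch of "some other type of arithmetic", the functions below replace an unsigned integer multiplication by a power of two with a left shift, and a multiplication by two with an addition. The function names are hypothetical and chosen only for illustration. Unsigned 32-bit integers are assumed so that both forms give identical results, including on wrap-around.

```cpp
#include <cstdint>

// Baseline: multiply by 8 using the multiplication operator.
inline std::uint32_t times8_mul(std::uint32_t x) { return x * 8u; }

// Alternative arithmetic: 8 == 2^3, so shifting left by 3 bits gives the same value.
inline std::uint32_t times8_shift(std::uint32_t x) { return x << 3; }

// Alternative arithmetic: doubling a value via addition instead of multiplication.
inline std::uint32_t times2_add(std::uint32_t x) { return x + x; }
```

A modern optimizing compiler will usually make this substitution itself for constant multipliers, so the hand-written version is mainly useful for seeing what the transformation looks like.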
There are two basic ways that arithmetic computations can be sped up whilst retaining the same results:
- Single operator improvements
- Expression-level optimizations (multiple operators)
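The two categories can be illustrated with a short sketch. The first pair of functions shows a single-operator improvement (remainder by a power of two replaced with a bitwise AND), and the second pair shows an expression-level change (factoring out a common operand to save one multiplication). All function names here are hypothetical, and unsigned integers are assumed so that the results are exactly the same; with floating-point values, the factored form can round differently.

```cpp
#include <cstdint>

// Single-operator improvement: remainder by 16 ...
inline std::uint32_t mod16_rem(std::uint32_t x) { return x % 16u; }

// ... replaced by a bitwise AND with 15 (valid because 16 is a power of two
// and x is unsigned).
inline std::uint32_t mod16_and(std::uint32_t x) { return x & 15u; }

// Expression-level optimization: a*b + a*c uses two multiplications ...
inline std::uint32_t two_multiplies(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return a * b + a * c;
}

// ... while the factored form a*(b + c) uses only one.
inline std::uint32_t one_multiply(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return a * (b + c);
}
```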
Some of the methods of speeding up arithmetic come from the theory of compiler optimization (e.g. strength reduction, common sub-expression elimination). Hence, the compiler will often perform these types of optimizations automatically (when the optimizer is invoked). To some extent, this makes the manual versions of these transformations redundant. Even so, good programming practice is to avoid situations where these optimizations are needed on a large scale. The compiler often does not look at the program as a whole and can miss some “obvious” optimizations.
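Strength reduction can be sketched by hand as follows. The first function computes a strided index with a multiplication on every iteration, and the second keeps a running index that is updated by addition instead. The function names and the strided-sum task are assumptions made for illustration, and the caller is assumed to guarantee that every index touched is within the vector.

```cpp
#include <cstddef>
#include <vector>

// Before: the index i * stride costs one multiplication per iteration.
// Assumes (count - 1) * stride < v.size() when count > 0.
inline float sum_strided_mul(const std::vector<float>& v,
                             std::size_t stride, std::size_t count) {
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        total += v[i * stride];
    }
    return total;
}

// After (strength-reduced): a running index replaces the multiplication
// with an addition. The elements are visited in the same order, so the
// floating-point sum is identical.
inline float sum_strided_add(const std::vector<float>& v,
                             std::size_t stride, std::size_t count) {
    float total = 0.0f;
    std::size_t idx = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += v[idx];
        idx += stride;
    }
    return total;
}
```

An optimizer will often perform this rewrite on its own for a simple loop like this one, which is why the manual form is best reserved for cases where profiling suggests the compiler has missed it.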