Aussie AI
Operator Strength Reduction
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Individual operations in C++ can be optimized in several ways. The general term is “strength reduction” because a stronger operator with a high computational cost is “reduced” to an equivalent operator that is simpler and faster. Strength reduction is a technique that compilers apply during automatic optimization, but programmers can also use it to improve their algorithms.
The main “strong” operations that we're trying to avoid are:
- Floating-point arithmetic (even addition)
- Multiplication
- Division
- Remainder (% operator)
- Math functions (e.g. sqrtf or expf)
Congratulations! The typical AI engine uses only three of these five.
Strength reduction has particular relevance to AI engines because the main bottleneck is floating-point multiplication. Many of the research papers on speedups are about replacing the floating-point multiplication operation with something simpler, like addition or integer arithmetic.
Some of the general approaches to strength reduction include:
- Bitwise operations (e.g. bitshifts can replace multiplication)
- Use addition rather than multiplication (addition is faster).
- Avoid division and modulo/remainder operators (they're the worst!)
- Use integer arithmetic rather than floating-point (where possible).
- Use float single-precision arithmetic, not double-precision (see the snippet after this list).
- Approximate arithmetic (e.g. for math functions)
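As a quick illustration of the single-precision point (a minimal sketch; the variable names are just for illustration), a double-precision literal silently promotes the whole expression:
float x = 3.14f;
float y1 = x * 2.0;   // "2.0" is a double: x is promoted, multiplied in double, converted back
float y2 = x * 2.0f;  // "2.0f" keeps the arithmetic in single-precision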
Bitshift for multiplication: The canonical example that everybody knows is that shift operators can replace multiplication by a power of two. But it works only for integers, not for floating-point numbers. Note that there's a whole class of AI engines that rely on this integer multiply-vs-bitshift optimization, called “logarithmic quantization” or “power-of-two quantization.” Here's a dummy example of integer multiplication:
y = x * 4;
This can be more efficiently coded as a left bitshift:
y = x << 2;
Bug alert! If you're making this code change, you're likely to introduce some bugs. The “<<” and “*” operators have different precedence levels, so make sure you add more parentheses. Also, consider whether you need to use an “unsigned” type when switching to a bitwise operator.
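For instance, here's a sketch of the precedence trap (hypothetical values): “+” binds more tightly than “<<”, but less tightly than “*”:
y = x * 4 + 1;    // multiplication first: (x * 4) + 1
y = x << 2 + 1;   // BUG: parses as x << (2 + 1), which is x * 8
y = (x << 2) + 1; // correct bitshift version of x * 4 + 1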
Right shift for division: The use of bitshifting works for division, too (but only for unsigned):
y = x / 4;
y = x >> 2u; // faster
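To see why the shift version needs unsigned, consider a negative operand (an illustrative example, assuming the common arithmetic-shift behavior for signed right shifts):
int a = -7;
int q1 = a / 4;   // -1: integer division truncates toward zero
int q2 = a >> 2;  // typically -2: arithmetic shift rounds toward negative infinity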
Bitwise remainder calculations: The arithmetic modulus operator (remainder) can also be optimized for power-of-two operands (but only on integers):
y = x % 512;  // Remainder (mod)
y = x & 511u; // Bitwise-AND
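The same mask trick works for any power-of-two modulus, using “n - 1” as the mask. A minimal sketch (the function name “fast_mod” is just illustrative):
unsigned int fast_mod(unsigned int x, unsigned int n)
{
    // Equivalent to x % n, but only when n is a power of two (e.g. 512 and 511)
    return x & (n - 1u);
}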
And here's another one with integer relative comparisons versus bitwise-and, although this one might not necessarily be faster:
if (x >= 512)
if (x & ~511u) // Bitwise-AND of the complement (unsigned)
The second test works because “x & ~511u” is nonzero exactly when some bit at or above the 512 bit is set, which for unsigned x is equivalent to x >= 512.
Avoiding multiplication: There are some simple cases even with the most basic operators that have multiple options:
y = x * 2;
y = x + x; // Addition
y = x << 1; // Shift
Automatic Strength Reduction: In theory, a C++ compiler knows which operations are fastest on its target platform, and could perform all of these optimizations automatically when compiling the program. In practice, optimizers do some of them, but not all.
Intrinsic Functions: Other more advanced types of strength reduction involve avoiding costly primitives, such as mathematical functions. For example, there are bitwise arithmetic tricks to quickly compute the integer log2 function.
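For instance, one common trick (a sketch assuming the GCC/Clang “__builtin_clz” intrinsic and a 32-bit unsigned int; undefined for x == 0) derives the integer log2 from the count of leading zero bits:
unsigned int ilog2(unsigned int x)
{
    // floor(log2(x)): the position of the highest set bit
    return 31u - (unsigned int)__builtin_clz(x);
}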
GPU Strength Reduction: One final note is that, when doing AI coding work, we aren't as concerned about which C++ operator works best. The more important concern is which operation is most efficient in the GPU, or in other hardware acceleration (e.g. AVX-512 on a CPU).
Finally, note that these optimizations are local optimizations, and the same ideas apply globally to the entire AI engine architecture. There's been a lot of research trying to change all of the arithmetic in model inference from multiplication to bitshifting (see logarithmic power-of-two quantization in Chapter 44) or addition (see logarithmic number system models in Chapter 52, or “adder networks” and other “zero multiplication models” in Chapter 51).
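As a rough sketch of that global idea (illustrative types and names only, not any particular engine from those chapters): if weights are quantized to powers of two and stored as shift counts, each multiply in a dot product becomes a bitshift:
#include <cstdint>

int64_t dot_pow2(const uint32_t* x, const uint8_t* wshift, int n)
{
    int64_t sum = 0;
    for (int i = 0; i < n; ++i)
        sum += (int64_t)x[i] << wshift[i]; // x[i] * 2^wshift[i], without a multiply
    return sum; // assumes small shift counts, so the sum does not overflow
}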