Aussie AI
What are Zero-Multiplication Models?
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
What are Zero-Multiplication Models?
Multiplication causes a lot of trouble. It's slower than addition or bitshifting, and AI models need to calculate the times tables lots of times (literally billions). That adds up to a lot of CPU and GPU time spent doing the same thing.
If it hurts a lot, just stop! So, why not try to do zero multiplications instead? It turns out that we're not the first to think of this, and I count at least eleven unique ways to get rid of multiplication in LLM inference:
1. Low-bit quantization — binary and ternary weights reduce multiplications to sign flips and additions (a bitwise sketch appears after this list).
2. Logarithmic quantization — power-of-two weights allow bitshifts (see Chapter 44 for more on logarithmic quantization; sketched after this list).
3. Logarithmic Number System (LNS) — end-to-end models based on floating-point logarithms (see Chapter 52 on Logarithmic Models).
4. Adder or Additive neural networks — using addition-based computations such as L1 distances in place of dot products (sketched after this list).
5. Max-plus networks or min-max-plus networks — using “tropical algebra” that combines maximum operations with addition (sketched after this list).
6. Morphological networks — using maximum, addition, and subtraction.
7. Log-sum-exp networks — logarithm of the sum of exponentials.
8. Difference-squared networks
9. Look-up Tables (LUTs) for multiplication — precomputed tables of products replace runtime multiplies (sketched after this list).
10. Approximate multiplication — cheaper approximate multiplication algorithms that trade a little accuracy for speed.
11. Bitwise operations (AND/OR/XOR) — e.g., XNOR-and-popcount dot products in binary networks (sketched after this list).
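To make a few of these concrete, here are some minimal C++ sketches. First, for items 1 and 11: with weights and activations restricted to +1/-1 and bit-packed as single sign bits, a 32-element dot product reduces to an XNOR plus a popcount. The function name is illustrative, and __builtin_popcount assumes GCC or Clang.

#include <cstdint>

// Binary-network dot product of 32 weight/activation pairs, each in {-1, +1},
// bit-packed so that bit value 1 means +1 and bit value 0 means -1.
int binary_dot32(uint32_t w_bits, uint32_t x_bits)
{
    uint32_t agree = ~(w_bits ^ x_bits);       // XNOR: 1 where the signs match
    int matches = __builtin_popcount(agree);   // number of matching pairs
    return (matches << 1) - 32;                // matches - mismatches, no multiply
}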
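For item 2, a sketch of power-of-two (logarithmic) weight quantization, assuming integer activations that are non-negative (e.g. after ReLU) so that the shifts are well-defined; the struct and function names are illustrative.

#include <cstdint>
#include <cmath>

// A weight approximated by a signed power of two: w ~ sign * 2^shift.
struct Pow2Weight {
    int8_t shift;   // exponent (negative for magnitudes below 1.0)
    int8_t sign;    // +1 or -1
};

Pow2Weight quantize_pow2(float w)
{
    Pow2Weight q;
    q.sign = (w < 0.0f) ? -1 : +1;
    q.shift = (int8_t)std::lround(std::log2(std::fabs(w) + 1e-20f));
    return q;
}

// Vector dot product with no multiplications: each term is a shift plus an add.
int32_t dot_pow2(const Pow2Weight* w, const int32_t* x, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; ++i) {
        int32_t term = (w[i].shift >= 0) ? (x[i] << w[i].shift)
                                         : (x[i] >> -w[i].shift);
        sum += (w[i].sign > 0) ? term : -term;
    }
    return sum;
}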
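For item 4, a sketch of an adder-network style replacement for the dot product: similarity is measured with an L1 distance, so the inner loop uses only subtraction, absolute value, and addition. The function name is illustrative.

#include <cstddef>

// Additive "dot product": negated L1 distance between weights and inputs,
// so a closer match produces a larger activation. No multiplications.
float adder_dot(const float* w, const float* x, size_t n)
{
    float dist = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = x[i] - w[i];
        dist += (d < 0.0f) ? -d : d;   // |x[i] - w[i]|
    }
    return -dist;
}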
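For item 5, a sketch of a single max-plus (tropical) neuron, where the multiply-accumulate is replaced by addition inside a running maximum; again, an illustrative name, not a library routine.

#include <algorithm>
#include <cstddef>
#include <limits>

// Max-plus neuron: output = max over i of (w[i] + x[i]). Addition and max only.
float maxplus_neuron(const float* w, const float* x, size_t n)
{
    float best = -std::numeric_limits<float>::infinity();
    for (size_t i = 0; i < n; ++i)
        best = std::max(best, w[i] + x[i]);
    return best;
}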
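And for item 9, a sketch of look-up table multiplication for unsigned 8-bit operands: all 65,536 products are built once (off the critical path), and each runtime multiply becomes a single table index. The class name is illustrative.

#include <cstdint>
#include <vector>

// Precomputed 256x256 table of 8-bit products; mul() does no runtime multiply.
struct MulLUT {
    std::vector<uint16_t> table;
    MulLUT() : table(256 * 256)
    {
        for (int a = 0; a < 256; ++a)
            for (int b = 0; b < 256; ++b)
                table[(a << 8) | b] = (uint16_t)(a * b);   // multiplies only at build time
    }
    uint16_t mul(uint8_t a, uint8_t b) const
    {
        return table[((unsigned)a << 8) | b];
    }
};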
Spoiler alert: none of these methods work very well. They're either fast but inaccurate, or even slower than hardware-accelerated multiplication. We might be stuck with the star for a while.