
  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

What are Zero-Multiplication Models?

Multiplication causes a lot of trouble. It's slower than addition or bitshifting, and AI models need to calculate the times tables lots of times (literally billions). That adds up to a lot of CPU and GPU time spent doing the same thing.

If it hurts a lot, just stop! So, why not try to do zero multiplications instead? It turns out that we're not the first to think of this, and I count at least eleven unique ways to get rid of multiplication in LLM inference:

    1. Low-bit quantization — binary and ternary quantization (a sign-based dot product is sketched in C++ after this list).

    2. Logarithmic quantization — power-of-two weights allow bitshifts (see Chapter 44 for more on logarithmic quantization, and the bitshift sketch after this list).

    3. Logarithmic Number System (LNS) — end-to-end models based on floating-point logarithms (see Chapter 52 on Logarithmic Models; a log-domain multiply is sketched after this list).

    4. Adder or Additive neural networks — using addition-based computations, such as replacing the dot product with an L1 distance (sketched after this list).

    5. Max-plus networks or min-max-plus networks — using “tropical algebra” that combines maximum functions with addition (a tropical dot product is sketched after this list).

    6. Morphological networks — uses maximum, addition, and subtraction.

    7. Log-sum-exp networks — logarithm of the sum of exponentials.

    8. Difference-squared networks

    9. Look-up Tables (LUTs) for multiplication: precompute the products once and replace the inner-loop multiply with a table lookup (sketched after this list).

    10. Approximate multiplication: not strictly zero multiplications, but replacing exact multiplication with much cheaper approximate versions.

    11. Bitwise operations (AND/OR/XOR): for example, XNOR-and-popcount dot products for binary networks (sketched after this list).
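
To make a few of these ideas concrete, here are some minimal C++ sketches; the function and type names in them are invented for illustration, not taken from any particular library. First, low-bit quantization: with binary weights restricted to +1 and -1, a dot product needs no multiplications at all, only conditional additions and subtractions.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Dot product with binary weights in {-1, +1}: each "multiply" is just a
    // sign choice, so the loop is pure addition and subtraction.
    float binary_dot_product(const std::vector<float>& x,
                             const std::vector<int8_t>& w)  // w[i] is +1 or -1
    {
        float sum = 0.0f;
        for (size_t i = 0; i < x.size(); ++i) {
            if (w[i] >= 0) sum += x[i];   // weight +1: add
            else           sum -= x[i];   // weight -1: subtract
        }
        return sum;
    }

    int main() {
        std::vector<float> x = {0.5f, -1.25f, 2.0f};
        std::vector<int8_t> w = {+1, -1, +1};
        std::cout << binary_dot_product(x, w) << "\n";  // 0.5 + 1.25 + 2.0 = 3.75
    }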
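
Next, logarithmic (power-of-two) quantization: if every weight is forced to be a signed power of two, then multiplying an integer activation by a weight reduces to a left shift. This sketch assumes a simple (sign, shift) encoding of each weight.

    #include <cstdint>
    #include <iostream>

    // A weight quantized to a signed power of two: its magnitude is 2^shift.
    struct Pow2Weight {
        int8_t  sign;   // +1 or -1
        uint8_t shift;  // exponent: weight magnitude is 2^shift
    };

    // "Multiply" an integer activation by a power-of-two weight via bitshift.
    int32_t pow2_multiply(int32_t activation, Pow2Weight w)
    {
        int32_t shifted = activation << w.shift;   // times 2^shift, no '*'
        return (w.sign >= 0) ? shifted : -shifted;
    }

    int main() {
        Pow2Weight w{ -1, 3 };                     // weight value is -8
        std::cout << pow2_multiply(5, w) << "\n";  // prints -40
    }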
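
The Logarithmic Number System pushes the same idea end-to-end: every value is stored as a sign bit plus the logarithm of its magnitude, so multiplication becomes a floating-point addition of the logs. The sketch below shows only that easy part; addition is the hard operation in LNS and is deliberately not shown.

    #include <cmath>
    #include <iostream>

    // A value in the Logarithmic Number System: sign bit plus log2(|x|).
    struct LnsValue {
        bool  negative;
        float log2mag;
    };

    LnsValue lns_from_float(float x) {
        return { x < 0.0f, std::log2(std::fabs(x)) };
    }

    float lns_to_float(LnsValue v) {
        float mag = std::exp2(v.log2mag);
        return v.negative ? -mag : mag;
    }

    // Multiplication in LNS: add the logarithms and combine the signs;
    // no '*' appears on the data path.
    LnsValue lns_multiply(LnsValue a, LnsValue b) {
        return { a.negative != b.negative, a.log2mag + b.log2mag };
    }

    int main() {
        LnsValue a = lns_from_float(-2.5f);
        LnsValue b = lns_from_float(4.0f);
        std::cout << lns_to_float(lns_multiply(a, b)) << "\n";  // about -10
    }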
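
Adder (additive) networks take a different route: instead of removing the multiply from the dot product, they replace the dot product itself with a distance measure, such as a negated L1 distance between the inputs and the weights, which needs only subtraction and absolute value. A simplified sketch of that idea, not any particular published kernel:

    #include <cmath>
    #include <iostream>
    #include <vector>

    // Adder-network style "similarity": negated L1 distance between the input
    // and the weight vector, computed with subtraction and absolute value only.
    float adder_similarity(const std::vector<float>& x, const std::vector<float>& w)
    {
        float dist = 0.0f;
        for (size_t i = 0; i < x.size(); ++i)
            dist += std::fabs(x[i] - w[i]);
        return -dist;   // larger (closer to zero) means more similar
    }

    int main() {
        std::vector<float> x = {1.0f, 2.0f, 3.0f};
        std::vector<float> w = {1.5f, 2.0f, 2.0f};
        std::cout << adder_similarity(x, w) << "\n";  // prints -1.5
    }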
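
Max-plus networks use "tropical algebra", where multiplication is replaced by addition and summation by a maximum, so a tropical "dot product" is simply the largest of the pairwise sums:

    #include <algorithm>
    #include <iostream>
    #include <limits>
    #include <vector>

    // Tropical (max-plus) "dot product": multiplication becomes addition,
    // and the summation becomes a maximum.
    float maxplus_dot(const std::vector<float>& x, const std::vector<float>& w)
    {
        float best = -std::numeric_limits<float>::infinity();
        for (size_t i = 0; i < x.size(); ++i)
            best = std::max(best, x[i] + w[i]);
        return best;
    }

    int main() {
        std::vector<float> x = {1.0f, 2.0f, 3.0f};
        std::vector<float> w = {0.5f, 0.0f, -2.0f};
        std::cout << maxplus_dot(x, w) << "\n";  // max(1.5, 2.0, 1.0) = 2.0
    }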
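
For look-up tables, 8-bit quantized operands give only 65,536 possible products, so all of them can be precomputed once at startup, and the inner loop then does table lookups instead of multiplications. The table name and index packing here are assumptions for the sketch; the one-off initialization still multiplies, but only 65,536 times.

    #include <array>
    #include <cstdint>
    #include <iostream>

    // Precomputed products of every pair of signed 8-bit values.
    static std::array<int32_t, 256 * 256> g_mul_lut;

    void init_mul_lut() {
        for (int a = 0; a < 256; ++a)
            for (int b = 0; b < 256; ++b)
                g_mul_lut[(a << 8) | b] =
                    static_cast<int32_t>(static_cast<int8_t>(a)) *
                    static_cast<int32_t>(static_cast<int8_t>(b));
    }

    // Inner-loop "multiply" by table lookup: reinterpret the signed operands
    // as unsigned indices and fetch the precomputed product.
    inline int32_t lut_multiply(int8_t a, int8_t b) {
        return g_mul_lut[(static_cast<uint8_t>(a) << 8) | static_cast<uint8_t>(b)];
    }

    int main() {
        init_mul_lut();
        std::cout << lut_multiply(-7, 12) << "\n";  // prints -84, via lookup
    }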
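
Finally, bitwise operations pair naturally with binary quantization: pack the +1/-1 values into machine words, use XNOR to mark the positions where the signs agree, and use popcount to add them up, which is the trick behind XNOR-style binary networks. A minimal sketch (requires C++20 for std::popcount):

    #include <bit>       // std::popcount (C++20)
    #include <cstdint>
    #include <iostream>

    // Bitwise dot product of two binary vectors: encode +1 as a 1 bit and
    // -1 as a 0 bit. XNOR counts agreements, popcount sums them, and the
    // dot product is 2*agreements - n (the doubling is just a shift).
    int binary_dot_bitwise(uint64_t a, uint64_t b, int n)   // n <= 64 elements
    {
        uint64_t agree = ~(a ^ b);                     // XNOR: 1 where signs match
        if (n < 64) agree &= (uint64_t{1} << n) - 1;   // keep only the low n bits
        int matches = std::popcount(agree);
        return (matches << 1) - n;                     // 2*matches - n
    }

    int main() {
        // Length-4 vectors: a = {+1, -1, +1, +1}, b = {+1, +1, +1, -1},
        // packed with element i in bit i.
        uint64_t a = 0b1101;
        uint64_t b = 0b0111;
        std::cout << binary_dot_bitwise(a, b, 4) << "\n";  // dot product = 0
    }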

Spoiler alert: none of these methods work very well. They're either fast but inaccurate, or even slower than hardware-accelerated multiplication. We might be stuck with the star for a while.

 

