Aussie AI
Low Bit Quantization
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
Low Bit Quantization
An interesting point is that quantization with a very low number of bits (one or two) can achieve zero-multiplication inference.
Binary quantization: 1-bit binary quantization replaces multiplication with additions or sign-flips. If the weights are only 1 or 0, then “multiplication” by 1 becomes an addition, and multiplication by zero becomes a null-test. If the weights are +1 and -1, which is more common, then each multiplication becomes a sign test followed by an addition or a subtraction, or simply a sign-flip. These operations are often further optimized with bitwise arithmetic, since binary quantization stores only 1 bit per weight. Binary quantization is very fast, but has well-known problems with model accuracy.
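For illustration, here is a minimal C++ sketch of a binary-quantized dot product, assuming the +1/-1 weights are packed one per bit (a set bit meaning +1); the packing convention and function name are illustrative, not taken from the book.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Dot product with 1-bit weights: no multiplications, only sign flips and additions.
    float binary_dot_product(const std::vector<uint8_t>& packed_weights,
                             const std::vector<float>& activations)
    {
        float sum = 0.0f;
        for (std::size_t i = 0; i < activations.size(); ++i) {
            // Test the i-th weight bit: set bit means +1, clear bit means -1.
            bool positive = (packed_weights[i / 8] >> (i % 8)) & 1;
            sum += positive ? activations[i] : -activations[i];
        }
        return sum;
    }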
Ternary quantization: Similarly, ternary quantization with weights -1, 0, and 1 can be implemented with a sign test, a null test, and additions or subtractions. However, ternary quantization also has problems with model accuracy.
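A similar sketch for ternary weights, assuming each weight is stored as a signed byte holding -1, 0, or +1 (this encoding and the function name are illustrative assumptions):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Dot product with ternary weights: only sign tests, null tests, additions, and subtractions.
    float ternary_dot_product(const std::vector<int8_t>& weights,
                              const std::vector<float>& activations)
    {
        float sum = 0.0f;
        for (std::size_t i = 0; i < weights.size(); ++i) {
            if (weights[i] > 0)      sum += activations[i];  // weight +1: addition
            else if (weights[i] < 0) sum -= activations[i];  // weight -1: subtraction
            // weight 0: null test, contributes nothing
        }
        return sum;
    }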
2-bit quantization: The four possible weights can be implemented with zero, one, or two additions, instead of a multiplication. This type of 2-bit quantization does not receive as much attention in the literature.
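As one possible sketch, assuming the four 2-bit codes map to weights 0, +1, +2, and -1 (an illustrative mapping chosen for this example, not a standard one), each “multiplication” becomes zero, one, or two additions or subtractions:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Dot product with 2-bit weights: each weight needs at most two additions or subtractions.
    float two_bit_dot_product(const std::vector<uint8_t>& codes,  // one 2-bit code (0..3) per byte
                              const std::vector<float>& activations)
    {
        float sum = 0.0f;
        for (std::size_t i = 0; i < codes.size(); ++i) {
            const float x = activations[i];
            switch (codes[i]) {
                case 0: break;                      // weight 0: zero additions
                case 1: sum += x; break;            // weight +1: one addition
                case 2: sum += x; sum += x; break;  // weight +2: two additions
                case 3: sum -= x; break;            // weight -1: one subtraction
            }
        }
        return sum;
    }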
See Chapter 44 (Advanced Quantization) for more information about these low-bit quantization techniques and their research papers.