Aussie AI
Add-as-Integer Networks
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
Add-as-Integer Networks
This method uses an approximate floating-point multiplication that is implemented via integer addition. This is a very weird idea and it seems almost magical that it works. It's basically this:
a) pretend that 32-bit floating-point numbers (with 1 sign bit, 8 exponent bits, and 23 mantissa bits) are actually 32-bit integers (signed), and
b) add them together using 32-bit signed integer addition.
It doesn't do full multiplication, and it seems like it should be just a dumb C++ bug, but it actually does something useful: an approximation called Mitchell's approximate multiplication.
Example: Add-as-Int Mogami Approximate Multiplication:
The method uses C++ casts to trick the compiler into using the float operands as if they were int types, and then it needs to subtract an offset to correct the extra bits.
Let's say we want to try optimizing a basic float multiply:
float fc = f1 * f2; // Floating-point multiply
This is slow, so we want to try the Mogami (2020) idea to change it into addition instead. Note that fancy coding is required. A simple version doesn't work:
int c = (int)f1 + (int)f2;  // Not multiplication!
float fc = (float)c;
That code isn't tricking the compiler and it isn't doing multiplication at all. It does a full numeric conversion from float to int, with all that entails, and this is nothing like floating-point multiplication.
Instead, pointer type casts are required, so that the bits are re-interpreted rather than converted. Assuming that both int and float are 32-bit types, a coded version in C++ looks like:
int c = *(int*)&(f1) + *(int*)&(f2) - 0x3f800000;  // Mogami(2020)
float fc = *(float*)&c;
How does this even work? I mean, it seems like hocus pocus. The effect is that integer addition on the 8-bit exponents acts like a multiplication, because adding the exponents of powers of two is the same as multiplying them. Since each stored exponent carries a bias of 127, the sum has one extra bias of 127, which is what the subtraction of 0x3f800000 (127 shifted into the exponent field, i.e. 127<<23) corrects. Adding the 23 mantissa bits together isn't really the same as multiplying them, but it's close enough that the overall effect is an approximate version of multiplication (Mitchell's approximation). Some of the theory of why this works is examined in Kosson & Jaggi (2023). Overall, it seems to work like multiplication on both positive and negative floating-point values, but faster, because it's using integer addition. The accuracy is such that the difference from regular float multiplication (i.e. the error) is less than 15%. In my testing it seemed like it was usually less than 12%, so it's a very good approximation of multiplication, for a significant speedup in arithmetic calculations.
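To see the approximation in action, here is a small test sketch of my own (the helper name approx_mul_mogami and the sample values are illustrative assumptions, not the book's code) that compares the add-as-integer result against true multiplication and prints the relative error. It assumes 32-bit int and IEEE 754 32-bit float, and uses the same pointer-cast idiom as the text:

#include <stdio.h>

// Hedged sketch (not the book's code): Mogami add-as-integer approximate
// multiply for two floats, assuming 32-bit int and 32-bit IEEE 754 float.
float approx_mul_mogami(float f1, float f2)
{
    int c = *(int*)&(f1) + *(int*)&(f2) - 0x3f800000;  // Mogami (2020)
    return *(float*)&c;
}

int main()
{
    // Sample values chosen for illustration only.
    float tests[][2] = { {1.5f, 2.0f}, {1.5f, 1.5f}, {1.25f, 1.75f}, {-2.5f, 3.75f} };
    for (int i = 0; i < 4; i++) {
        float a = tests[i][0], b = tests[i][1];
        float exact = a * b;
        float approx = approx_mul_mogami(a, b);
        float relerr = (approx - exact) / exact;  // signed relative error
        printf("%g * %g: exact=%g approx=%g error=%.1f%%\n",
               a, b, exact, approx, 100.0f * relerr);
    }
    return 0;
}

On these values, the power-of-two operand (2.0) gives an exact result, because its mantissa bits are all zero, while the 1.5 * 1.5 case shows roughly the worst-case error of about 11%.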
Note that the temporary integer variable is hard to get rid of in C++, and might require assembler instead. The “+” operator puts the 32-bit integer into a C++ register, but I can't find a way to re-interpret that temporary int value as a 32-bit float without first storing it to a temporary variable. A simple typecast to float doesn't work in C++:
float fc = (float) ( *(int*)&(f1) + *(int*)&(f2) - 0x3f800000 ); // Fails...
The above doesn't work because the integer is converted by the float typecast, which is very different from re-interpreting the 32-bit temporary integer as a 32-bit float. In fact, the code above is really just a bug, as I discovered myself. It doesn't really compute anything very meaningful, not even approximately.
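If C++20 is available, std::bit_cast is one way to express the bit re-interpretation without the pointer casts or a separately named int temporary. This is a sketch of that alternative (not the book's code; approx_mul_bitcast is an assumed name), again assuming int and float are both 32-bit:

#include <bit>  // std::bit_cast, C++20

// Hedged sketch: the Mogami approximate multiply expressed with
// std::bit_cast, which re-interprets the bits rather than converting them.
inline float approx_mul_bitcast(float f1, float f2)
{
    return std::bit_cast<float>(
        std::bit_cast<int>(f1) + std::bit_cast<int>(f2) - 0x3f800000);
}

A pre-C++20 equivalent is to memcpy the bytes into and out of an int, which mainstream compilers typically optimize down to the same integer addition.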
Example: Add-as-Integer Vector Dot Product:
Here's what it looks like to put Mogami's method into a vector dot product to create an approximate (but faster) version:
float aussie_vecdot_add_as_int_mogami(float v1[], float v2[], int n)
{
    // Add as integer, Mogami(2020)
    float sum = 0.0;
    for (int i = 0; i < n; i++) {
        int c = *(int*)&(v1[i]) + *(int*)&(v2[i]) - 0x3f800000;
        sum += *(float*)&c;
    }
    return sum;
}
This is not a fully optimized version. For example, the iterator variable i could be removed via pointer arithmetic, and vectorized addition is also possible (e.g. with AVX x86 intrinsics), as sketched below. A further optimization is a GPU version, since it's just doing integer addition, which GPUs also support.
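As a rough illustration of the vectorized direction, here is a hedged AVX2 sketch (the function name vecdot_add_as_int_avx2 is my own, and it assumes n is a multiple of 8 and 32-bit int/float); it is a sketch under those assumptions, not a tuned implementation:

#include <immintrin.h>

// Hedged sketch: AVX2 version of the add-as-integer dot product.
// Assumes n is a multiple of 8, and 32-bit int and float.
float vecdot_add_as_int_avx2(const float v1[], const float v2[], int n)
{
    const __m256i offset = _mm256_set1_epi32(0x3f800000);
    __m256 sums = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 8) {
        // Load 8 floats from each vector and reinterpret the bits as 32-bit integers.
        __m256i a = _mm256_castps_si256(_mm256_loadu_ps(&v1[i]));
        __m256i b = _mm256_castps_si256(_mm256_loadu_ps(&v2[i]));
        // Integer add, then subtract the exponent-bias offset (Mogami trick).
        __m256i c = _mm256_sub_epi32(_mm256_add_epi32(a, b), offset);
        // Reinterpret back as floats and accumulate 8 partial sums.
        sums = _mm256_add_ps(sums, _mm256_castsi256_ps(c));
    }
    // Horizontal sum of the 8 partial sums.
    float partial[8];
    _mm256_storeu_ps(partial, sums);
    float total = 0.0f;
    for (int k = 0; k < 8; k++) total += partial[k];
    return total;
}

The design mirrors the scalar version: the casts between __m256 and __m256i are bitwise reinterpretations, so the only arithmetic per element is one integer add, one integer subtract, and one float accumulate.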
Research papers on add-as-integer networks:
- T. Mogami, 2020, Deep neural network training without multiplications, In Beyond BackPropagation WS at 34th Conference on Neural Information Processing Systems, 2020, https://arxiv.org/abs/2012.03458 (multiplication of floating-point numbers with integer addition, using Mitchell's approximate multiplication)
- Lingyun Yao, Martin Trapp, Karthekeyan Periasamy, Jelin Leslin, Gaurav Singh, Martin Andraud, June 2023, Logarithm-Approximate Floating-Point Multiplier for Hardware-efficient Inference in Probabilistic Circuits, Proceedings of The 6th Workshop on Tractable Probabilistic Modeling, https://openreview.net/forum?id=WL7YDLOLfK, PDF: https://openreview.net/pdf?id=WL7YDLOLfK (Probabilistic speed improvement; uses Mogami's approximate multiplier.)
- A Kosson, M Jaggi, 2023, Hardware-Efficient Transformer Training via Piecewise Affine Operations, arXiv preprint arXiv:2305.17190, https://arxiv.org/abs/2305.17190, Code: https://github.com/epfml/piecewise-affine-multiplication (Uses Mogami method with neural networks, including multiple components of the model, in training and inference; also a theoretical explanation of why Mogami integer addition works, including its correct handling of sign bits.)
- X Li, B Liu, RH Yang, V Courville, C Xing, VP Nia, 2023, DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization, Proceedings of the IEEE/CVF, https://openaccess.thecvf.com/content/ICCV2023/papers/Li_DenseShift_Towards_Accurate_and_Efficient_Low-Bit_Power-of-Two_Quantization_ICCV_2023_paper.pdf (Not a full add-as-integer method, but uses integer addition on the sign and exponent bits of IEEE 754 floating-point to perform bitshifts on floats to perform power-of-two number quantization on 32-bit floats.)
For more research papers on add-as-integer neural networks, see https://www.aussieai.com/research/zero-multiplication#addint.