Aussie AI
51. Zero-Multiplication Models
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
“These go to eleven.”
— This Is Spinal Tap, 1984.
What are Zero-Multiplication Models?
Multiplication causes a lot of trouble. It's slower than addition or bitshifting, and AI models need to calculate the times tables lots of times (literally billions). That adds up to a lot of CPU and GPU time spent doing the same thing.
If it hurts a lot, just stop! So, why not try to do zero multiplications instead? It turns out that we're not the first to think of this, and I count at least eleven unique ways to get rid of multiplication in LLM inference:
1. Low-bit quantization — binary and ternary quantization.
2. Logarithmic quantization — power-of-two weights allow bitshifts (see Chapter 44 for more on logarithmic quantization).
3. Logarithmic Number System (LNS) — end-to-end models based on floating-point logarithms (see Chapter 52 on Logarithmic Models).
4. Adder or Additive neural networks — using addition-based computations.
5. Max-plus networks or min-max-plus networks — using “tropical algebra” that has maximum functions combined with addition.
6. Morphological networks — uses maximum, addition, and subtraction.
7. Log-sum-exp networks — logarithm of the sum of exponentials.
8. Difference-squared networks
9. Look-up Tables (LUTs) for multiplication
10. Approximate multiplication: similar to avoiding multiplication.
11. Bitwise operations (AND/OR/XOR)
Spoiler alert: none of these methods work very well. They're either fast but inaccurate, or even slower than hardware-accelerated multiplication. We might be stuck with the star for a while.
Low Bit Quantization
Firstly, an interesting point is that quantization with a very low number of bits (one or two) can achieve zero-multiplication inference.
Binary quantization: 1-bit binary quantization replaces multiplication with addition or sign-flips. If the weights are only 1 or 0, then “multiplication” by 1 becomes an addition, and multiplication by zero becomes a null-test. If the weights are +1 and -1, which is more common, then it's a sign test followed by an addition or a subtraction, or simply a sign-flip. Oftentimes, these are optimized with bit arithmetic, since binary quantization is 1-bit quantization. Binary quantization is very fast, but has well-known problems with model accuracy.
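For example, here's a minimal sketch of the idea (my own illustrative code and storage format, not from any particular paper): the binary weights are packed one per bit, where a bit value of 1 means +1 and 0 means -1, so each “multiplication” becomes a bit test followed by an addition or subtraction.

#include <cstdint>

// Sketch: binary weights packed one per bit (1 means +1, 0 means -1).
// Each "multiplication" is a bit test followed by an add or a subtract.
float binary_vecdot(const float x[], const uint32_t wbits[], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        bool positive = (wbits[i / 32] >> (i % 32)) & 1u;
        sum += positive ? x[i] : -x[i];  // add or subtract, no multiply
    }
    return sum;
}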
Ternary quantization: Similarly, ternary quantization with weights -1, 0, and +1 can be implemented with a sign test, a null test, addition, and subtraction. However, ternary quantization also has problems with model accuracy.
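Again as an illustrative sketch (the name and storage format are my own assumptions), a ternary dot product with weights stored as small signed integers in {-1, 0, +1} needs only a null test, a sign test, and additions or subtractions.

#include <cstdint>

// Sketch: ternary weights stored as int8_t values in {-1, 0, +1}.
float ternary_vecdot(const float x[], const int8_t w[], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        if (w[i] == 0) continue;          // null test: zero weights do nothing
        sum += (w[i] > 0) ? x[i] : -x[i]; // sign test: add or subtract
    }
    return sum;
}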
2-bit quantization: The four possible weights could be implemented by zero, one or two additions, instead of multiplication. This type of 2-bit quantization does not receive as much attention in the literature.
See Chapter 44 (Advanced Quantization) for more information about these low-bit quantization techniques and their research papers.
Adder Neural Networks
If multiplication is so bad, can't we just use addition? Yes, we sure can. Cue the “adder” neural networks.
But this terminology is not the same thing as “additive models” (or “additive neural networks”), a term often used in the literature to mean something other than arithmetic addition. Generalized Additive Neural Networks (GANNs) are also a different concept.
Can we change the multiplication operation generically to addition without quantization? I mean, we can change the matrix multiplication C++ code from “*” to “+” and we're done, right? Unsurprisingly, it's not a new idea to build a “dot product-like operation” using addition and subtraction. The earliest replacement of multiplication with addition seems to be Ritter and Sussner (1996), and there are many other papers on “adder” models.
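To give the flavor of the approach, here's a hedged sketch of the AdderNet-style idea from the papers below (my own illustrative code, not any author's implementation): the “dot product-like operation” becomes the negative L1 distance between the weight vector and the input, using only subtraction, absolute value, and addition.

#include <cmath>

// Sketch: an AdderNet-style similarity measure that replaces the dot product
// with the negative L1 distance between the weights and the input.
float adder_similarity(const float x[], const float w[], int n)
{
    float dist = 0.0f;
    for (int i = 0; i < n; i++) {
        dist += std::fabs(x[i] - w[i]);  // subtraction and absolute value only
    }
    return -dist;  // larger (less negative) means a closer match
}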
Research papers on adder networks:
- G. Ritter and P. Sussner, 1996, An introduction to morphological neural networks, Proceedings of 13th International Conference on Pattern Recognition (ICPR), vol. 4, pp. 709–717 vol.4, 1996, https://ieeexplore.ieee.org/abstract/document/547657 (Earliest multiplication-free neural network? Uses add and max.)
- Hongyi Pan, Diaa Badawi, Xi Zhang & Ahmet Enis Cetin, 2019, Additive neural network for forest fire detection, 18 November 2019, https://link.springer.com/article/10.1007/s11760-019-01600-7 PDF: https://repository.bilkent.edu.tr/bitstreams/e1a00ff4-b85d-4cc0-b058-f885785d8eae/download (AddNet uses a multiplication-free operator to create a dot product-like operator based on addition of absolute values and sign bit tests. The neural network must be trained with this non-multiplication operator.)
- Chen H, Wang Y, Xu C, Shi B, Xu C, Tian Q, Xu C., 2020, Addernet: Do we really need multiplications in deep learning?, In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1468–1477, https://arxiv.org/abs/1912.13200, https://ieeexplore.ieee.org/document/9156624 (Code on GitHub at https://github.com/huaweinoah/AdderNet) (Uses an additive metric of the l-1 distance between the weight vector and the input feature, to construct an additive network.)
- Xu, Y.; Xu, C.; Chen, X.; Zhang, W.; Xu, C.; and Wang, Y. 2020. Kernel Based Progressive Distillation for Adder Neural Networks, In NeurIPS. https://proceedings.neurips.cc/paper/2020/hash/912d2b1c7b2826caf99687388d2e8f7c-Abstract.html, PDF: https://proceedings.neurips.cc/paper/2020/file/912d2b1c7b2826caf99687388d2e8f7c-Paper.pdf (Uses the l-1 additive distance between vectors, like AdderNet.)
- H. Shu, J. Wang, H. Chen, L. Li, Y. Yang, and Y. Wang, 2021, Adder attention for vision transformer, In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, NeurIPS - Advances in Neural Information Processing Systems, volume 34, pages 19899–19909, 2021, https://openreview.net/forum?id=5Ld5bRB9jzY, PDF: https://proceedings.neurips.cc/paper/2021/file/a57e8915461b83adefb011530b711704-Paper.pdf, Supplementary PDF: https://openreview.net/attachment?id=5Ld5bRB9jzY&name=supplementary_material
- Yunhe Wang, Mingqiang Huang, Kai Han, Hanting Chen, Wei Zhang, Chunjing Xu, and Dacheng Tao, 2021, Addernet and its minimalist hardware design for energy-efficient artificial intelligence, arXiv preprint arXiv:2101.10015, 2021, https://arxiv.org/abs/2101.10015
- Wenshuo Li; Xinghao Chen; Jinyu Bai; Xuefei Ning; Yunhe Wang, 2022, Searching for energy-efficient hybrid adder-convolution neural networks, IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 19-20 June 2022, https://ieeexplore.ieee.org/document/9857279, PDF: https://openaccess.thecvf.com/content/CVPR2022W/NAS/papers/Li_Searching_for_Energy-Efficient_Hybrid_Adder-Convolution_Neural_Networks_CVPRW_2022_paper.pdf
- Xinghao Chen, Chang Xu, Minjing Dong, Chunjing Xu, and Yunhe Wang, 2021, An empirical study of adder neural networks for object detection, In NeurIPS, 2021, https://arxiv.org/abs/2112.13608
- Dehua Song, Yunhe Wang, Hanting Chen, Chang Xu, Chunjing Xu, and DaCheng Tao. 2021, Addersr: Towards energy efficient image super-resolution, In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15648–15657, 2021, https://arxiv.org/abs/2009.08891
- C Liu, C Zhao, H Wu, X Han, S Li, 2022, Addlight: An energy-saving adder neural network for cucumber disease classification, Agriculture, 2022, 12(4), 452, https://doi.org/10.3390/agriculture12040452, https://www.mdpi.com/2077-0472/12/4/452
- Hanting Chen, Yunhe Wang, Chang Xu, Chao Xu, Chunjing Xu, Tong Zhang, 2021, Universal Adder Neural Networks, May 2021, https://arxiv.org/abs/2105.14202
- GuoRong Cai, ShengMing Yang, Jing Du, ZongYue Wang, Bin Huang, Yin Guan, SongJian Su, JinHe Su & SongZhi Su, 2021, Convolution without multiplication: A general speed up strategy for CNNs, Science China Technological Sciences, volume 64, pages 2627–2639 (2021), https://link.springer.com/article/10.1007/s11431-021-1936-2
- Lulan Shen; Maryam Ziaeefard; Brett Meyer; Warren Gross; James J. Clark, 2022, Conjugate Adder Net (CAddNet) - a Space-Efficient Approximate CNN, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), https://ieeexplore.ieee.org/abstract/document/9857393 , PDF: https://openaccess.thecvf.com/content/CVPR2022W/ECV/papers/Shen_Conjugate_Adder_Net_CAddNet_-_A_Space-Efficient_Approximate_CNN_CVPRW_2022_paper.pdf
- A. Afrasiyabi, O. Yildiz, B. Nasir, F. T. Yarman-Vural, and A. E. Çetin. 2017, Energy saving additive neural network, CoRR, abs/1702.02676, 2017, https://arxiv.org/abs/1702.02676 (Uses sum of absolute values instead of multiplication.)
- Martin Hardieck; Tobias Habermann; Fabian Wagner; Michael Mecik; Martin Kumm; Peter Zipf, 2023, More AddNet: A deeper insight into DNNs using FPGA-optimized multipliers, 2023 IEEE International Symposium on Circuits and Systems (ISCAS), https://ieeexplore.ieee.org/abstract/document/10181827/
- Y Zhang, B Sun, W Jiang, Y Ha, M Hu, 2022, WSQ-AdderNet: Efficient Weight Standardization based Quantized AdderNet FPGA Accelerator Design with High-Density INT8 DSP-LUT Co-Packing Optimization, 2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD), https://ieeexplore.ieee.org/document/10069557
For more research papers on adder networks, see https://www.aussieai.com/research/zero-multiplication#add.
Approximate Multiplication
Approximate multiplication algorithms can be used to avoid full multiplications. There is extensive literature on various approximations to multiplications. See Chapter 53 for more information on approximate multiplication arithmetic. Here are some of the research papers on approximate multiplication used in neural networks:
- Min Soo Kim, Alberto Antonio Del Barrio Garcia, Hyunjin Kim, and Nader Bagherzadeh, 2020, The effects of approximate multiplication on convolutional neural networks, July 2020, IEEE Transactions on Emerging Topics, https://arxiv.org/abs/2007.10500
- M. S. Kim, A. A. Del Barrio, L. T. Oliveira, R. Hermida, and N. Bagherzadeh, 2018, Efficient Mitchell’s approximate log multipliers for convolutional neural networks, IEEE Transactions on Computers, vol. 68, no. 5, pp. 660–675, 2018, https://ieeexplore.ieee.org/document/8532287
- M. S. Ansari, V. Mrazek, B. F. Cockburn, L. Sekanina, Z. Vasicek, and J. Han, 2019, Improving the accuracy and hardware efficiency of neural networks using approximate multipliers, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 2, pp. 317–328, 2019, https://ieeexplore.ieee.org/document/8863138
- V. Mrazek, Z. Vasicek, L. Sekanina, M. A. Hanif, and M. Shafique, 2019, Alwann: Automatic layer-wise approximation of deep neural network accelerators without retraining, in 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2019, pp. 1–8, https://arxiv.org/abs/1907.07229, Code: https://github.com/ehw-fit/tf-approximate
- V. Mrazek, S. S. Sarwar, L. Sekanina, Z. Vasicek, and K. Roy, 2016, Design of power-efficient approximate multipliers for approximate artificial neural networks, in 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2016, pp. 1–7, https://ieeexplore.ieee.org/document/7827658
- S. S. Sarwar, S. Venkataramani, A. Raghunathan, and K. Roy. 2016, Multiplier-less artificial neurons exploiting error resiliency for energy-efficient neural computing, In 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 145–150. IEEE, 2016, https://arxiv.org/abs/1602.08557 (Uses an approximate multiplier.)
- R Yin, Y Li, A Moitra, P Panda, Sep 2023, MINT: Multiplier-less Integer Quantization for Spiking Neural Networks, https://arxiv.org/abs/2305.09850
Shift-Add Networks
Multiplication can be simulated via bitshifts and addition. There are two basic ideas:
- Use shift counts as weights.
- Approximate multiplication with shifts and adds.
The first idea is to use a single bitshift instead of multiplication (without any addition). This is called “power-of-two quantization” or “logarithmic quantization” because the shift-count weights are stored as integer-truncated log2 of the original weights. See Chapter 44 for more on the logarithmic quantization technique.
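Here's a hedged sketch of the first idea (my own illustrative code and storage format): with integer activations, and each weight stored as a shift count plus a separate sign, every “multiplication” collapses to a single bitshift.

#include <cstdint>

// Sketch: power-of-two (logarithmic) quantization with weights stored as
// a shift count (truncated log2 of the magnitude) plus a separate sign.
int32_t shift_vecdot(const int32_t x[], const uint8_t shift[],
                     const int8_t sign[], int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++) {
        int32_t term = x[i] << shift[i];       // multiply by 2^shift[i]
        sum += (sign[i] >= 0) ? term : -term;  // apply the weight's sign
    }
    return sum;
}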
The second alternative is the “shift-add” networks, which mimic multiplications using a combination of bitshifting and addition for zero-multiplication. A rough sketch of the idea is below, followed by the research papers.
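This is a hedged sketch of the second idea (my own illustration, not any paper's code): if a weight has been pre-decomposed offline into two powers of two, w ~ 2^s1 + 2^s2, then w*x can be approximated with two shifts and one addition. For example, 10 is exactly 2^3 + 2^1, so 10*x equals (x<<3) + (x<<1).

#include <cstdint>

// Sketch: two-term shift-add approximation of w*x, where the weight has been
// decomposed offline into two powers of two, w ~= 2^s1 + 2^s2.
int32_t shift_add_mul(int32_t x, uint8_t s1, uint8_t s2)
{
    return (x << s1) + (x << s2);  // two bitshifts and one addition
}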
Research papers on shift-add neural networks:
- Haoran You, Huihong Shi, Yipin Guo, Yingyan (Celine) Lin, 2023, ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer, arXiv preprint arXiv:2306.06446, https://arxiv.org/abs/2306.06446 (Uses a combination of addition and shifting.)
- You H, Chen X, Zhang Y, Li C, Li S, Liu Z, Wang Z, Lin Y., 2020, Shiftaddnet: A hardware-inspired deep network, In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, Virtual, https://arxiv.org/abs/2010.12785
- Abhishek Ramdas Nair, Pallab Kumar Nath, Shantanu Chakrabartty, Chetan Singh Thakur, 2022, Multiplierless MP-Kernel Machine For Energy-efficient Edge Devices, https://arxiv.org/pdf/2106.01958 (Uses addition, shift, comparison, and underflow/overflow operations.)
- Haoran You, Xiaohan Chen, Yongan Zhang, Chaojian Li, Sicheng Li, Zihao Liu, Zhangyang Wang, and Yingyan Lin. 2020, Shiftaddnet: A hardware-inspired deep network, In NeurIPS, 2020, https://arxiv.org/abs/2010.12785
- J. Faraone et al., 2020, AddNet: Deep neural networks using FPGA-optimized multipliers, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 28, no. 1, pp. 115–128, Jan. 2020. https://ieeexplore.ieee.org/document/8848603, https://arxiv.org/abs/1911.08097 (Uses bitshift and addition instead of multiplication.)
For more research papers on shift-add networks, see https://www.aussieai.com/research/zero-multiplication#shiftadd.
Add-as-Integer Networks
This method uses an approximate floating-point multiplication implemented via integer addition. This is a very weird idea and it seems almost magical that it works. It's basically this:
a) pretend that 32-bit floating-point numbers (with 1 sign bit, 8 exponent bits, and 23 mantissa bits) are actually 32-bit integers (signed), and
b) add them together using 32-bit signed integer addition.
It doesn't do full multiplication, and it seems like it should be just a dumb C++ bug, but it actually does something useful: an approximation called Mitchell's approximate multiplication.
Example: Add-as-Int Mogami Approximate Multiplication:
The method uses C++ casts to trick the compiler into using the float operands as if they were int types, and then subtracts an offset to correct the extra bits.
Let's say we want to try optimizing a basic float multiply:
float fc = f1 * f2; // Floating-point multiply
This is slow, so we want to try the Mogami (2020) idea to change it into addition instead. Note that fancy coding is required. A simple version doesn't work:
int c = (int)f1 + (int)f2;  // Not multiplication!
float fc = (float)c;
That code isn't tricking the compiler and it isn't doing multiplication at all. It does a full conversion from float to int, with all that entails, and this is nothing like floating-point multiplication.
Instead, type casting is required. Assuming that both int and float are 32-bit types, a coded version in C++ looks like:
int c = *(int*)&(f1) + *(int*)&(f2) - 0x3f800000;  // Mogami (2020)
float fc = *(float*)&c;
How does this even work? I mean, it seems like hocus pocus. The effect is that integer addition on the 8-bit exponents is like doing a multiplication (because the exponent bits are the powers). Adding the 23-bit mantissas together isn't really multiplication, but it's close enough to act as an approximate version of it. Some of the theory of why this works is examined in Kosson & Jaggi (2023). Overall, it behaves like multiplication on both positive and negative floating-point values, but faster because it uses integer addition. The accuracy is such that the difference from regular float multiplication (i.e. the error) is less than 15%. In my testing it was usually less than 12%, so it's a very good approximation of multiplication, for a significant speedup in arithmetic calculations.
Note that the temporary integer variable is hard to get rid of in C++, and might require assembler instead. The “+” operator puts the 32-bit integer into a C++ register, but I can't find a way to re-interpret that temporary int value as a 32-bit float without first storing it to a temporary variable. A simple typecast to float doesn't work in C++:
float fc = (float) ( *(int*)&(f1) + *(int*)&(f2) - 0x3f800000 ); // Fails...
The above doesn't work because the integer is converted by the float typecast, which is very different from re-interpreting the 32-bit temporary integer as a 32-bit float. In fact, the code above is really just a bug, as I discovered myself. It doesn't compute anything very meaningful, not even approximately.
Example: Add-as-Integer Vector Dot Product: Here's what it looks like to put Mogami's method into a vector dot product to create an approximate version (but faster):
float aussie_vecdot_add_as_int_mogami(float v1[], float v2[], int n)
{
    // Add as integer, Mogami (2020)
    float sum = 0.0;
    for (int i = 0; i < n; i++) {
        int c = *(int*)&(v1[i]) + *(int*)&(v2[i]) - 0x3f800000;
        sum += *(float*)&c;
    }
    return sum;
}
This is not a fully optimized version. For example, the iterator variable i should be removed via pointer arithmetic, and vectorized addition is also possible (e.g. with AVX x86 intrinsics). A further optimization is a GPU version, since it's just doing integer addition, which I think a few GPUs might know how to do.
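As one possible direction for that AVX idea, here is a hedged sketch only (my own code, not benchmarked here), which assumes AVX2 support and that n is a multiple of 8; it does the integer addition and offset subtraction eight elements at a time.

#include <immintrin.h>

// Sketch: AVX2 version of the Mogami add-as-integer dot product.
// Assumes n is a multiple of 8 and the CPU supports AVX2.
float aussie_vecdot_add_as_int_mogami_avx2(const float v1[], const float v2[], int n)
{
    const __m256i offset = _mm256_set1_epi32(0x3f800000);
    __m256 sums = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 8) {
        // Reinterpret 8 floats from each vector as 32-bit integers
        __m256i a = _mm256_loadu_si256((const __m256i*)&v1[i]);
        __m256i b = _mm256_loadu_si256((const __m256i*)&v2[i]);
        // Integer add, then subtract the exponent-bias offset (Mogami 2020)
        __m256i c = _mm256_sub_epi32(_mm256_add_epi32(a, b), offset);
        // Reinterpret the result bits as floats and accumulate
        sums = _mm256_add_ps(sums, _mm256_castsi256_ps(c));
    }
    // Horizontal sum of the 8 partial sums
    float partial[8];
    _mm256_storeu_ps(partial, sums);
    float sum = 0.0f;
    for (int k = 0; k < 8; k++) sum += partial[k];
    return sum;
}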
Research papers on add-as-integer networks:
- T. Mogami, 2020, Deep neural network training without multiplications, In Beyond BackPropagation WS at 34th Conference on Neural Information Processing Systems, 2020, https://arxiv.org/abs/2012.03458 (multiplication of floating-point numbers with integer addition, using Mitchell's approximate multiplication)
- Lingyun Yao, Martin Trapp, Karthekeyan Periasamy, Jelin Leslin, Gaurav Singh, Martin Andraud, June 2023, Logarithm-Approximate Floating-Point Multiplier for Hardware-efficient Inference in Probabilistic Circuits, Proceedings of The 6th Workshop on Tractable Probabilistic Modeling, https://openreview.net/forum?id=WL7YDLOLfK, PDF: https://openreview.net/pdf?id=WL7YDLOLfK (Probabilistic speed improvement; uses Mogami's approximate multiplier.)
- A Kosson, M Jaggi, 2023, Hardware-Efficient Transformer Training via Piecewise Affine Operations, arXiv preprint arXiv:2305.17190, https://arxiv.org/abs/2305.17190, Code: https://github.com/epfml/piecewise-affine-multiplication (Uses Mogami method with neural networks, including multiple components of the model, in training and inference; also a theoretical explanation of why Mogami integer addition works, including its correct handling of sign bits.)
- X Li, B Liu, RH Yang, V Courville, C Xing, VP Nia, 2023, DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization, Proceedings of the IEEE/CVF, https://openaccess.thecvf.com/content/ICCV2023/papers/Li_DenseShift_Towards_Accurate_and_Efficient_Low-Bit_Power-of-Two_Quantization_ICCV_2023_paper.pdf (Not a full add-as-integer method, but uses integer addition on the sign and exponent bits of IEEE 754 floating-point to perform bitshifts on floats to perform power-of-two number quantization on 32-bit floats.)
For more research papers on add-as-integer neural networks, see https://www.aussieai.com/research/zero-multiplication#addint.
Max-Plus and Tropical Algebra
The “max-plus” or “min-max-plus” networks use maximum or minimum operations, combined with addition, rather than multiplication. The theoretical basis of this idea is “tropical algebra,” a specialized mathematics that combines min/max with addition to define a pseudo-multiplication operation.
Several other areas of theory are related. Addition in the Logarithmic Number System can be approximated with maximum and addition (like “max-plus”). Tropical algebra is also relevant to “log-sum-exp networks,” which calculate the logarithm of a sum of exponentials; this is similar to LNS addition and can possibly be approximated similarly. Also, the Softmax function has a denominator that is a sum of exponentials, so Softmax approximation is similar to LNS addition and could draw on theory from max-plus networks and tropical algebra.
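As a rough illustration of the basic operation (my own sketch; it assumes n >= 1), a max-plus replacement for the dot product takes the maximum of the pairwise sums instead of the sum of the pairwise products.

#include <algorithm>

// Sketch: "max-plus" (tropical) operation replacing the dot product.
// Uses only addition and maximum; assumes n >= 1.
float max_plus(const float x[], const float w[], int n)
{
    float best = x[0] + w[0];
    for (int i = 1; i < n; i++) {
        best = std::max(best, x[i] + w[i]);
    }
    return best;
}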
Research papers on max-plus networks and tropical algebra:
- Yunxiang Zhang, Samy Blusseau, Santiago Velasco-Forero, Isabelle Bloch, Jesus Angulo, 2019, Max-plus Operators Applied to Filter Selection and Model Pruning in Neural Networks, https://arxiv.org/abs/1903.08072, Code: https://github.com/yunxiangzhang (Analysis of the “max-plus” operation, based on maximum and addition, such as the maximum of the pairwise sum of an internal computation plus a weight, rather than the sum of multiplied pairs.)
- Ye Luo, Shiqing Fan, 2021, Min-Max-Plus Neural Networks, Feb 2021, https://arxiv.org/abs/2102.06358 (Theory of max-plus networks, including “tropical math”, also with minimum.)
- Philippe Bich; Luciano Prono; Mauro Mangia; Fabio Pareschi; Riccardo Rovatti; Gianluca Setti 2022, Aggressively prunable MAM²-based Deep Neural Oracle for ECG acquisition by Compressed Sensing, 2022 IEEE Biomedical Circuits and Systems Conference (BioCAS) 13-15 October 2022 DOI: 10.1109/BioCAS54905.2022.9948676, https://doi.org/10.1109/BioCAS54905.2022.9948676, https://ieeexplore.ieee.org/abstract/document/9948676, https://faculty.kaust.edu.sa/en/publications/aggressively-prunable-mam-based-deep-neural-oracle-for-ecg-acquis, (Not traditional max-plus; uses max-and-min with multiplication.)
- Luciano Prono; Mauro Mangia; Fabio Pareschizy; Riccardo Rovatti; Gianluca Settizy, 2022, A Non-conventional Sum-and-Max based Neural Network layer for Low Power Classification, IEEE International Symposium on Circuits and Systems (ISCAS), June 2022, DOI: 10.1109/ISCAS48785.2022.9937576, https://ieeexplore.ieee.org/document/9937576
- Luciano Prono, Philippe Bich, Mauro Mangia, Fabio Pareschi, 2023, A Multiply-And-Max/min Neuron Paradigm for Aggressively Prunable Deep Neural Networks, https://www.techrxiv.org/articles/preprint/A_Multiply-And-Max_min_Neuron_Paradigm_for_Aggressively_Prunable_Deep_Neural_Networks/22561567, PDF: https://www.techrxiv.org/articles/preprint/A_Multiply-And-Max_min_Neuron_Paradigm_for_Aggressively_Prunable_Deep_Neural_Networks/22561567/1/files/40119760.pdf
- Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos, 2021, MicroNet: Improving Image Recognition with Extremely Low FLOPs, https://ieeexplore.ieee.org/abstract/document/9857393 PDF: https://openaccess.thecvf.com/content/ICCV2021/papers/Li_MicroNet_Improving_Image_Recognition_With_Extremely_Low_FLOPs_ICCV_2021_paper.pdf (Uses shift-max algorithm.)
- Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y., 2013, Maxout networks, In Proceedings of the International Conference on Machine Learning (ICML), 2013, https://arxiv.org/abs/1302.4389 (Uses maximum operator arithmetic.)
- S. Fan, L. Liu, and Y. Luo. 2021, An alternative practice of tropical convolution to traditional convolutional neural networks, In 2021 The 5th International Conference on Compute and Data Analysis, pages 162–168, 2021, https://arxiv.org/abs/2103.02096 (Tropical arithmetic)
For more research papers on max-plus neural networks and tropical algebra, see https://www.aussieai.com/research/zero-multiplication#maxplus.
Morphological Networks
Another type of neural network that uses max operations is called the “morphological network”. This uses maximum, addition, and subtraction operations.
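Here is a hedged sketch of the two basic operations (my own code, following the descriptions in the papers below): “dilation” is the maximum of pairwise sums and “erosion” is the minimum of pairwise differences; both assume n >= 1 and use no multiplication.

#include <algorithm>

// Sketch: morphological "dilation" is the maximum of pairwise sums, and
// "erosion" is the minimum of pairwise differences (no multiplication).
float morph_dilation(const float x[], const float w[], int n)
{
    float best = x[0] + w[0];
    for (int i = 1; i < n; i++) best = std::max(best, x[i] + w[i]);
    return best;
}

float morph_erosion(const float x[], const float w[], int n)
{
    float best = x[0] - w[0];
    for (int i = 1; i < n; i++) best = std::min(best, x[i] - w[i]);
    return best;
}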
Research papers on morphological networks:
- Zamora, E., Sossa, H., 2017, Dendrite morphological neurons trained by stochastic gradient descent. Neurocomputing 260, 420–431 (2017), https://ieeexplore.ieee.org/document/7849933
- G. Ritter and P. Sussner, 1996, An introduction to morphological neural networks, Proceedings of 13th International Conference on Pattern Recognition (ICPR), vol. 4, pp. 709–717 vol.4, 1996, https://ieeexplore.ieee.org/abstract/document/547657 (Earliest multiplication-free neural network? Uses add and max.)
- Limonova E, Matveev D, Nikolaev D, Arlazarov VV., 2019, Bipolar morphological neural networks: convolution without multiplication, In: Twelfth International Conference on Machine Vision (ICMV 2019), 2020, vol. 11433, p. 11433, International Society for Optics and Photonics, https://arxiv.org/abs/1911.01971
- Elena Limonova, Daniil Alfonso, Dmitry Nikolaev, Vladimir V. Arlazarov, 2021, ResNet-like Architecture with Low Hardware Requirements, https://arxiv.org/pdf/2009.07190 (Algorithm based on max and addition.)
- G. X. Ritter, L. Iancu, and G. Urcid, 2003, Morphological perceptrons with dendritic structure, in The 12th IEEE International Conference on Fuzzy Systems (FUZZ), 2003. FUZZ ’03., vol. 2, May 2003, pp. 1296–1301 vol.2, https://ieeexplore.ieee.org/document/1206618 (Dendritic structure algorithm based on “lattice algebra”.)
- Mondal R, Santra S, Mukherjee S, Chanda B., 2019, Morphological Network: How Far Can We Go with Morphological Neurons?, arXiv:1901.00109 http://arxiv.org/abs/1901.00109 (Examines the theory of maximum of a sum, called “dilation”, and minimum of differences, called “erosion”, in neural networks.)
- Pessoa, L.F., Maragos, P., 2000, Neural networks with hybrid morphological/rank/linear nodes: a unifying framework with applications to handwritten character recognition, Pattern Recognition 33(6), 945–960 (2000), https://www.sciencedirect.com/science/article/abs/pii/S0031320399001570 (Various neurons including min-max methods.)
- Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y., 2013, Maxout networks, arXiv preprint arXiv:1302.4389 (2013), https://arxiv.org/abs/1302.4389 (Paper on “dropout” using maximum function.)
- Charisopoulos, V., Maragos, P., 2017, Morphological perceptrons: geometry and training algorithms, In: International Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing. pp. 3–15 (2017), https://link.springer.com/chapter/10.1007/978-3-319-57240-6_1
- Wilson, S.S., 1989, Morphological networks, In: Visual Communications and Image Processing IV. vol. 1199, pp. 483–496 (1989) https://www.spie.org/Publications/Proceedings/Paper/10.1117/12.970058?SSO=1
- Davidson, J.L., Ritter, G.X., 1990, Theory of morphological neural networks, In: Digital Optical Computing II. vol. 1215, pp. 378–389 (1990), https://www.semanticscholar.org/paper/Theory-of-morphological-neural-networks-Davidson-Ritter/3d459fb68b8f1dc66e239d2404afb6702950a246
- Ranjan Mondal, Sanchayan Santra, Soumendu Sundar Mukherjee, Bhabatosh Chanda, Dec 2022, Morphological Network: How Far Can We Go with Morphological Neurons?, https://arxiv.org/abs/1901.00109
- Mondal, R., Santra, S., Chanda, B., 2019, Dense morphological network: An universal function approximator, arXiv preprint arXiv:1901.00109 (2019), https://arxiv.org/abs/1901.00109v1, PDF: https://arxiv.org/pdf/1901.00109v2.pdf
- Peter Sussner; Estevao Laureano Esmi, 2009, An introduction to morphological perceptrons with competitive learning, 2009 International Joint Conference on Neural Networks https://ieeexplore.ieee.org/document/5178860
For more research papers on morphological neural networks, see https://www.aussieai.com/research/zero-multiplication#morph.
Other Addition Networks
These are research papers that use addition to attain zero-multiplication inference, but do not fit the specific techniques above.
Research papers on other types of additive neural networks:
- Baluja S, Marwood D, Covell M, Johnston N., 2018, No multiplication? no floating-point? no problem! training networks for efficient inference, arXiv preprint arXiv:1809.09244, http://arxiv.org/abs/1809.09244 (This paper is mainly about low-bit integer quantization to avoid multiplication.)
- Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, and Chang Xu. 2019, Ghostnet: More features from cheap operations, arXiv preprint arXiv:1911.11907, 2019, https://arxiv.org/abs/1911.11907 (Applies linear operations to create extra “ghost features”, rather than a simple additive neural network.)
- O. Yildiz. 2017, Training methodology for a multiplication free implementable operator based neural networks, Master’s thesis, Middle East Technical University, 2017. URL https://hdl.handle.net/11511/26664
- O. Yildiz, 2017, Training Methodology for a Multiplication Free Implementable Operator Based Neural Networks, M.S. - Master of Science, Middle East Technical University, 2017. https://open.metu.edu.tr/handle/11511/26664 PDF: http://etd.lib.metu.edu.tr/upload/12621234/index.pdf
- Atli Kosson, Martin Jaggi, 2023, Hardware-Efficient Transformer Training via Piecewise Affine Operations, May 2023, https://arxiv.org/abs/2305.17190
- J. Johnson. 2018, Rethinking floating-point for deep learning, arXiv preprint arXiv:1811.01721, 2018, https://arxiv.org/abs/1811.01721 (“log float multiply-add” in hardware)
- Mark Horowitz, 2014, 1.1 computing’s energy problem (and what we can do about it), In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), pages 10–14. IEEE, 2014, https://doi.org/10.1109/ISSCC.2014.6757323
- Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018, Mobilenetv2: Inverted residuals and linear bottlenecks, In CVPR, pages 4510–4520, 2018, https://arxiv.org/abs/1801.04381
- Z. Du, K. Palem, A. Lingamneni, O. Temam, Y. Chen, and C. Wu, 2014, Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators, in 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), 2014, pp. 201–206. https://ieeexplore.ieee.org/document/6742890
- Jingyong Cai, Masashi Takemoto, Yuming Qiu and Hironori Nakajo, 2021, Trigonometric Inference Providing Learning in Deep Neural Networks, Appl. Sci. 2021, 11(15), 6704; https://doi.org/10.3390/app11156704, https://www.mdpi.com/2076-3417/11/15/6704, PDF: https://www.mdpi.com/2076-3417/11/15/6704/pdf
- Afrasiyabi A, Badawi D, Nasir B, Yildi O, Vural FTY, Çetin AE. 2018, Non-euclidean vector product for neural networks, In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018, pp. 6862–6866. https://ieeexplore.ieee.org/document/8461709, PDF: https://par.nsf.gov/servlets/purl/10067379
For more research papers on other addition-based neural networks, see https://www.aussieai.com/research/zero-multiplication#addmisc.
Table Lookups Replace Multiplication
Table lookups of precomputed data are a well-known code optimization that has been applied to inference optimization. Pre-computation can be effective for low-bit arithmetic or for approximating the value of non-linear functions that are computationally expensive to evaluate exactly.
One partial method to remove multiplication is to use table lookups instead. This would seem to remove multiplication, although there's actually a hidden multiplication in the array index calculations of a table lookup, which is hopefully handled efficiently by the compiler (probably as a bitshift).
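As a hedged sketch of the idea (my own illustration; on modern hardware a native multiply is likely faster), an 8-bit integer “multiply” can be a lookup into a precomputed 256x256 table, with the only multiplication hidden in the 2D array indexing, which the compiler should reduce to a bitshift.

#include <cstdint>

// Sketch: replace 8-bit integer multiplication with a precomputed 256x256 table.
static int32_t g_mul_lut[256][256];

void init_mul_lut()
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            g_mul_lut[a][b] = (int32_t)(int8_t)a * (int32_t)(int8_t)b;  // precompute signed products
}

int32_t lut_mul(int8_t a, int8_t b)
{
    return g_mul_lut[(uint8_t)a][(uint8_t)b];  // table lookup, no runtime multiply
}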
Research papers on LUT-based zero-multiplication networks:
- Zhou, A.; Yao, A.; Guo, Y.; Xu, L.; and Chen, Y., 2017, Incremental network quantization: Towards lossless CNNs with low-precision weight, arXiv preprint arXiv:1702.03044, https://arxiv.org/abs/1702.03044 (bitshifting)
- S Fanning, 2018, Fixed Point Multiplication-Free Implementation of Deep Neural Networks for Embedded Systems, Masters Thesis, School of Electrical and Electronic Engineering, University College Dublin 2018, https://seanfanning.eu/posts/projects/low-bitwidth-neural-networks/Thesis_SeanFanning_13360951.pdf
- Mohammad Samragh Razlighi; Mohsen Imani; Farinaz Koushanfar; Tajana Rosing, 2017, LookNN: Neural network with no multiplication, Design, Automation & Test in Europe Conference & Exhibition (DATE), 27-31 March 2017, https://ieeexplore.ieee.org/document/7927280 (Lookup-table based multiplication.)
For more research papers on lookup tables as replacements for multiplication in neural networks, see https://www.aussieai.com/research/zero-multiplication#luts.
Difference-Squared Networks
Squaring the difference between two numbers is well-known in Euclidean distance calculations and statistical variance. This idea has been applied to neural networks as “diff-squared networks”. Some methods cited by other papers as “multiplication-free” model research compute a difference (subtraction), but then square it, which is technically still multiplication, but who's counting? There are bit tricks to compute square-root and inverse-square-root, so maybe someone has a trick for squaring with bitwise operators? Also, it isn't using multiplication by weights, so it's a distinct method.
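Here is a hedged sketch of the basic idea (my own code, loosely in the style of the EuclidNets papers below): each x*w term in the dot product is replaced by the negated squared difference, which, as noted above, still uses one multiplication per term for the squaring.

// Sketch: difference-squared similarity replacing the dot product.
// Each term is -(x[i] - w[i])^2; the squaring is still one multiply per term.
float diff_squared_similarity(const float x[], const float w[], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        float d = x[i] - w[i];
        sum -= d * d;
    }
    return sum;
}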
Research papers on diff-squared networks:
- Xinlin Li, Mariana Parazeres, Adam Oberman, Alireza Ghaffari, Masoud Asgharian & Vahid Partovi Nia, 2023, EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models, SN Computer Science, volume 4, 2023, https://link.springer.com/article/10.1007/s42979-023-01921-y (This uses the square of the difference, which is really still multiplication.)
- Xinlin Li, Mariana Parazeres, Adam Oberman, Alireza Ghaffari, Masoud Asgharian, Vahid Partovi Nia, 2022, EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models, Dec 2022, https://arxiv.org/abs/2212.11803 (uses squares and Euclidean distances as weights)
- S. Fan, L. Liu, and Y. Luo. 2021, An alternative practice of tropical convolution to traditional convolutional neural networks, In 2021 The 5th International Conference on Compute and Data Analysis, pages 162–168, 2021, https://arxiv.org/abs/2103.02096 (Tropical arithmetic)
- Y. Luo and S. Fan. 2021, Min-max-plus neural networks, arXiv preprint arXiv:2102.06358, 2021, https://arxiv.org/abs/2102.06358 (Tropical arithmetic)
- Robert Tibshirani, 1996, Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 58, No. 1 (1996), pp. 267-288, https://www.jstor.org/stable/2346178 (Low-level mathematical paper from 1996 about the additive-squares method.)
For more research papers on diff-squared neural networks, see https://www.aussieai.com/research/zero-multiplication#squares.
Bitwise Operators for Inference
Instead of addition, could any of the bitwise operations be used to replace multiplication? Yes, and there are various possibilities.
Bitshift operators: The bitwise shift operators are good candidates for replacing multiplication (or division) by a power-of-2 integer, as discussed under power-of-two quantization (logarithmic quantization). This is a well-known optimization and has considerable research.
Bitwise-or is a possible candidate that doesn't seem to have been considered in the research. When applied to positive integers, bitwise-or has some properties similar to addition: its result is less than or equal to the corresponding addition result, but greater than or equal to either of the two operands. This assumes bitwise-or on unsigned weights, such as via integer quantization to positive weights, because bitwise-or on negative signed numbers has various oddities. As such, bitwise-or with quantization could be considered for some of the above algorithms that use addition to replace multiplication. The accumulated sum based on bitwise-or would increase slightly more slowly than with pure addition.
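Purely to make the idea concrete, here is a speculative sketch (my own code, not from any paper) of replacing the accumulating addition with bitwise-or over unsigned quantized terms:

#include <cstdint>

// Speculative sketch: accumulate unsigned quantized terms with bitwise-or
// instead of addition. The accumulator grows no faster than a true sum would.
uint32_t or_accumulate(const uint8_t x[], const uint8_t w[], int n)
{
    uint32_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc |= (uint32_t)(x[i] + w[i]);  // bitwise-or replaces the running "+"
    }
    return acc;
}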
Bitwise-and is another candidate, similar to bitwise-or: its result is also less than or equal to the addition result, but it is less than or equal to either operand. This seems less desirable than bitwise-or, but it's all a bit unclear, and experiments are required.
Bitwise-xor seems too odd for realistic usage. It has rather strange mathematical properties, but it has been used in various optimizations.
The use of the bitwise operators (or/and/xor) with quantization for non-multiplication inference is an area that needs some exploration. No papers were found yet.