Aussie AI

Zero Multiplication Inference Algorithms

  • Last Updated 7 December, 2024
  • by David Spuler, Ph.D.

Multiplication causes a lot of trouble. It's slower than addition or bitshifting, and AI models need to calculate the times tables lots of times (literally billions). That adds up to a lot of CPU and GPU time spent doing the same thing.

So why not do zero multiplication instead?

Types of Zero-Multiplication Models

It turns out that there are several ways to get rid of multiplication in LLM inference.

Newest Zero-Multiplication Models

Most of these models remain in the research labs, and are not widely used in industry. Even binary quantization and ternary quantization are not often implemented commercially because of accuracy loss.

Hadamard models. The latest research to get some attention is element-wise multiplication models, which use the "Hadamard product" of two matrices. See Hadamard multiplication models.

Low Bit Quantization for Zero-Multiplication Networks

Firstly, an interesting point is that quantization with a low number of bits can achieve zero-multiplication inference.

Binary quantization: 1-bit binary quantization replaces multiplication with addition or with sign-flips. If the weights are only 1 or 0, then "multiplication" by 1 is an addition, and multiplication by zero becomes a null-test. If the weights are +1 and -1, then it's a sign test followed by an addition or a subtraction, or simply a sign-flip. Oftentimes, these are optimized with bit arithmetic, since binary quantization is 1-bit quantization. Binary quantization is very fast, but has well-known problems with model accuracy. Read more about binary quantization.
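
For instance, here is a minimal sketch (illustrative only; the bit-packing layout and sign convention are assumptions, not from any particular paper) of a dot product with +1/-1 binary weights packed one per bit, where every "multiplication" becomes an addition or a subtraction:

    // Sketch: dot product with 1-bit weights (+1 or -1), packed one weight per bit.
    float binary_weight_vecdot(const float v[], const unsigned char wbits[], int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            int bit = (wbits[i >> 3] >> (i & 7)) & 1;  // Extract the i-th weight bit
            if (bit) sum += v[i];   // Weight +1: addition
            else     sum -= v[i];   // Weight -1: subtraction (sign-flip)
        }
        return sum;
    }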

Ternary quantization: Similarly, ternary quantization with weights -1, 0, and 1 can be implemented with a sign test, a null test, and additions or subtractions. However, ternary quantization also has problems with model accuracy. Read more about ternary quantization.
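
A similar sketch for ternary weights, stored here as signed bytes for clarity (a packed 2-bit encoding would be more likely in practice):

    // Sketch: dot product with ternary weights in {-1, 0, +1} stored as signed bytes.
    float ternary_weight_vecdot(const float v[], const signed char w[], int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            if (w[i] > 0)      sum += v[i];  // Weight +1: addition
            else if (w[i] < 0) sum -= v[i];  // Weight -1: subtraction
            // Weight 0: null test, nothing to add (zero skipping)
        }
        return sum;
    }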

2-bit quantization: The 4 possible weights could be implemented by one or two additions, instead of multiplication. This type of 2-bit quantization does not receive as much attention in the literature.
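
As an illustration only (the weight set {-2, -1, +1, +2} is an assumption), a 2-bit weight "multiply" can be handled with at most one extra addition per weight:

    // Sketch: 2-bit weight codes mapped to {-2, -1, +1, +2}; no multiplication.
    float two_bit_weight_vecdot(const float v[], const unsigned char w2[], int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            switch (w2[i] & 3) {                       // 2-bit weight code
                case 0: sum -= v[i] + v[i]; break;     // Weight -2
                case 1: sum -= v[i];        break;     // Weight -1
                case 2: sum += v[i];        break;     // Weight +1
                case 3: sum += v[i] + v[i]; break;     // Weight +2
            }
        }
        return sum;
    }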

Adder Neural Networks (using Addition-based Metrics)

If multiplication is so bad, can't we just use addition? Yes, we sure can. Cue the "adder" neural networks.

But this is not the same as "additive models" (or "additive neural networks"), a term that is often used in the literature to mean something other than arithmetic addition. Generalized Additive Neural Networks (GANNs) are a different concept.

So, can we change the multiplication operation generically to addition without quantization? I mean, we can change the matrix multiplication C++ code from "*" to "+" and we're done, right? Unsurprisingly, it's not a new idea to build a "dot product-like operation" using addition and subtraction. The earliest replacement of multiplication with addition seems to be Ritter and Sussner (1996).
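
For a flavor of what an addition-based metric looks like, here is a rough sketch (not tied to any specific paper's exact formulation) that replaces the dot product with a negative sum of absolute differences, using only subtraction and addition:

    // Sketch: "dot product-like" similarity using only addition and subtraction,
    // in the style of adder networks (negative sum of absolute differences).
    float adder_style_similarity(const float v[], const float w[], int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            float diff = v[i] - w[i];              // Subtraction replaces multiplication
            sum -= (diff < 0.0f) ? -diff : diff;   // Accumulate negative absolute difference
        }
        return sum;  // Larger (less negative) means the vectors are more similar
    }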

There are various inference methods that use addition, and a few papers:

Approximate Multiplier Zero-Multiplication Networks

Approximate multiplication algorithms can be used to avoid full multiplications. There is extensive literature on various approximations to multiplication; see Approximate multiplication.

A few of the approximate multiplication papers specific to zero-multiplication neural networks include:

  • Min Soo Kim, Alberto Antonio Del Barrio Garcia, Hyunjin Kim, and Nader Bagherzadeh, The effects of approximate multiplication on convolutional neural networks, July 2020, IEEE Transactions on Emerging Topics, https://arxiv.org/abs/2007.10500
  • M. S. Kim, A. A. Del Barrio, L. T. Oliveira, R. Hermida, and N. Bagherzadeh, “Efficient Mitchell’s approximate log multipliers for convolutional neural networks,” IEEE Transactions on Computers, vol. 68, no. 5, pp. 660–675, 2018, https://ieeexplore.ieee.org/document/8532287
  • M. S. Ansari, V. Mrazek, B. F. Cockburn, L. Sekanina, Z. Vasicek, and J. Han, “Improving the accuracy and hardware efficiency of neural networks using approximate multipliers,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 2, pp. 317–328, 2019, https://ieeexplore.ieee.org/document/8863138
  • V. Mrazek, Z. Vasicek, L. Sekanina, M. A. Hanif, and M. Shafique, “Alwann: Automatic layer-wise approximation of deep neural network accelerators without retraining,” in 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2019, pp. 1–8, https://arxiv.org/abs/1907.07229, Code: https://github.com/ehw-fit/tf-approximate
  • V. Mrazek, S. S. Sarwar, L. Sekanina, Z. Vasicek, and K. Roy, “Design of power-efficient approximate multipliers for approximate artificial neural networks,” in 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2016, pp. 1–7, https://ieeexplore.ieee.org/document/7827658
  • S. S. Sarwar, S. Venkataramani, A. Raghunathan, and K. Roy. Multiplier-less artificial neurons exploiting error resiliency for energy-efficient neural computing. In 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 145–150. IEEE, 2016, https://arxiv.org/abs/1602.08557 (Uses an approximate multiplier.)
  • MINT: Multiplier-less Integer Quantization for Spiking Neural Networks, R Yin, Y Li, A Moitra, P Panda, Sep 2023, https://arxiv.org/abs/2305.09850

Shift-Add Networks

Multiplication can be simulated via bitshifts and addition. See also Logarithmic quantization for power-of-two multiplications via bitshifts.
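
The basic trick, sketched below with an assumed two-term weight encoding, is to quantize each weight to a sum of one or two signed powers of two, so that each "multiplication" becomes bitshifts and an addition on integer activations. For example, a weight of 6 = 2^2 + 2^1 turns w*x into (x<<2) + (x<<1):

    // Sketch: shift-add "multiply" for a weight quantized as w = (+/-)(2^s1 + 2^s2).
    // Assumes a non-negative integer activation x (e.g. after unsigned quantization).
    int shift_add_multiply(int x, int s1, int s2, int negative)
    {
        int product = (x << s1) + (x << s2);   // Two bitshifts and one addition
        return negative ? -product : product;  // Negative weights handled by a sign-flip
    }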

Papers on "shift-add networks" using a combination of bitshift and addition for zero-multiplication:

Add-as-Integer Approximate Multiplication Networks

This method uses an approximate multiplication that is implemented via integer addition. It is a very weird idea and it seems almost magical that it works. The basic trick is to pretend that each 32-bit floating-point number (with its 1 sign bit, 8 exponent bits, and 23 mantissa bits) is actually a signed 32-bit integer, and add them together. This doesn't do full multiplication, but it does an approximation called Mitchell's approximate multiplication.

Example: Add-as-Int Mogami Approximate Multiplication: The method uses C++ casts to trick the compiler into treating the floats as if they were ints. It then needs to subtract an offset to correct for the exponent bias being counted twice. Let's say we want to try optimizing a basic float multiply:

     float fc = f1 * f2;   // Floating-point multiply

This is slow, so we want to try the Mogami (2020) idea to change it into addition instead. Note that fancy coding is required. A simple version doesn't work:

     int c = (int)f1 + (int)f2;  // Not multiplication!
     float fc = (float)c;

That code isn't tricking the compiler and it isn't doing multiplication at all. It does a full conversion from float to int, with all that entails, and this is nothing like floating point multiplication.

Instead, type casting is required. Assuming that both int and float are 32-bit types, a coded version in C++ looks like:

     int c = *(int*)&(f1) + *(int*)&(f2) - 0x3f800000;  // Mogami(2020)
     float fc = *(float*)&c;

How does this even work? I mean, it seems like hocus pocus. The effect is that integer addition on the 8-bit exponent is like doing a multiplication (because exponent bits are the powers). Adding the 23 mantissa bits together isn't really the same, it's not doing multiplication, but it's close enough that it's doing an approximate version of multiplication. Some of the theory of why this works is examined in Kosson & Jaggi (2022). Overall, it seems to work like multiplication on both positive and negative floating point, but faster because it's using integer addition. The accuracy of the multiplication is such that the difference from regular float multiplication (i.e. the error) is less than 15%. In my testing it seemed like it was usually less than 12%, so it's a very good approximation of multiplication, for a significant speedup in arithmetic calculations.
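
As a quick worked example that can be checked by hand, multiplying 2.0 by 3.0 with the add-as-integer trick happens to come out exact, because one of the fractional mantissas is zero:

     // Worked example: 2.0f * 3.0f via add-as-integer
     //   2.0f has bit pattern 0x40000000, and 3.0f is 0x40400000.
     //   0x40000000 + 0x40400000 - 0x3F800000 = 0x40C00000
     //   0x40C00000 re-interpreted as a float is exactly 6.0f.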

Note that the temporary integer variable is hard to get rid of in C++, and might require assembler instead. The "+" operator puts the 32-bit integer into a C++ register, but I can't find a way to re-interpret that temporary int value as a 32-bit float without first storing it to a temporary variable. A simple typecast to float doesn't work in C++:

     float fc = (float) ( *(int*)&(f1) + *(int*)&(f2) - 0x3f800000 );  // Fails...

The above doesn't work because the integer is converted by the float typecast, which is very different from re-interpreting the 32-bit temporary integer as a 32-bit float. In fact, the code above is really just a bug, as I discovered myself. It doesn't really compute anything very meaningful, not even approximately.
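
As a side note, the pointer casts above technically violate the C++ strict-aliasing rules, even though they work on mainstream compilers. A standards-friendly sketch of the same bit re-interpretation uses memcpy (which compilers typically optimize down to a simple register move), again assuming 32-bit int and float:

    #include <string.h>  // for memcpy

    // Sketch: Mogami-style add-as-integer multiply using memcpy for the type punning.
    float add_as_int_multiply(float f1, float f2)
    {
        int i1, i2;
        memcpy(&i1, &f1, sizeof(i1));    // Re-interpret the float bits as an int
        memcpy(&i2, &f2, sizeof(i2));
        int c = i1 + i2 - 0x3f800000;    // Mogami (2020) integer addition and offset
        float fc;
        memcpy(&fc, &c, sizeof(fc));     // Re-interpret the int bits back as a float
        return fc;
    }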

Example: Add-as-Integer Vector Dot Product: Here's what it looks like to put Mogami's method into a vector dot product to create an approximate version (but faster):

    float yapi_vecdot_add_as_int_mogami(float v1[], float v2[], int n)   // Add as integer
    {
	float sum = 0.0;
	for (int i = 0; i < n; i++) {
		int c = *(int*)&(v1[i]) + *(int*)&(v2[i]) - 0x3f800000;  // Mogami(2020)
		sum += *(float*)&c;
	}
	return sum;
    }

This is not a fully optimized version. For example, the iterator variable i should be removed via pointer arithmetic.

Why does Mogami's add-as-integer idea even work? There are a few points that help to explain why it is a close approximation:

  • Implicit leading-1 in the mantissa bits. Because the stored mantissa bits do not include the implicit leading 1, the integer addition is not adding those leading-1 bits together. Instead, it is only adding the lower-order bits of the mantissa, which helps explain why the error rate is low.
  • Fails for fixed-point representations. The Mogami add-as-integer approach is unlikely to be a useful approximation for fixed-point numbers. Note that multiplication of fixed-point numbers is already implemented as integer multiplication. Also note that fixed-point does not use an implicit leading-1 mantissa bit. The same comments apply to hybrid fixed-point approaches such as block floating-point, where the add-as-integer approximation is unlikely to be very accurate.
  • Adjacency of exponent and mantissa bits. With integer addition used on the mantissa bits, if the highest-order stored bits of both mantissas are set, then a 1 is carried up into the next bit position, which is the lowest exponent bit, since the exponent sits next to the mantissa in the IEEE 754 format. Hence, an extra 1 is added to the exponent in this case. Note that the highest-order stored mantissa bit is not actually the highest-order bit of the full significand, but the second-highest, because of the implicit leading-1 bit in both mantissas. So, adding the second-highest significand bits, where both are 1, may add 1 to the exponent (in addition to the two exponents being added together).

Research Papers on Add-as-Int Approximate Multiplication:

Max-Plus and Related Tropical Networks

The "max-plus" or "min-max-plus" networks use maximum or minimum operations, combined with addition, rather than multiplication. The theoretical basis of this idea is called "tropical algebra" which is a specialized mathematics consistenting of min/max and addition to define a pseudo-multiplication operation.

Some other areas of theory are related. Addition in the logarithmic number system (LNS) can be approximated with maximum and addition operations (like "max-plus"). Tropical algebra is also relevant to "log-sum-exp networks", which calculate the logarithm of a sum of exponentials; this is similar to LNS addition and can possibly be approximated in a similar way. Also, the Softmax function has a sum-of-exponentials denominator, so Softmax approximation is similar to LNS addition and could draw on theory from max-plus networks and tropical algebra.

Papers on max-plus neural networks, and others based on max or min operations, include:

Morphological Networks

Another type of neural network that uses max operations is called the "morphological network". This uses maximum, addition, and subtraction operations.

Other Addition-Related Zero-Multiplication Networks

These are papers that use addition to attain zero-multiplication, but not the specific techniques above.

Table Lookups Replace Multiplication

Table lookups of precomputed data are a well-known code optimization that has been applied to inference optimization. Pre-computation can be effective for low-bit arithmetic or for approximating the value of non-linear functions that are computationally expensive to evaluate exactly.

One partial method to remove multiplication is to use table lookups instead. This would seem to remove multiplication, although there is actually a hidden multiplication in the array index calculations of a table lookup, though it is hopefully handled efficiently by the compiler (probably as a bitshift).
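
For instance, with 8-bit quantized weights and activations, all 65,536 possible products can be precomputed once, and inference then does lookups instead of multiplications. A minimal sketch, where the offset-binary encoding and table layout are assumptions for illustration:

    // Sketch: precomputed 256x256 product table for 8-bit quantized values.
    static int product_table[256][256];   // Filled once at startup

    void init_product_table(void)
    {
        for (int a = 0; a < 256; a++)
            for (int b = 0; b < 256; b++)
                product_table[a][b] = (a - 128) * (b - 128);  // Signed 8-bit products
    }

    int lookup_multiply(unsigned char a, unsigned char b)
    {
        return product_table[a][b];   // No multiplication at inference time
    }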

  • Zhou, A.; Yao, A.; Guo, Y.; Xu, L.; and Chen, Y., 2017, Incremental network quantization: Towards lossless CNNs with low-precision weight, arXiv preprint arXiv:1702.03044, https://arxiv.org/abs/1702.03044 (bitshifting)
  • S Fanning, Fixed Point Multiplication-Free Implementation of Deep Neural Networks for Embedded Systems, Masters Thesis, School of Electrical and Electronic Engineering, University College Dublin 2018, https://seanfanning.eu/posts/projects/low-bitwidth-neural-networks/Thesis_SeanFanning_13360951.pdf
  • Mohammad Samragh Razlighi; Mohsen Imani; Farinaz Koushanfar; Tajana Rosing LookNN: Neural network with no multiplication, Design, Automation & Test in Europe Conference & Exhibition (DATE), 27-31 March 2017, https://ieeexplore.ieee.org/document/7927280 (Lookup-table based multiplication.)
  • Daniel Gerlinghoff, Benjamin Chen Ming Choong, Rick Siow Mong Goh, Weng-Fai Wong, Tao Luo, 18 Mar 2024, Table-Lookup MAC: Scalable Processing of Quantised Neural Networks in FPGA Soft Logic, https://arxiv.org/abs/2403.11414 (Replacing Multiply-Accumulate MAC on hardware with lookup-tables for low-bit quantization.)

Multiplication-Free Neural Networks

There are zero-multiplication models that use some other arithmetic method instead of addition. In the literature, these algorithms are often called "zero multiplication" or "multiplication-free" algorithms, such as Multiplication-Free Neural Networks (MFNNs). Binary quantization and ternary quantization are not the only options. Papers on inference without any multiplication operations:

Diff-Squared Networks

Squaring the difference between two numbers is well-known in Euclidean distance calculations and statistical variance. This idea has been applied to neural networks as "diff-squared networks". Some methods cited by other papers as "multiplication-free" model research compute a difference (subtraction), but then square it, which is technically still multiplication, but who's counting? However, it isn't using multiplication by weights, so it's a distinct method.
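
A rough sketch of the idea (in the spirit of the EuclidNets papers below, not their exact formulation) replaces the dot product with a negative squared Euclidean distance:

    // Sketch: "diff-squared" similarity: negative squared Euclidean distance.
    // Note that the square is still one multiplication per element (diff * diff),
    // but there is no multiplication by the weights themselves.
    float diff_squared_similarity(const float v[], const float w[], int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            float diff = v[i] - w[i];   // Subtraction instead of weight multiplication
            sum -= diff * diff;         // Squared difference
        }
        return sum;  // Closer vectors give a larger (less negative) score
    }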

Research papers on squared differences and neural networks:

  • Xinlin Li, Mariana Parazeres, Adam Oberman, Alireza Ghaffari, Masoud Asgharian & Vahid Partovi Nia, EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models, SN Computer Science, volume 4, 2023, https://link.springer.com/article/10.1007/s42979-023-01921-y (This uses the square of the difference, which is really still multiplication.)
  • Xinlin Li, Mariana Parazeres, Adam Oberman, Alireza Ghaffari, Masoud Asgharian, Vahid Partovi Nia, EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models Dec 2022, https://arxiv.org/abs/2212.11803 (uses squares and Euclidean distances as weights)
  • S. Fan, L. Liu, and Y. Luo. An alternative practice of tropical convolution to traditional convolutional neural networks. In 2021 The 5th International Conference on Compute and Data Analysis, pages 162–168, 2021, https://arxiv.org/abs/2103.02096 (Tropical arithmetic)
  • Y. Luo and S. Fan. Min-max-plus neural networks. arXiv preprint arXiv:2102.06358, 2021, https://arxiv.org/abs/2102.06358 (Tropical arithmetic)
  • Robert Tibshirani, Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 58, No. 1 (1996), pp. 267-288, https://www.jstor.org/stable/2346178 (Low-level mathematical paper from 1996 about the additive-squares method.)
  • David Spuler, March 2024, Chapter 51. Zero-Multiplication Models, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9

Bitwise Operators for Inference

Instead of addition, could any of the bitwise operations be used to replace multiplication? Yes, and there are various possibilities. Some examples are below; see also bitwise operators for inference efficiency.

Bitshift operators: The bitwise shift operators are good candidates for replacing multiplication (or division) by a power-of-2 integer, as discussed under power-of-two quantization (logarithmic quantization). This is a well-known optimization and has considerable research.
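
For example, if a weight has been logarithmically quantized to a signed power of two, the "multiplication" reduces to a single shift (a sketch, assuming non-negative integer activations):

    // Sketch: power-of-two quantized weight w = (+/-)2^e, multiplied via one bitshift.
    int pow2_weight_multiply(int x, int e, int negative)
    {
        int product = x << e;                  // Multiply by 2^e
        return negative ? -product : product;  // Sign handled by negation
    }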

Bitwise-or is a possible candidate that doesn't seem to have been considered in the research. When applied to positive integers, bitwise-or has some properties similar to addition: its result is less than or equal to the addition result, but greater than or equal to either of the two operands. This assumes bitwise-or on unsigned weights, such as via integer quantization to positive weights, because bitwise-or on negative signed numbers has various oddities. As such, bitwise-or with quantization could be considered for some of the above algorithms that use addition to replace multiplication. The accumulated sum based on bitwise-or would increase slightly more slowly than with pure addition.

Bitwise-and is another candidate, similar to bitwise-or, in that its result is also less than or equal to the addition result; however, the result will also be less than or equal to either operand. This seems less desirable than bitwise-or, but it's all a bit unclear. Experiments are required.

Bitwise-xor seems too odd for realistic usage. It has rather strange mathematical properties. But, who knows.

The use of the bitwise operators (or/and/xor) with quantization for non-multiplication inference is an area that needs some exploration. No papers were found yet.

More Zero Multiplication Research

More general papers on zero-multiplication models:

Zero Skipping (Avoiding Multiplication by Zero)

See Zero skipping research.

Hadamard Product Matrix Multiplication Models

These models are not technically "zero multiplication" LLMs, but they offer a large reduction in multiplication arithmetic. The idea involves simple element-wise multiplication of the elements of two matrices, which is called the "Hadamard product", rather than a dot product computation for each element. Basic matrix multiplication is O(n^3) whereas Hadamard computations are O(n^2), so it's potentially an n-fold decrease for n x n matrices, and it's also a simpler algorithm that's more amenable to followup kernel optimizations like vectorization and kernel fusion.
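
A sketch of the element-wise computation, which uses one multiplication per output element (n^2 in total for n x n matrices), versus n multiplications per element (n^3 in total) for a standard matrix multiplication:

    // Sketch: Hadamard (element-wise) product of two n x n matrices: C[i] = A[i] * B[i].
    void hadamard_product(const float* A, const float* B, float* C, int n)
    {
        for (int i = 0; i < n * n; i++) {
            C[i] = A[i] * B[i];   // One multiply per element: O(n^2) total
        }
    }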

Research papers on Hadamard products in AI engines include:

More AI Research

Read more about: