Aussie AI

Integer Arithmetic

  • Last Updated 28 November, 2024
  • by David Spuler, Ph.D.

Replacing floating-point calculations with integer arithmetic is a well-known optimization. In AI, everyone thinks of quantization, which is the most common use of integer arithmetic. However, it's not the only place where integer arithmetic optimizations can be used.

Quantization: A list of AI quantization techniques that involve integer arithmetic includes:

A full integer-only implementation of quantization uses integer arithmetic not just in the MatMuls, but in all of the other Transformer components:

Non-Quantization Integers: A list of AI non-quantization optimization techniques that involve integer arithmetic includes:
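
To make the MatMul side of this concrete, the sketch below shows a symmetric INT8 matrix-vector kernel with a 32-bit integer accumulator and a requantization step back to INT8. It is a minimal illustration, not taken from any of the papers below; the scale parameters and clamping range are assumptions, and a fully integer-only design would replace the single floating-point rescale with a fixed-point multiply and shift.

    // Minimal sketch: symmetric INT8 matrix-vector multiply with a 32-bit
    // accumulator and requantization back to INT8 (illustrative only).
    #include <cstdint>
    #include <cmath>

    void int8_matvec(const int8_t* W,   // n x m weight matrix, row-major, INT8
                     const int8_t* x,   // m-element activation vector, INT8
                     int8_t* y,         // n-element output vector, INT8
                     int n, int m,
                     float w_scale,     // weight quantization scale (assumed)
                     float x_scale,     // activation quantization scale (assumed)
                     float y_scale)     // output quantization scale (assumed)
    {
        const float rescale = (w_scale * x_scale) / y_scale;
        for (int i = 0; i < n; ++i) {
            int32_t acc = 0;  // wide accumulator avoids INT8 overflow
            for (int j = 0; j < m; ++j) {
                acc += (int32_t)W[i * m + j] * (int32_t)x[j];  // integer MACs
            }
            // Requantize the exact integer sum back to the INT8 output range.
            int32_t q = (int32_t)std::lround(acc * rescale);
            if (q > 127) q = 127;
            if (q < -128) q = -128;
            y[i] = (int8_t)q;
        }
    }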

Integer Arithmetic

Research papers on integer arithmetic in AI models:

  • Radha Gulhane, 2024, Accelerated and Memory-Efficient Distributed Deep Learning: Leveraging Quantization, Parallelism Techniques, and Mix-Match Runtime Communication, Masters Thesis, Computer Science and Engineering, The Ohio State University, https://etd.ohiolink.edu/acprod/odb_etd/ws/send_file/send?accession=osu1713381834648517&disposition=inline
  • Z Zou, C Zhang, S Chen, H Kou, B Liu, March 2024, Integer Arithmetic-Based and Activation-Aware GELU Optimization for Vision Transformer, 2024 Conference of Science and Technology for Integrated Circuits (CSTIC), 17-18 March 2024, https://ieeexplore.ieee.org/abstract/document/10531966/
  • Yi Guo, Fanliu Kong, Xiaoyang Li, Hui Li, Wei Chen, Xiaogang Tian, Jinping Cai, Yang Zhang, Shouda Liu, 19 Apr 2024, decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points, https://arxiv.org/abs/2404.12759 Code: https://github.com/bytedance/decoupleQ (Decouple parameters into integer and floating-point parts for more accurate quantization at low bitwidths.)
  • Xing Hu, Yuan Chen, Dawei Yang, Sifan Zhou, Zhihang Yuan, Jiangyong Yu, Chen Xu, 28 May 2024, I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models, https://arxiv.org/abs/2405.17849 Code: https://anonymous.4open.science/r/I-LLM-F242/
  • Ruiqi Sun, Yinchen Ni, Xin He, Jie Zhao, An Zou, 1 Feb 2024, ONE-SA: Enabling Nonlinear Operations in Systolic Arrays for Efficient and Flexible Neural Network Inference, https://arxiv.org/abs/2402.00395
  • Alberto Marchisio, Davide Dura, Maurizio Capra, Maurizio Martina, Guido Masera, Muhammad Shafique, Apr 2023, SwiftTron: An Efficient Hardware Accelerator for Quantized Transformers, https://arxiv.org/abs/2304.03986 Code: https://github.com/albertomarchisio/SwiftTron
  • Ghadeer Jaradat, Mohammed Tolba, Ghada Alsuhli, Hani Saleh, Mahmoud Al-Qutayri, Thanos Stouraitis, Baker Mohammad, 7 Jul 2024, Hybrid Dynamic Pruning: A Pathway to Efficient Transformer Inference, https://arxiv.org/abs/2407.12893
  • Mohammadreza Tayaranian, Seyyed Hasan Mozafari, James J. Clark, Brett Meyer, Warren Gross, 2 Feb 2024, Faster Inference of Integer SWIN Transformer by Removing the GELU Activation, https://arxiv.org/abs/2402.01169 (Replace GELU with RELU.)
  • Fuwen Tan, Royson Lee, Łukasz Dudziak, Shell Xu Hu, Sourav Bhattacharya, Timothy Hospedales, Georgios Tzimiropoulos, Brais Martinez, 25 Aug 2024, MobileQuant: Mobile-friendly Quantization for On-device Language Models, https://arxiv.org/abs/2408.13933 https://github.com/saic-fi/MobileQuant
  • Penghao Xiao, Chunjie Zhang, Qian Guo, Xiayang Xiao, Haipeng Wang, 2024, Neural Networks Integer Computation: Quantizing Convolutional Neural Networks of Inference and Training for Object Detection in Embedded Systems, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, DOI 10.1109/JSTARS.2024.3452321, PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10660473
  • Donghyeon Yi, Seoyoung Lee, Jongho Kim, Junyoung Kim, Sohmyung Ha, Ik Joon Chang, Minkyu Je, 22 Nov 2024, FLARE: FP-Less PTQ and Low-ENOB ADC Based AMS-PiM for Error-Resilient, Fast, and Efficient Transformer Acceleration, https://arxiv.org/abs/2411.14733

End-to-End Integer Arithmetic

Integers everywhere. That's the goal of end-to-end integer arithmetic for inference in a Transformer. Storing and computing the weights and activations as integers is the realm of integer-only-arithmetic quantization, but other components also need to be computed with integers to achieve end-to-end integer-only inference, such as the activation functions, normalization, and Softmax.
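
As a rough illustration of how a component like Softmax can be kept in the integer domain at inference time, here is a minimal fixed-point sketch using a precomputed exponential lookup table. The table size, the assumed logit scale of 1/16, and the Q8.8 output format are illustrative assumptions, not from any particular paper; published integer-only designs such as I-BERT instead use integer polynomial approximations of the exponential.

    // Minimal sketch: integer-only Softmax using a small lookup table for the
    // exponential and fixed-point outputs (illustrative assumptions only).
    #include <cstdint>
    #include <cmath>

    static const int kExpTableSize = 256;        // covers logit differences 0..255
    static uint16_t g_exp_table[kExpTableSize];  // exp(-d/16) in Q0.16 fixed point

    // Built once at startup; this is the only place floating point is used.
    void init_exp_table() {
        for (int d = 0; d < kExpTableSize; ++d) {
            g_exp_table[d] = (uint16_t)std::lround(std::exp(-d / 16.0) * 65535.0);
        }
    }

    // Inference path is integer-only: INT8 logits in, Q8.8 probabilities out
    // (the value 256 represents probability 1.0).
    void int_softmax(const int8_t* logits, uint16_t* probs_q8_8, int n) {
        int8_t max_v = logits[0];
        for (int i = 1; i < n; ++i)
            if (logits[i] > max_v) max_v = logits[i];

        uint32_t sum = 0;
        for (int i = 0; i < n; ++i) {
            int d = (int)max_v - (int)logits[i];      // always >= 0
            if (d >= kExpTableSize) d = kExpTableSize - 1;
            sum += g_exp_table[d];                    // integer accumulation
        }
        for (int i = 0; i < n; ++i) {
            int d = (int)max_v - (int)logits[i];
            if (d >= kExpTableSize) d = kExpTableSize - 1;
            // Integer division normalizes each term into a Q8.8 probability.
            probs_q8_8[i] = (uint16_t)(((uint32_t)g_exp_table[d] << 8) / sum);
        }
    }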

Research papers on end-to-end integer arithmetic:

  • J Zhong, Z Liu, X Chen, Apr 2023, Transformer-based models and hardware acceleration analysis in autonomous driving: A survey, https://arxiv.org/abs/2304.10891 (Mainly focused on 8-bit integer arithmetic for machine vision Transformers.)
  • Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael Mahoney, Kurt Keutzer, HAWQ-V3: Dyadic Neural Network Quantization, Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11875-11886, 2021, https://arxiv.org/abs/2011.10680 (Integers only in quantized weights and activations with INT4 or INT8, but also uses integers for batch normalization and residual connection components, too.)
  • Y. Lin, Y. Li, T. Liu et al., “Towards fully 8-bit integer inference for the transformer model,” in Proc. of IJCAI, 2020, pp. 3759–3765. https://arxiv.org/abs/2009.08034 (Integers for weights, but also for Softmax, layer normalization, and other components, by replacing or approximating non-linear functions such as exponential and square-root.)
  • Peng Peng, Mingyu You, Weisheng Xu, and Jiaxin Li. Fully integer-based quantization for mobile convolutional neural network inference. Neurocomputing, 432:194–205, 2021, https://www.sciencedirect.com/science/article/abs/pii/S0925231220319354 (Quantizes with INT4, but not only weights, but also has integer batch normalization.)
  • Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer, I-BERT: Integer-only BERT Quantization, Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5506-5518, 2021, https://arxiv.org/abs/2101.01321, https://proceedings.mlr.press/v139/kim21d.html (I-BERT uses quantization, but also has integer arithmetic for GELU, Softmax, and Layer Normalization.)
  • Dong, Z., Yao, Z., Gholami, A., Mahoney, M. W., Keutzer, K., HAWQ: Hessian AWare Quantization of neural networks with mixed-precision. In The IEEE International Conference on Computer Vision (ICCV), October 2019. https://ieeexplore.ieee.org/document/9009512, https://arxiv.org/abs/1905.03696 (Early paper that isn't quite end-to-end with integers.)
  • Ruokai Yin, Yuhang Li, Abhishek Moitra, Priyadarshini Panda, Dec 2022, Training Integer-Only Deep Recurrent Neural Networks https://arxiv.org/abs/2212.11791 (Integer-only version of RNNs called iRNN, with integer-only layer normalization, integer-only attention, and piecewise linear approximation for integer-only activation functions such as tanh and sigmoid.)
  • R Yin, Y Li, A Moitra, P Panda, Sep 2023, MINT: Multiplier-less Integer Quantization for Spiking Neural Networks, https://arxiv.org/abs/2305.09850
  • Shuo Huai, Di Liu, Xiangzhong Luo, Hui Chen, Weichen Liu, Ravi Subramaniam, 2023, Crossbar-Aligned & Integer-Only Neural Network Compression for Efficient In-Memory Acceleration, ASPDAC '23: Proceedings of the 28th Asia and South Pacific Design Automation Conference, January 2023, Pages 234–239, https://doi.org/10.1145/3566097.3567856, https://dl.acm.org/doi/abs/10.1145/3566097.3567856
  • Z Zhang, B He, Z Zhang, 2023, Practical Edge Kernels for Integer-Only Vision Transformers Under Post-training Quantization, Proceedings of Machine Learning and Systems 5 pre-proceedings (MLSys 2023) mlsys2023, https://proceedings.mlsys.org/paper_files/paper/2023/hash/023560744aae353c03f7ae787f2998dd-Abstract-mlsys2023.html, PDF: https://proceedings.mlsys.org/paper_files/paper/2023/file/023560744aae353c03f7ae787f2998dd-Paper-mlsys2023.pdf (Integer-only-arithmetic quantization with integer-only versions of Softmax, LayerNorm, and GELU.)
  • Eyyüb Sari, Vanessa Courville, Vahid Partovi Nia, Feb 2022, iRNN: Integer-only Recurrent Neural Network, https://arxiv.org/abs/2109.09828
  • J Bartels, A Hagihara, L Minati, An Integer-Only Resource-Minimized RNN on FPGA for Low-Frequency Sensors in Edge-AI, 2023, IEEE Sensors Journal, Volume 23, Issue 15, 01 August 2023, https://ieeexplore.ieee.org/abstract/document/10161725/, PDF: https://ieeexplore.ieee.org/iel7/7361/4427201/10161725.pdf
  • Lin, Y., Zhang, T., Sun, P., Li, Z., and Zhou, S. FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, pp. 1173–1179, 2022. https://arxiv.org/abs/2111.13824
  • A. Rock, A. Untether, O. Khalil, O. Shai, and P. Grouchy, 2022, INT8 Transformers for Inference Acceleration, 36th Conference on Neural Information Processing Systems (NeurIPS), PDF: https://neurips2022-enlsp.github.io/papers/paper_52.pdf
  • Victor J.B. Jung, Alessio Burrello, Moritz Scherer, Francesco Conti, Luca Benini, 3 Apr 2024, Optimizing the Deployment of Tiny Transformers on Low-Power MCUs, https://arxiv.org/abs/2404.02945 (Uses an approach called "Fused Weight Self-Attention" that fuses some of the QKV matrices and also tiling in multi-head attention, along with 8-bit integer quantization and integerized Softmax.)
  • David Spuler, March 2024, Chapter 53. Arithmetic Optimization Research, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
  • Yifei Liu, Jicheng Wen, Yang Wang, Shengyu Ye, Li Lyna Zhang, Ting Cao, Cheng Li, Mao Yang, 25 Sep 2024, VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models, https://arxiv.org/abs/2409.17066 https://arxiv.org/pdf/2409.17066

Integer Dot Product

The dot product of two vectors containing integers is also an integer.
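
A minimal example of this property: with INT8 elements, each product fits in 16 bits, and accumulating into a 32-bit integer keeps the result exact for typical vector lengths, which is why INT8 quantization kernels commonly accumulate into INT32. The vector values below are illustrative only.

    // Minimal example: the dot product of two integer vectors is itself an
    // exact integer, provided the accumulator is wide enough not to overflow.
    #include <cstdint>
    #include <cstdio>

    int32_t int8_dot_product(const int8_t* a, const int8_t* b, int n) {
        int32_t sum = 0;  // each INT8 x INT8 product fits in 16 bits
        for (int i = 0; i < n; ++i) {
            sum += (int32_t)a[i] * (int32_t)b[i];
        }
        return sum;  // exact integer result, no rounding error
    }

    int main() {
        int8_t a[4] = { 1, -2, 3, 4 };
        int8_t b[4] = { 5, 6, -7, 8 };
        printf("%d\n", (int)int8_dot_product(a, b, 4));  // prints 4
        return 0;
    }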

Research papers on integer-based vector dot products:

More AI Research

Read more about: