Aussie AI

Bitshift Quantization (Power-of-Two)

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

The idea of bitshift quantization is to use power-of-two integer weights so that bitshift operations can replace integer multiplications. There is a significant trade-off in model accuracy, since the number of distinct weight values is greatly reduced. This is a well-established and still active area of research, with the earliest papers dating back to 1992 and 1993. However, software algorithms using bitshifts seem unlikely to outperform hardware-accelerated integer multiplication, and hardware support for shift-based arithmetic is limited. Extending hardware accelerators to support bitshifting, or highest-power-of-two approximate multiplication, in hardware would presumably require fewer operations and less computing power (and generate less heat), and seems an open area for further research. Note that the highest set bit of an integer can be computed efficiently by repeatedly clearing the lowest set bit with Brian Kernighan's bit trick (1988).
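
To make the idea concrete, below is a minimal C++ sketch of power-of-two quantization, assuming each weight is reduced to a sign plus an integer exponent, and "multiplication" by that weight becomes a left or right shift. The type and function names are illustrative only (not from any particular library), and rounding the log2 of the weight to the nearest integer exponent is just one simple choice among those used in the research literature. The last helper shows the Kernighan-style bit trick for isolating the highest set bit of an integer.

    #include <cstdint>
    #include <cmath>
    #include <cstdio>

    // A power-of-two quantized weight: only a sign and a shift count are stored.
    struct Pow2Weight {
        int8_t sign;      // +1 or -1
        int8_t exponent;  // weight magnitude is approximately 2^exponent
    };

    // Quantize a floating-point weight to the nearest power of two.
    Pow2Weight quantize_pow2(float w) {
        Pow2Weight q;
        q.sign = (w < 0.0f) ? -1 : +1;
        q.exponent = (int8_t)std::lround(std::log2(std::fabs(w) + 1e-30f));
        return q;
    }

    // "Multiply" an integer activation by a power-of-two weight using shifts only.
    // (Assumes non-negative activations, or arithmetic-shift behavior for negatives.)
    int32_t pow2_multiply(int32_t activation, Pow2Weight q) {
        int32_t r = (q.exponent >= 0) ? (activation << q.exponent)
                                      : (activation >> -q.exponent);
        return (q.sign < 0) ? -r : r;
    }

    // Kernighan-style trick: repeatedly clear the lowest set bit (n & (n - 1));
    // the last bit left standing is the highest set bit, i.e. the largest
    // power of two not exceeding n.
    uint32_t highest_power_of_two(uint32_t n) {
        while (n & (n - 1)) n &= (n - 1);
        return n;   // returns 0 if n was 0
    }

    int main() {
        Pow2Weight q = quantize_pow2(0.23f);  // exponent rounds to -2, i.e. 0.25
        printf("100 * 0.23 ~= %d\n", pow2_multiply(100, q));                     // 25
        printf("highest power of two in 100: %u\n", highest_power_of_two(100u)); // 64
        return 0;
    }

In a full inference engine the exponents would typically be clamped to a small range (e.g. a 4-bit shift count) and the model trained or fine-tuned with quantization-aware methods, as in several of the papers listed below.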

Research papers on bitshift power-of-two quantization:

  1. Maarten Vandersteegen, Kristof Van Beeck and Toon Goedemé, 2021, Integer-Only CNNs with 4 Bit Weights and Bit-Shift Quantization Scales at Full-Precision Accuracy, Electronics, October 2021, 10(22), 2823, https://www.mdpi.com/2079-9292/10/22/2823
  2. Yiren Zhao, Xitong Gao, Daniel Bates, Robert Mullins, Cheng-Zhong Xu, 2019, Focused Quantization for Sparse CNNs, Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2019, https://proceedings.neurips.cc/paper/2019/hash/58aaee7ae94b52697ad3b9275d46ec7f-Abstract.html
  3. Dominika Przewlocka-Rus, Syed Shakib Sarwar, H. Ekin Sumbul, Yuecheng Li, Barbara De Salvo, 2022, Power-of-Two Quantization for Low Bitwidth and Hardware Compliant Neural Networks, Feb 2022, https://arxiv.org/abs/2203.05025
  4. Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio, Quantized neural networks: Training neural networks with low precision weights and activations, The Journal of Machine Learning Research, 18(1):6869–6898, 2017, https://arxiv.org/abs/1609.07061.
  5. H. Tann, S. Hashemi, R. I. Bahar, and S. Reda, 2017, Hardware-software codesign of accurate, multiplier-free deep neural networks, in Proc. 54th Annu. Design Autom. Conf. (DAC), 2017, pp. 1–6, https://arxiv.org/abs/1705.04288
  6. Yuhang Li, Xin Dong, and Wei Wang, 2020, Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks, International Conference on Learning Representations, February 2020, https://arxiv.org/abs/1909.13144
  7. Z. Lin, M. Courbariaux, R. Memisevic, and Y. Bengio. 2015, Neural networks with few multiplications, CoRR, abs/1510.03009, 2015. https://arxiv.org/abs/1510.03009 (Power-of-Two Quantization)
  8. Soheil Hashemi; Nicholas Anthony; Hokchhay Tann; R. Iris Bahar; Sherief Reda, Understanding the impact of precision quantization on the accuracy and energy of neural networks, Design, Automation & Test in Europe Conference & Exhibition, March 2017, https://ieeexplore.ieee.org/abstract/document/7927224
  9. Marchesi, Michele, Orlandi, Gianni, Piazza, Francesco, and Uncini, Aurelio, 1993, Fast neural networks without multipliers, IEEE Transactions on Neural Networks, 4(1):53–62, 1993, https://ieeexplore.ieee.org/document/182695
  10. A. White and M. I. Elmasry, 1992, The digi-neocognitron: a digital neocognitron neural network model for VLSI, IEEE Trans. Neural Networks, vol. 3, pp. 73-85, Jan. 1992, https://ieeexplore.ieee.org/document/105419
  11. Kwan, Hon Keung and Tang, CZ, 1993, Multiplierless multilayer feedforward neural network design suitable for continuous input-output mapping, Electronics Letters, 29(14):1259–1260, 1993, https://digital-library.theiet.org/content/journals/10.1049/el_19930841
  12. Sean Eron Anderson, 2023, Bit Twiddling Hacks (Kernighan Algorithm), https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
  13. Peter Wegner, 1960, A technique for counting ones in a binary computer, Communications of the ACM, Volume 3, Issue 5, May 1960, https://doi.org/10.1145/367236.367286
  14. Daisuke Miyashita, Edward H. Lee, and Boris Murmann, 2016, Convolutional Neural Networks using Logarithmic Data Representation, CoRR abs/1603.01025 (2016), https://arxiv.org/abs/1603.01025
  15. Edward H. Lee, Daisuke Miyashita, Elaina Chai, Boris Murmann, and S. Simon Wong, 2017, LogNet: Energy-efficient neural networks using logarithmic computation, In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017. 5900–5904. https://doi.org/10.1109/ICASSP.2017.7953288
  16. Elhoushi, M.; Chen, Z.; Shafiq, F.; Tian, Y. H.; and Li, J. Y., 2019, Deepshift: Towards multiplication-less neural networks, arXiv preprint arXiv:1905.13298, https://arxiv.org/abs/1905.13298
  17. Zhou, A.; Yao, A.; Guo, Y.; Xu, L.; and Chen, Y., 2017, Incremental network quantization: Towards lossless CNNs with low-precision weight, arXiv preprint arXiv:1702.03044, https://arxiv.org/abs/1702.03044
  18. J Cai, M Takemoto, H Nakajo, 2018, A deep look into logarithmic quantization of model parameters in neural networks, https://dl.acm.org/doi/abs/10.1145/3291280.3291800
  19. HyunJin Kim; Min Soo Kim; Alberto A. Del Barrio; Nader Bagherzadeh, 2019, A cost-efficient iterative truncated logarithmic multiplication for convolutional neural networks, IEEE 26th Symposium on Computer Arithmetic (ARITH), https://ieeexplore.ieee.org/abstract/document/8877474
  20. X Li, B Liu, RH Yang, V Courville, C Xing, VP Nia, 2023, DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization, Proceedings of the IEEE/CVF, https://openaccess.thecvf.com/content/ICCV2023/papers/Li_DenseShift_Towards_Accurate_and_Efficient_Low-Bit_Power-of-Two_Quantization_ICCV_2023_paper.pdf (Extends log quantization to floating-point numbers efficiently by using a bitwise trick of integer addition on the sign and exponent bits of 32-bit IEEE 754 floats.)
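
The bitwise trick mentioned in the last reference above (DenseShift) can be illustrated with a short sketch of the underlying idea: multiplying a 32-bit IEEE 754 float by a power of two via integer addition on its exponent bits. This is only a hedged illustration of exponent-field arithmetic, not the DenseShift algorithm itself, and it ignores zero, denormals, and exponent overflow.

    #include <cstdint>
    #include <cstring>
    #include <cstdio>

    // Multiply a float by 2^k by adding k to the exponent field of its
    // IEEE 754 bit pattern (bits 23..30). Valid only for normal values
    // that do not overflow or underflow the exponent range.
    float mul_pow2_bits(float x, int k) {
        uint32_t bits;
        std::memcpy(&bits, &x, sizeof(bits));  // safe type-punning via memcpy
        bits += (uint32_t)k << 23;             // integer add on the exponent field
        float result;
        std::memcpy(&result, &bits, sizeof(result));
        return result;
    }

    int main() {
        printf("%g\n", mul_pow2_bits(3.0f, 4));    // 48 = 3 * 2^4
        printf("%g\n", mul_pow2_bits(10.0f, -1));  // 5 = 10 * 2^-1
        return 0;
    }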

See also more papers on power-of-two quantization at https://www.aussieai.com/research/quantization#logarithmic.
