Aussie AI

Bitshift Quantization (Power-of-Two)

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

The idea of bitshift quantization is to use power-of-two integer weights so that bitshift operations can replace integer multiplications. There is a significant trade-off in model accuracy, since the number of distinct weight values is greatly reduced. This is a well-established and still active area of research, with the earliest papers dating back to 1992 and 1993. However, software algorithms using bitshifts seem unlikely to outperform hardware-accelerated integer multiplication, and hardware support for shift-based arithmetic is limited. Extending hardware accelerators to support bitshifting, or highest-power-of-two approximate multiplication, in hardware would presumably require fewer operations and less computing power (and generate less heat), and seems an open area for further research. Note that the highest set bit of an integer can be computed efficiently by repeatedly clearing the lowest set bit with Brian Kernighan's bit trick (1988).
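
To make the idea concrete, below is a minimal C++ sketch of power-of-two quantization, assuming each weight is reduced to a sign plus an integer exponent, and "multiplication" by that weight becomes a left or right shift. The type and function names are illustrative only (not from any particular library), and rounding the log2 of the weight to the nearest integer exponent is just one simple choice among those used in the research literature. The last helper shows the Kernighan-style bit trick for isolating the highest set bit of an integer.

    #include <cstdint>
    #include <cmath>
    #include <cstdio>

    // A power-of-two quantized weight: only a sign and a shift count are stored.
    struct Pow2Weight {
        int8_t sign;      // +1 or -1
        int8_t exponent;  // weight magnitude is approximately 2^exponent
    };

    // Quantize a floating-point weight to the nearest power of two.
    Pow2Weight quantize_pow2(float w) {
        Pow2Weight q;
        q.sign = (w < 0.0f) ? -1 : +1;
        q.exponent = (int8_t)std::lround(std::log2(std::fabs(w) + 1e-30f));
        return q;
    }

    // "Multiply" an integer activation by a power-of-two weight using shifts only.
    // (Assumes non-negative activations, or arithmetic-shift behavior for negatives.)
    int32_t pow2_multiply(int32_t activation, Pow2Weight q) {
        int32_t r = (q.exponent >= 0) ? (activation << q.exponent)
                                      : (activation >> -q.exponent);
        return (q.sign < 0) ? -r : r;
    }

    // Kernighan-style trick: repeatedly clear the lowest set bit (n & (n - 1));
    // the last bit left standing is the highest set bit, i.e. the largest
    // power of two not exceeding n.
    uint32_t highest_power_of_two(uint32_t n) {
        while (n & (n - 1)) n &= (n - 1);
        return n;   // returns 0 if n was 0
    }

    int main() {
        Pow2Weight q = quantize_pow2(0.23f);  // exponent rounds to -2, i.e. 0.25
        printf("100 * 0.23 ~= %d\n", pow2_multiply(100, q));                     // 25
        printf("highest power of two in 100: %u\n", highest_power_of_two(100u)); // 64
        return 0;
    }

In a full inference engine the exponents would typically be clamped to a small range (e.g. a 4-bit shift count) and the model trained or fine-tuned with quantization-aware methods, as in several of the papers listed below.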

Research papers on bitshift power-of-two quantization:

  1. Maarten Vandersteegen, Kristof Van Beeck and Toon Goedemé, 2021, Integer-Only CNNs with 4 Bit Weights and Bit-Shift Quantization Scales at Full-Precision Accuracy, Electronics, October 2021, 10(22), 2823, https://www.mdpi.com/2079-9292/10/22/2823
  2. Yiren Zhao, Xitong Gao, Daniel Bates, Robert Mullins, Cheng-Zhong Xu, 2019, Focused Quantization for Sparse CNNs, Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2019, https://proceedings.neurips.cc/paper/2019/hash/58aaee7ae94b52697ad3b9275d46ec7f-Abstract.html
  3. Dominika Przewlocka-Rus, Syed Shakib Sarwar, H. Ekin Sumbul, Yuecheng Li, Barbara De Salvo, 2022, Power-of-Two Quantization for Low Bitwidth and Hardware Compliant Neural Networks, Feb 2022, https://arxiv.org/abs/2203.05025
  4. Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio, Quantized neural networks: Training neural networks with low precision weights and activations, The Journal of Machine Learning Research, 18(1):6869–6898, 2017, https://arxiv.org/abs/1609.07061.
  5. H. Tann, S. Hashemi, R. I. Bahar, and S. Reda, 2017, Hardware-software codesign of accurate, multiplier-free deep neural networks, in Proc. 54th Annu. Design Autom. Conf. (DAC), 2017, pp. 1–6, https://arxiv.org/abs/1705.04288
  6. Yuhang Li, Xin Dong, and Wei Wang, 2020, Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks, International Conference on Learning Representations, February 2020, https://arxiv.org/abs/1909.13144
  7. Z. Lin, M. Courbariaux, R. Memisevic, and Y. Bengio. 2015, Neural networks with few multiplications, CoRR, abs/1510.03009, 2015. https://arxiv.org/abs/1510.03009 (Power-of-Two Quantization)
  8. Soheil Hashemi; Nicholas Anthony; Hokchhay Tann; R. Iris Bahar; Sherief Reda, Understanding the impact of precision quantization on the accuracy and energy of neural networks, Design, Automation & Test in Europe Conference & Exhibition, March 2017, https://ieeexplore.ieee.org/abstract/document/7927224
  9. Marchesi, Michele, Orlandi, Gianni, Piazza, Francesco, and Uncini, Aurelio, 1993, Fast neural networks without multipliers, IEEE Transactions on Neural Networks, 4(1):53–62, 1993, https://ieeexplore.ieee.org/document/182695
  10. A. White and M. I. Elmasry, 1992, The digi-neocognitron: a digital neocognitron neural network model for VLSI, IEEE Trans. Neural Networks, vol. 3, pp. 73-85, Jan. 1992, https://ieeexplore.ieee.org/document/105419
  11. Kwan, Hon Keung and Tang, CZ, 1993, Multiplierless multilayer feedforward neural network design suitable for continuous input-output mapping, Electronics Letters, 29(14):1259–1260, 1993, https://digital-library.theiet.org/content/journals/10.1049/el_19930841
  12. Sean Eron Anderson, 2023, Bit Twiddling Hacks (Kernighan Algorithm), https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
  13. Peter Wegner, 1960, A technique for counting ones in a binary computer, Communications of the ACM, Volume 3, Issue 5, May 1960, https://doi.org/10.1145/367236.367286
  14. Daisuke Miyashita, Edward H. Lee, and Boris Murmann, 2016, Convolutional Neural Networks using Logarithmic Data Representation, CoRR abs/1603.01025 (2016), https://arxiv.org/abs/1603.01025
  15. Edward H. Lee, Daisuke Miyashita, Elaina Chai, Boris Murmann, and S. Simon Wong, 2017, LogNet: Energy-efficient neural networks using logarithmic computation, In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017. 5900–5904. https://doi.org/10.1109/ICASSP.2017.7953288
  16. Elhoushi, M.; Chen, Z.; Shafiq, F.; Tian, Y. H.; and Li, J. Y., 2019, Deepshift: Towards multiplication-less neural networks, arXiv preprint arXiv:1905.13298, https://arxiv.org/abs/1905.13298
  17. Zhou, A.; Yao, A.; Guo, Y.; Xu, L.; and Chen, Y., 2017, Incremental network quantization: Towards lossless CNNs with low-precision weight, arXiv preprint arXiv:1702.03044, https://arxiv.org/abs/1702.03044
  18. J Cai, M Takemoto, H Nakajo, 2018, A deep look into logarithmic quantization of model parameters in neural networks, https://dl.acm.org/doi/abs/10.1145/3291280.3291800
  19. HyunJin Kim; Min Soo Kim; Alberto A. Del Barrio; Nader Bagherzadeh, 2019, A cost-efficient iterative truncated logarithmic multiplication for convolutional neural networks, IEEE 26th Symposium on Computer Arithmetic (ARITH), https://ieeexplore.ieee.org/abstract/document/8877474
  20. X Li, B Liu, RH Yang, V Courville, C Xing, VP Nia, 2023, DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization, Proceedings of the IEEE/CVF, https://openaccess.thecvf.com/content/ICCV2023/papers/Li_DenseShift_Towards_Accurate_and_Efficient_Low-Bit_Power-of-Two_Quantization_ICCV_2023_paper.pdf (Extends log quantization to floating-point numbers efficiently by using a bitwise trick of integer addition on the sign and exponent bits of 32-bit IEEE 754 floats.)
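
The bitwise trick mentioned in the last reference above (DenseShift) can be illustrated with a short sketch of the underlying idea: multiplying a 32-bit IEEE 754 float by a power of two via integer addition on its exponent bits. This is only a hedged illustration of exponent-field arithmetic, not the DenseShift algorithm itself, and it ignores zero, denormals, and exponent overflow.

    #include <cstdint>
    #include <cstring>
    #include <cstdio>

    // Multiply a float by 2^k by adding k to the exponent field of its
    // IEEE 754 bit pattern (bits 23..30). Valid only for normal values
    // that do not overflow or underflow the exponent range.
    float mul_pow2_bits(float x, int k) {
        uint32_t bits;
        std::memcpy(&bits, &x, sizeof(bits));  // safe type-punning via memcpy
        bits += (uint32_t)k << 23;             // integer add on the exponent field
        float result;
        std::memcpy(&result, &bits, sizeof(result));
        return result;
    }

    int main() {
        printf("%g\n", mul_pow2_bits(3.0f, 4));    // 48 = 3 * 2^4
        printf("%g\n", mul_pow2_bits(10.0f, -1));  // 5 = 10 * 2^-1
        return 0;
    }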

See also more papers on power-of-two quantization at https://www.aussieai.com/research/quantization#logarithmic.
