Sum of Two Bitshifts Quantization

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

The downside of logarithmic quantization is that there are relatively few unique weights, which limits precision even when the number of bits is maximized with a large scaling factor. An alternative implementation uses two bitshift operations and an addition (i.e., “shift-and-add” operations), so that the two highest set bits of the quantized integer weight are used. This improves model precision at the cost of more computation, and assumes that two integer shifts plus an integer addition together cost less than a single integer multiplication. An early mention of this “sums of powers of two” method is Marchesi et al. (1993).
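
For illustration, here is a minimal C++ sketch of the shift-and-add idea, assuming each weight has already been quantized offline to its two highest set bits plus a sign (the ShiftPairWeight structure and shift_add_mul function are hypothetical names for this sketch, not from any particular library):

    #include <cstdint>
    #include <cstdio>

    // Hypothetical storage format: each quantized weight keeps the exponents
    // of its two highest set bits, plus a sign flag.
    struct ShiftPairWeight {
        int8_t shift1;  // exponent of the highest power of two
        int8_t shift2;  // exponent of the second power of two, or -1 if unused
        bool negative;  // true if the weight is negative
    };

    // Replace w * x with (x << shift1) + (x << shift2): two shifts and an add.
    inline int32_t shift_add_mul(const ShiftPairWeight& w, int32_t x) {
        int32_t result = x << w.shift1;
        if (w.shift2 >= 0)
            result += x << w.shift2;
        return w.negative ? -result : result;
    }

    int main() {
        // Weight 20 = 16 + 4 = 2^4 + 2^2, so the stored shifts are 4 and 2.
        ShiftPairWeight w = { 4, 2, false };
        printf("%d\n", shift_add_mul(w, 3));  // prints 60 (same as 20 * 3)
        return 0;
    }

For example, a weight of 22 (binary 10110) would be quantized to 20 (2^4 + 2^2), keeping only its two highest set bits; this is a closer approximation than rounding to a single power of two, which would give 16.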

Research papers on sum-of-two-bitshifts quantization:

  1. Sung-En Chang, Yanyu Li, Mengshu Sun, Runbin Shi, Hayden K.-H. So, Xuehai Qian, Yanzhi Wang, and Xue Lin, 2021, Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework, In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), IEEE, Seoul, Korea (South), 208–220, https://doi.org/10.1109/HPCA51647.2021.00027
  2. H. You, X. Chen, Y. Zhang, C. Li, S. Li, Z. Liu, Z. Wang, and Y. Lin, 2020, ShiftAddNet: A Hardware-Inspired Deep Network, In NeurIPS, https://arxiv.org/abs/2010.12785
  3. Michele Marchesi, Gianni Orlandi, Francesco Piazza, and Aurelio Uncini, 1993, Fast neural networks without multipliers, IEEE Transactions on Neural Networks, 4(1):53–62, https://ieeexplore.ieee.org/document/182695
  4. Robert Eisele, 2010, Optimizing integer multiplication, blog post, April 29th, 2010, https://www.xarg.org/2010/04/optimizing-integer-multiplication/
  5. Yuhang Li, Xin Dong, and Wei Wang, 2020, Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks, In International Conference on Learning Representations (ICLR), https://arxiv.org/abs/1909.13144

See also more sum-of-two-bitshifts quantization papers at https://www.aussieai.com/research/quantization#logarithmic.

 
