2-Bit Quantization (INT2)
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
This section covers non-ternary 2-bit quantization, which uses four distinct weight values. In practice, 2-bit quantization is still regarded as having problems with model accuracy, whereas 4-bit integer quantization is considered a more reasonable speed-versus-accuracy tradeoff. On the other hand, this caution may be unwarranted: Liu et al. (2022) tested many models at 2, 3, and 4 bits (see Table 1 in their paper), and the extra accuracy of 4 bits over 2 bits was usually only a couple of percentage points, for double the space.
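As a concrete illustration, here is a minimal C++ sketch of one way such a scheme can work: each weight is mapped to the nearest of four uniform levels, and four 2-bit codes are packed into each byte. The level spacing, per-tensor scale choice, and function names below are illustrative assumptions for this sketch, not the method of any particular paper in the list that follows.

    // Minimal sketch of non-ternary 2-bit (INT2) weight quantization.
    // Assumptions: four uniform levels {-1.5s, -0.5s, +0.5s, +1.5s} for a
    // per-tensor scale s, with four 2-bit codes packed into each byte.
    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Map one float weight to a 2-bit code in 0..3.
    static uint8_t quantize_int2(float w, float scale) {
        int code = (int)std::lround(w / scale + 1.5f); // nearest level
        return (uint8_t)std::clamp(code, 0, 3);
    }

    // Recover the approximate float weight from a 2-bit code.
    static float dequantize_int2(uint8_t code, float scale) {
        return ((float)code - 1.5f) * scale;
    }

    // Pack n weights into ceil(n/4) bytes, four 2-bit codes per byte.
    static void pack_int2(const float* w, uint8_t* out, int n, float scale) {
        for (int i = 0; i < n; i += 4) {
            uint8_t b = 0;
            for (int j = 0; j < 4 && i + j < n; ++j)
                b |= quantize_int2(w[i + j], scale) << (2 * j);
            out[i / 4] = b;
        }
    }

    int main() {
        std::vector<float> w = { 0.9f, -0.4f, 0.1f, -1.2f, 0.6f };
        // Simple scale choice: the largest |weight| maps to the outer level.
        float maxabs = 0.0f;
        for (float x : w) maxabs = std::max(maxabs, std::fabs(x));
        float scale = maxabs / 1.5f; // assumes at least one nonzero weight
        std::vector<uint8_t> packed((w.size() + 3) / 4);
        pack_int2(w.data(), packed.data(), (int)w.size(), scale);
        for (size_t i = 0; i < w.size(); ++i) {
            uint8_t code = (packed[i / 4] >> (2 * (i % 4))) & 0x3;
            printf("w=%+.2f code=%u dequant=%+.2f\n",
                   w[i], code, dequantize_int2(code, scale));
        }
        return 0;
    }

With four weights packed per byte, a 2-bit model needs one quarter of the memory of INT8 and one sixteenth of FP32, which is the space saving driving the research below.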
Research papers on 2-bit quantization:
- Jungwook Choi, Pierce I-Jen Chuang, Zhuo Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan, July 2018, Bridging the Accuracy Gap for 2-bit Quantized Neural Networks (QNN), https://arxiv.org/abs/1807.06964
- Jungwook Choi, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan, Zhuo Wang, Pierce Chuang, 2019, Accurate and Efficient 2-bit Quantized Neural Networks, Proceedings of Machine Learning and Systems 1 (MLSys 2019), https://proceedings.mlsys.org/paper/2019/file/006f52e9102a8d3be2fe5614f42ba989-Paper.pdf
- S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, 2016, DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients, arXiv:1606.06160, https://arxiv.org/abs/1606.06160 (Binary weights, 2-bit activations.)
- Z. Cai, X. He, J. Sun, and N. Vasconcelos, 2017, Deep learning with low precision by half-wave Gaussian quantization, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 5918-5926, https://arxiv.org/abs/1702.00953 (Binary weights, 2-bit activations.)
- Han-Byul Kim, Eunhyeok Park, and Sungjoo Yoo, 2022, BASQ: Branch-wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks, In European Conference on Computer Vision, Cham: Springer Nature Switzerland, 17-33. https://link.springer.com/chapter/10.1007/978-3-031-19775-8_2 (Evaluates quantization precision from 2 bits to 4 bits.)
- Zechun Liu, Kwang-Ting Cheng, Dong Huang, Eric Xing, Zhiqiang Shen, Apr 2022, Nonuniform-to-uniform quantization: Towards accurate quantization via generalized straight-through estimation, In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4942-4952, https://arxiv.org/abs/2111.14826, Code: https://github.com/liuzechun/Nonuniform-to-Uniform-Quantization (Contains an extensive review of models with 2-bit weights and 2-bit activations, and also 3-bit and 4-bit configurations.)
- E. Kloberdanz, W. Le, Sep 2023, MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search, arXiv preprint arXiv:2309.17341, https://arxiv.org/pdf/2309.17341.pdf (Various tests of quantization from 2 bits to 8 bits.)
- Xiaofan Lin, Cong Zhao, and Wei Pan, 2017, Towards accurate binary convolutional neural network, Advances in Neural Information Processing Systems, 30, https://arxiv.org/abs/1711.11294 (Its 2-bit quantization approach is effectively a double-binarized method.)
- N. M. Ho, D. T. Nguyen, J. L. Gustafson, W. F. Wong, 2023, Bedot: Bit Efficient Dot Product for Deep Generative Models, CoNGA 2023: Next Generation Arithmetic, pp. 19-37, https://link.springer.com/chapter/10.1007/978-3-031-32180-1_2, PDF: https://www.comp.nus.edu.sg/~wongwf/papers/CONGA23-Bedot.pdf (Uses 2-3 bits for weights and 2-5 bits for activations.)
- Li, Y., Gong, R., Tan, X., Yang, Y., Hu, P., Zhang, Q., Yu, F., Wang, W., and Gu, S., 2021, BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction, ArXiv, abs/2102.05426. https://arxiv.org/abs/2102.05426 Code: https://github.com/yhhhli/BRECQ (Tests 2 to 4 bits for weights, and mixed-precision quantization.)
- Yuji Chai, John Gkountouras, Glenn G. Ko, David Brooks, Gu-Yeon Wei, June 2023, INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation, arXiv preprint arXiv:2306.08162, https://arxiv.org/abs/2306.08162
See more papers on 2-bit quantization (INT2) at: https://www.aussieai.com/research/quantization#int3