
7-Bit Quantization (INT7)

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.


7-bit quantization (INT7) stores each weight in 7 bits, giving a signed integer range of [-64, 63]. It is a less common choice than INT8 or 4-bit schemes, sitting between them in both compression and accuracy. Research papers on 7-bit quantization:

  1. E. Kloberdanz, W. Le, Sep 2023, MixQuant: Mixed Precision Quantization with a Bit-width Optimization Search, arXiv preprint arXiv:2309.17341, https://arxiv.org/pdf/2309.17341.pdf (Various tests of quantization from 2 bits to 8 bits.)
  2. Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko, 2018, Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 2704-2713, https://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf
  3. M. Giacobbe, T. A. Henzinger, M. Lechner, 2020, How many bits does it take to quantize your neural network?, TACAS 2020, https://link.springer.com/chapter/10.1007/978-3-030-45237-7_5, PDF: https://link.springer.com/content/pdf/10.1007/978-3-030-45237-7_5.pdf (Ran experiments from 6-bit to 10-bit quantization.)
  4. B. Gouin-Ferland, R. Coffee, A. C. Therrien, 2022, Data reduction through optimized scalar quantization for more compact neural networks, Frontiers in Physics, https://www.frontiersin.org/articles/10.3389/fphy.2022.957128/full (Examined 3-bit to 7-bit weights for quantization.)

See more papers on 7-bit quantization (INT7) at: https://www.aussieai.com/research/quantization#int7
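To make the arithmetic concrete, here is a minimal C++ sketch of symmetric per-tensor 7-bit quantization, in the spirit of the schemes tested in the papers above. It is an illustrative assumption, not code from the book or from any cited paper: the function names, the per-tensor absmax scaling, and the choice to store each 7-bit value in an int8_t (leaving one bit unused rather than bit-packing) are all choices made for the example.

#include <algorithm>  // std::clamp, std::max
#include <cmath>      // std::fabs, std::lround
#include <cstdint>    // int8_t
#include <cstdio>     // std::printf
#include <vector>

constexpr int kInt7Max = 63;   // 2^6 - 1, largest signed 7-bit value
constexpr int kInt7Min = -64;  // -2^6, smallest signed 7-bit value

// Per-tensor scale from the maximum absolute weight (absmax scaling).
float int7_scale(const std::vector<float>& w) {
    float amax = 0.0f;
    for (float x : w) amax = std::max(amax, std::fabs(x));
    return (amax > 0.0f) ? (amax / kInt7Max) : 1.0f;
}

// Quantize one weight: scale, round to nearest, clamp to [-64, 63].
int8_t int7_quantize(float x, float scale) {
    long q = std::lround(x / scale);
    q = std::clamp(q, static_cast<long>(kInt7Min), static_cast<long>(kInt7Max));
    return static_cast<int8_t>(q);
}

// Dequantize back to floating point.
float int7_dequantize(int8_t q, float scale) {
    return static_cast<float>(q) * scale;
}

int main() {
    std::vector<float> weights = { 0.12f, -0.97f, 0.55f, -0.03f };
    float scale = int7_scale(weights);
    for (float w : weights) {
        int8_t q = int7_quantize(w, scale);
        std::printf("%+.4f -> %4d -> %+.4f\n", w, (int)q, int7_dequantize(w, scale));
    }
    return 0;
}

With absmax scaling the largest-magnitude weight maps exactly to the end of the range, and the rounding error per weight is at most half a quantization step (scale/2). Practical INT7 schemes typically go further, with per-channel scales and bit-packing of eight 7-bit values into seven bytes to realize the storage savings.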
