Aussie AI

Integer-Only-Arithmetic Quantization

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Integer-only quantization is integer quantization in which the arithmetic, in particular the weight multiplications, is performed only on integers. It is a common misconception that every integer quantization algorithm works this way. Several types of integer quantization store weights as quantized integers but de-quantize them back to floating-point at various points (in some algorithms, even for the weight multiplications). Methods that strictly restrict arithmetic to avoid floating-point operations are more precisely named “integer-only-arithmetic quantization algorithms”.
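The distinction can be illustrated with a minimal C++ sketch, written under stated assumptions rather than taken from any particular library. It assumes 8-bit weights and activations, a 32-bit accumulator, and a fixed-point rescale of the form `mult / 2^shift` standing in for a floating-point scale factor. The names `dot_int8` and `requantize` are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: dot product of int8 weights and int8 activations.
// Products are accumulated in int32, so no floating-point value appears.
// Assumes n is small enough that the int32 accumulator cannot overflow.
int32_t dot_int8(const int8_t* w, const int8_t* x, std::size_t n) {
    int32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<int32_t>(w[i]) * static_cast<int32_t>(x[i]);
    }
    return acc;
}

// Hypothetical helper: rescale an int32 accumulator back to int8 by
// multiplying with the fixed-point fraction mult / 2^shift, then clamping.
// This replaces the floating-point scale that a de-quantizing method would use.
// Note: right-shifting a negative value is arithmetic on mainstream compilers
// and is guaranteed to be so from C++20 onward.
int8_t requantize(int32_t acc, int32_t mult, int shift) {
    assert(shift >= 1 && shift < 63);
    int64_t prod = static_cast<int64_t>(acc) * mult;
    int64_t rounding = int64_t{1} << (shift - 1);   // round to nearest, ties upward
    int64_t scaled = (prod + rounding) >> shift;
    if (scaled > 127) scaled = 127;                 // saturate to the int8 range
    if (scaled < -128) scaled = -128;
    return static_cast<int8_t>(scaled);
}
```

In this sketch a scale of 0.5 would be passed as `mult = 1 << 14` and `shift = 15`. A method that is not integer-only would instead convert the accumulator (or the weights themselves) to `float` and multiply by a floating-point scale at this point.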

Even these integer-only-arithmetic quantization algorithms may still use floating-point computations in some components of the Transformer. Methods that also fully quantize the non-linear components to integers, such as Softmax and normalization, are called “end-to-end integer Transformers.”
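As one small illustration of what quantizing a non-linear component involves, normalization needs a square root (for the standard deviation), which has to be computed without floating-point in an end-to-end integer design. The following is a minimal sketch of an integer square root using Newton's iteration. It assumes the variance is available as an unsigned 32-bit integer, and the name `isqrt_u32` is hypothetical. It is not presented as the method of any specific integer Transformer.

```cpp
#include <cstdint>

// Hypothetical helper: returns floor(sqrt(n)) using only integer operations.
// Newton's iteration starts from a guess that is at least sqrt(n) and
// decreases until it stops improving. 64-bit temporaries avoid overflow.
uint32_t isqrt_u32(uint32_t n) {
    if (n < 2) return n;              // sqrt(0) = 0, sqrt(1) = 1
    uint64_t x = n;                   // initial guess, always >= sqrt(n)
    uint64_t y = (x + 1) / 2;         // first Newton step from x = n
    while (y < x) {
        x = y;
        y = (x + n / x) / 2;          // integer Newton update
    }
    return static_cast<uint32_t>(x);
}
```

A full integer normalization or Softmax would need further pieces, such as an integer division or an integer approximation of the exponential, which this sketch does not attempt.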

 
