Aussie AI

Integer-Only-Arithmetic Quantization

Book Excerpt from "Generative AI in C++"

by David Spuler, Ph.D.

Integer-only quantization is integer quantization in which the arithmetic itself, not just the weight storage, is performed on integers. It is a mistake to assume that every integer quantization algorithm works this way. Several types of integer quantization store weights as quantized integers but de-quantize them back to floating-point at various points (in some algorithms, even for the weight multiplications). Methods that strictly restrict arithmetic to avoid floating-point operations are more precisely named “integer-only-arithmetic quantization algorithms”.
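The distinction between integer storage and integer arithmetic can be illustrated with a small C++ sketch. This is not code from the book; the function names (`dot_dequantized`, `dot_integer_only`, `requantize`) and the fixed-point multiplier-and-shift rescaling scheme are assumptions chosen for illustration. The first function stores weights as `int8_t` but multiplies in floating-point, while the other two never touch a floating-point value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Integer storage, floating-point arithmetic (hypothetical helper):
// each int8 weight is de-quantized to float before it is multiplied.
float dot_dequantized(const std::vector<int8_t>& w,
                      const std::vector<float>& x,
                      float w_scale) {
    assert(w.size() == x.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < w.size(); ++i) {
        sum += (static_cast<float>(w[i]) * w_scale) * x[i];
    }
    return sum;
}

// Integer-only arithmetic (hypothetical helper): int8 weights and int8
// activations are multiplied and accumulated in a 32-bit integer.
int32_t dot_integer_only(const std::vector<int8_t>& w,
                         const std::vector<int8_t>& x) {
    assert(w.size() == x.size());
    int32_t acc = 0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        acc += static_cast<int32_t>(w[i]) * static_cast<int32_t>(x[i]);
    }
    return acc;
}

// Rescale an accumulator by the fixed-point factor multiplier / 2^shift,
// rounding half away from zero. This stands in for multiplying by a
// floating-point scale; the multiplier and shift are assumed to be
// precomputed offline.
int32_t requantize(int32_t acc, int32_t multiplier, int shift) {
    assert(shift >= 1 && shift < 63);
    const int64_t prod = static_cast<int64_t>(acc) * multiplier;
    const int64_t rounding = static_cast<int64_t>(1) << (shift - 1);
    const int64_t magnitude = prod < 0 ? -prod : prod;
    const int64_t scaled = (magnitude + rounding) >> shift;
    return static_cast<int32_t>(prod < 0 ? -scaled : scaled);
}
```

For example, a scale of 0.5 can be expressed as `multiplier = 1 << 14` with `shift = 15`, so `requantize(acc, 1 << 14, 15)` halves the accumulator without any floating-point operation.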

Even these integer-only quantization algorithms may still have floating-point computations in some components of the Transformer. Methods that also fully quantize non-linear components to integers, such as Softmax and normalization components, are called “end-to-end integer Transformers.”
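As one small illustration of what quantizing a non-linear component involves: a normalization layer needs a square root (of the variance), which is normally a floating-point call. The sketch below computes a floor square root with Newton's iteration using only integer operations. It is a generic, standard technique rather than a method taken from the book, and the function name `isqrt_u32` is hypothetical.

```cpp
#include <cstdint>

// Floor of the square root of n, using only integer arithmetic
// (Newton's iteration). Hypothetical helper for illustration.
uint32_t isqrt_u32(uint32_t n) {
    if (n == 0) {
        return 0;
    }
    // 64-bit intermediates so that x + 1 cannot overflow.
    uint64_t x = n;            // initial guess, always >= sqrt(n)
    uint64_t y = (x + 1) / 2;  // first Newton step
    while (y < x) {            // the sequence decreases until it converges
        x = y;
        y = (x + n / x) / 2;
    }
    return static_cast<uint32_t>(x);
}
```

A full end-to-end integer Transformer would also need integer-only replacements for the exponential in Softmax and for the division steps, which are not shown here.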
