Ternary Quantization

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Ternary quantization (or “ternarization”) is quantization down to three weight values: -1, 0, and +1. Representing three values requires 2 bits per weight, so why not use all four values that 2 bits allow? The answer is that ternary quantization enables zero-multiplication arithmetic in the inference algorithm: each multiplication by a weight is replaced with an addition (for +1), a subtraction (for -1), or a null test that skips the operation entirely (for 0).
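
To make this concrete, below is a minimal (and unoptimized) sketch of a zero-multiplication ternary dot product in C++. For simplicity, it assumes each weight is stored in its own int8_t, restricted to {-1, 0, +1}, rather than in a packed 2-bit encoding, and that both vectors have the same length:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Ternary dot product with no multiplications: each weight in
    // {-1, 0, +1} triggers an addition, a subtraction, or a skip.
    float ternary_dot_product(const std::vector<int8_t>& weights,
                              const std::vector<float>& activations)
    {
        float sum = 0.0f;
        for (std::size_t i = 0; i < weights.size(); ++i) {
            if (weights[i] == 1) {
                sum += activations[i];   // +1: addition replaces the multiply
            } else if (weights[i] == -1) {
                sum -= activations[i];   // -1: subtraction replaces the multiply
            }
            // 0: null test, this weight contributes nothing (skipped entirely)
        }
        return sum;
    }

A production kernel would pack four 2-bit weights per byte and vectorize the loop, but the core saving is the same: the inner loop never multiplies.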

However, like binary quantization, ternary quantization still suffers from accuracy degradation: the quantized model is highly efficient in both space and time, but loses some of its capabilities. Nevertheless, many research papers attempt to narrow this accuracy gap.
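
As a concrete example of the baseline that these papers build on, here is a simplified sketch of threshold-based ternarization in the style of Ternary Weight Networks (Li et al., papers 2 and 6 below). The threshold of 0.7 times the mean absolute weight, and the single per-tensor scaling factor averaged over the surviving weights, are the choices from that paper; the rest is illustrative:

    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct TernaryResult {
        std::vector<int8_t> weights;  // ternary weights in {-1, 0, +1}
        float scale;                  // per-tensor scaling factor (alpha)
    };

    // Ternarize a tensor of float weights: small-magnitude weights become 0,
    // the rest become +1 or -1, and one scaling factor recovers the magnitude.
    TernaryResult ternarize(const std::vector<float>& w)
    {
        if (w.empty()) return { {}, 0.0f };

        // Threshold delta = 0.7 * mean(|w|), per Ternary Weight Networks.
        float sum_abs = 0.0f;
        for (float x : w) sum_abs += std::fabs(x);
        const float delta = 0.7f * sum_abs / static_cast<float>(w.size());

        TernaryResult result;
        result.weights.reserve(w.size());
        float kept_magnitude = 0.0f;
        std::size_t kept_count = 0;
        for (float x : w) {
            if (x > delta) {
                result.weights.push_back(+1);
                kept_magnitude += x;
                ++kept_count;
            } else if (x < -delta) {
                result.weights.push_back(-1);
                kept_magnitude -= x;  // accumulate |x|
                ++kept_count;
            } else {
                result.weights.push_back(0);  // pruned to zero
            }
        }
        // Scale alpha = mean |w| over the non-zero ternary weights.
        result.scale = (kept_count > 0)
            ? kept_magnitude / static_cast<float>(kept_count) : 0.0f;
        return result;
    }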

Research papers on ternary quantization:

  1. N. Mellempudi, A. Kundu, D. Mudigere, D. Das, B. Kaul, and P. Dubey, 2017, Ternary neural networks with fine-grained quantization, CoRR, vol. abs/1705.01462, https://arxiv.org/abs/1705.01462
  2. Fengfu Li, Bo Zhang, and Bin Liu, 2016, Ternary weight networks, arXiv preprint arXiv:1605.04711, https://arxiv.org/abs/1605.04711
  3. C. Zhu, S. Han, H. Mao, and W. J. Dally, 2016, Trained ternary quantization, arXiv preprint arXiv:1612.01064, https://arxiv.org/abs/1612.01064
  4. D. Liu, X. Liu, 2023, Ternary Quantization: A Survey, arXiv preprint arXiv:2303.01505, https://arxiv.org/abs/2303.01505
  5. E. Yvinec, A. Dapogny, K. Bailly, 2023, Designing strong baselines for ternary neural network quantization through support and mass equalization, arXiv preprint arXiv:2306.17442, https://arxiv.org/abs/2306.17442
  6. Fengfu Li, Bin Liu, Xiaoxing Wang, Bo Zhang, Junchi Yan, Nov 2022, Ternary Weight Networks, https://arxiv.org/abs/1605.04711, Code: https://github.com/Thinklab-SJTU/twns
  7. M. Kim, S. Lee, J. Lee, S. Hong, D. S. Chang, 2023, Token-Scaled Logit Distillation for Ternary Weight Generative Language Models, https://arxiv.org/abs/2308.06744
  8. Dan Liu, Xi Chen, Chen Ma, Xue Liu, Dec 2022, Hyperspherical Quantization: Toward Smaller and More Accurate Models, https://arxiv.org/abs/2212.12653
  9. Kota Ando, Kodai Ueyoshi, Kentaro Orimo, Haruyoshi Yonekawa, Shimpei Sato, Hiroki Nakahara, Masayuki Ikebe, Jun 2017, BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS, Proc. Symp. VLSI Circuits, pp. C24-C25, https://ieeexplore.ieee.org/document/8008533
  10. S. K. Esser et al., 2016, Convolutional networks for fast energy-efficient neuromorphic computing, Proc. Nat. Acad. Sci. USA, vol. 113, no. 41, pp. 11441-11446, https://arxiv.org/abs/1603.08270 (Ternary weights, binary activations.)

See more papers on ternary quantization at: https://www.aussieai.com/research/quantization#ternary

 

