Aussie AI
Ternary Quantization
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Ternary quantization (or “ternarization”) is the use of three weight values: -1, 0, and +1. Representing these requires 2 bits per weight, so why not use all four values that 2 bits can encode? The answer is that ternary quantization permits zero-multiplication arithmetic in the inference algorithm, with an addition (for +1), a subtraction (for -1), and a null test that skips the operation entirely (for 0).
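As a minimal sketch of the idea (not code from any particular inference engine), the inner loop of a ternary dot product in C++ might look like the following. For clarity it assumes one weight per int8_t, whereas a packed implementation would hold four 2-bit weights per byte; the function name ternary_dot_product is illustrative only.

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Zero-multiplication dot product over ternary weights {-1, 0, +1}.
    // Each multiply-accumulate step degenerates into an addition (+1),
    // a subtraction (-1), or a skip (0) -- no multiplications needed.
    float ternary_dot_product(const std::vector<int8_t>& weights,
                              const std::vector<float>& activations)
    {
        float sum = 0.0f;
        for (std::size_t i = 0; i < weights.size(); ++i) {
            if (weights[i] == 1) {
                sum += activations[i];       // +1: addition
            } else if (weights[i] == -1) {
                sum -= activations[i];       // -1: subtraction
            }
            // 0: null test, this weight contributes nothing
        }
        return sum;
    }

    int main()
    {
        std::vector<int8_t> w = { +1, 0, -1, +1 };
        std::vector<float> x = { 0.5f, 2.0f, 1.5f, 1.0f };
        std::printf("%f\n", ternary_dot_product(w, x));  // 0.5 - 1.5 + 1.0 = 0.0
        return 0;
    }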
However, like binary quantization, ternary quantization still suffers from accuracy degradation. It is highly efficient in terms of both space and time, but the model loses some of its capability. Nevertheless, many research papers attempt to improve this trade-off.
Research papers on ternary quantization:
- N. Mellempudi, A. Kundu, D. Mudigere, D. Das, B. Kaul, and P. Dubey, 2017, Ternary neural networks with fine-grained quantization, CoRR, abs/1705.01462, https://arxiv.org/abs/1705.01462
- Fengfu Li, Bo Zhang, and Bin Liu, 2016, Ternary weight networks, arXiv preprint arXiv:1605.04711, https://arxiv.org/abs/1605.04711
- C. Zhu, S. Han, H. Mao, and W. J. Dally, 2016, Trained ternary quantization, arXiv preprint arXiv:1612.01064, https://arxiv.org/abs/1612.01064
- D. Liu, X. Liu, 2023, Ternary Quantization: A Survey, arXiv preprint arXiv:2303.01505, https://arxiv.org/abs/2303.01505
- E. Yvinec, A. Dapogny, K. Bailly, 2023, Designing strong baselines for ternary neural network quantization through support and mass equalization, arXiv preprint arXiv:2306.17442, https://arxiv.org/abs/2306.17442
- Fengfu Li, Bin Liu, Xiaoxing Wang, Bo Zhang, Junchi Yan, Nov 2022, Ternary Weight Networks, https://arxiv.org/abs/1605.04711, Code: https://github.com/Thinklab-SJTU/twns
- M. Kim, S. Lee, J. Lee, S. Hong, D. S. Chang, 2023, Token-Scaled Logit Distillation for Ternary Weight Generative Language Models, https://arxiv.org/abs/2308.06744
- Dan Liu, Xi Chen, Chen Ma, Xue Liu, Dec 2022, Hyperspherical Quantization: Toward Smaller and More Accurate Models, https://arxiv.org/abs/2212.12653
- Kota Ando, Kodai Ueyoshi, Kentaro Orimo, Haruyoshi Yonekawa, Shimpei Sato, Hiroki Nakahara, Masayuki Ikebe, Jun 2017, BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS, Proc. Symp. VLSI Circuits, pp. C24-C25, https://ieeexplore.ieee.org/document/8008533
- S. K. Esser et al., 2016, Convolutional networks for fast energy-efficient neuromorphic computing, Proc. Nat. Acad. Sci. USA, vol. 113, no. 41, pp. 11441-11446, https://arxiv.org/abs/1603.08270 (Ternary weights, binary activations.)
See more papers on ternary quantization at: https://www.aussieai.com/research/quantization#ternary