Ternary Quantization

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Ternary quantization (or “ternarization”) is quantization down to three weight values: -1, 0, and +1. Representing three values requires 2 bits per weight, so why not use all four values that 2 bits allow? The answer is that ternary quantization enables zero-multiplication arithmetic in the inference algorithm: each multiplication by a weight is replaced with an addition (for +1), a subtraction (for -1), or a null test that skips the operation entirely (for 0).
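
To make this concrete, below is a minimal (and unoptimized) sketch of a zero-multiplication ternary dot product in C++. For simplicity, it assumes each weight is stored in its own int8_t, restricted to {-1, 0, +1}, rather than in a packed 2-bit encoding, and that both vectors have the same length:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Ternary dot product with no multiplications: each weight in
    // {-1, 0, +1} triggers an addition, a subtraction, or a skip.
    float ternary_dot_product(const std::vector<int8_t>& weights,
                              const std::vector<float>& activations)
    {
        float sum = 0.0f;
        for (std::size_t i = 0; i < weights.size(); ++i) {
            if (weights[i] == 1) {
                sum += activations[i];   // +1: addition replaces the multiply
            } else if (weights[i] == -1) {
                sum -= activations[i];   // -1: subtraction replaces the multiply
            }
            // 0: null test, this weight contributes nothing (skipped entirely)
        }
        return sum;
    }

A production kernel would pack four 2-bit weights per byte and vectorize the loop, but the core saving is the same: the inner loop never multiplies.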

However, like binary quantization, ternary quantization still suffers from accuracy degradation: the quantized model is highly efficient in both space and time, but loses some of its capabilities. Nevertheless, many research papers attempt to narrow this accuracy gap.
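
As a concrete example of the baseline that these papers build on, here is a simplified sketch of threshold-based ternarization in the style of Ternary Weight Networks (Li et al., papers 2 and 6 below). The threshold of 0.7 times the mean absolute weight, and the single per-tensor scaling factor averaged over the surviving weights, are the choices from that paper; the rest is illustrative:

    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct TernaryResult {
        std::vector<int8_t> weights;  // ternary weights in {-1, 0, +1}
        float scale;                  // per-tensor scaling factor (alpha)
    };

    // Ternarize a tensor of float weights: small-magnitude weights become 0,
    // the rest become +1 or -1, and one scaling factor recovers the magnitude.
    TernaryResult ternarize(const std::vector<float>& w)
    {
        if (w.empty()) return { {}, 0.0f };

        // Threshold delta = 0.7 * mean(|w|), per Ternary Weight Networks.
        float sum_abs = 0.0f;
        for (float x : w) sum_abs += std::fabs(x);
        const float delta = 0.7f * sum_abs / static_cast<float>(w.size());

        TernaryResult result;
        result.weights.reserve(w.size());
        float kept_magnitude = 0.0f;
        std::size_t kept_count = 0;
        for (float x : w) {
            if (x > delta) {
                result.weights.push_back(+1);
                kept_magnitude += x;
                ++kept_count;
            } else if (x < -delta) {
                result.weights.push_back(-1);
                kept_magnitude -= x;  // accumulate |x|
                ++kept_count;
            } else {
                result.weights.push_back(0);  // pruned to zero
            }
        }
        // Scale alpha = mean |w| over the non-zero ternary weights.
        result.scale = (kept_count > 0)
            ? kept_magnitude / static_cast<float>(kept_count) : 0.0f;
        return result;
    }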

Research papers on ternary quantization:

  1. N. Mellempudi, A. Kundu, D. Mudigere, D. Das, B. Kaul, and P. Dubey, 2017, Ternary neural networks with fine-grained quantization, CoRR, vol. abs/1705.01462, https://arxiv.org/abs/1705.01462
  2. Fengfu Li, Bo Zhang, and Bin Liu, 2016, Ternary weight networks, arXiv preprint arXiv:1605.04711, https://arxiv.org/abs/1605.04711
  3. C. Zhu, S. Han, H. Mao, and W. J. Dally, 2016, Trained ternary quantization, arXiv preprint arXiv:1612.01064, https://arxiv.org/abs/1612.01064
  4. D. Liu, X. Liu, 2023, Ternary Quantization: A Survey, arXiv preprint arXiv:2303.01505, https://arxiv.org/abs/2303.01505
  5. E. Yvinec, A. Dapogny, K. Bailly, 2023, Designing strong baselines for ternary neural network quantization through support and mass equalization, arXiv preprint arXiv:2306.17442, https://arxiv.org/abs/2306.17442
  6. Fengfu Li, Bin Liu, Xiaoxing Wang, Bo Zhang, Junchi Yan, Nov 2022, Ternary Weight Networks, https://arxiv.org/abs/1605.04711, Code: https://github.com/Thinklab-SJTU/twns
  7. M. Kim, S. Lee, J. Lee, S. Hong, D. S. Chang, 2023, Token-Scaled Logit Distillation for Ternary Weight Generative Language Models, https://arxiv.org/abs/2308.06744
  8. Dan Liu, Xi Chen, Chen Ma, Xue Liu, Dec 2022, Hyperspherical Quantization: Toward Smaller and More Accurate Models, https://arxiv.org/abs/2212.12653
  9. Kota Ando, Kodai Ueyoshi, Kentaro Orimo, Haruyoshi Yonekawa, Shimpei Sato, Hiroki Nakahara, Masayuki Ikebe, Jun 2017, BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS, Proc. Symp. VLSI Circuits, pp. C24-C25, https://ieeexplore.ieee.org/document/8008533
  10. S. K. Esser et al., 2016, Convolutional networks for fast energy-efficient neuromorphic computing, Proc. Nat. Acad. Sci. USA, vol. 113, no. 41, pp. 11441-11446, https://arxiv.org/abs/1603.08270 (Ternary weights, binary activations.)

See more papers on ternary quantization at: https://www.aussieai.com/research/quantization#ternary

 

