Weight Precomputations

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Weights are static during inference, so why not fiddle with them before we start? Of course, that's exactly the underlying idea behind quantization and static pruning. Quantization precomputes new versions of the weights, converted to integers or lower-precision floating-point. Pruning removes weights by setting some of them to zero.
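
As a concrete illustration, here is a minimal sketch of the quantization case: the weights are converted to int8 offline, with a single per-tensor scale factor. The function names and the simple symmetric scaling scheme are illustrative assumptions, not code from the book or any particular library.

    // Sketch: offline symmetric per-tensor int8 quantization of weights.
    // All names here are illustrative assumptions.
    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Offline precomputation: convert float weights to int8 plus one scale.
    void quantize_weights_int8(const std::vector<float>& w,
                               std::vector<int8_t>& q, float& scale)
    {
        float maxabs = 0.0f;
        for (float x : w) maxabs = std::max(maxabs, std::fabs(x));
        scale = (maxabs > 0.0f) ? maxabs / 127.0f : 1.0f;
        q.resize(w.size());
        for (size_t i = 0; i < w.size(); ++i)
            q[i] = (int8_t)std::lround(w[i] / scale);  // in [-127, 127]
    }

    // Runtime: cheap dequantization (or use integer arithmetic directly).
    inline float dequantize(int8_t qv, float scale) { return qv * scale; }

    int main()
    {
        std::vector<float> weights = { 0.5f, -1.25f, 0.03f, 2.0f };
        std::vector<int8_t> q;
        float scale = 0.0f;
        quantize_weights_int8(weights, q, scale);  // once, before inference
        for (size_t i = 0; i < q.size(); ++i)
            printf("%g -> %d -> %g\n",
                   weights[i], (int)q[i], dequantize(q[i], scale));
        return 0;
    }

The key point is that the conversion runs once, offline, and the runtime only ever sees the smaller integer weights.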

However, this section looks at other precomputation ideas more generally. What useful information can we discern by preprocessing the weights? Since the weight data is available after training, we can do extra weight calculations “offline” without affecting inference speed, and then use the precomputed data to speed up runtime inference.
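
As one illustrative sketch of this idea (an assumption for demonstration, not a method taken from the papers below), an offline pass could record each weight row's maximum absolute value, and the runtime could then use that precomputed bound to skip dot products whose result must be negligible. The RowStats structure, the threshold scheme, and all names are hypothetical.

    // Sketch: precompute per-row weight statistics offline, then use them
    // at runtime to skip negligible dot products. Names are hypothetical.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct RowStats { float max_abs; };  // precomputed offline, stored with weights

    // Offline pass over the static weight matrix (row-major, rows x cols).
    std::vector<RowStats> precompute_row_stats(const std::vector<float>& w,
                                               int rows, int cols)
    {
        std::vector<RowStats> stats(rows);
        for (int r = 0; r < rows; ++r) {
            float m = 0.0f;
            for (int c = 0; c < cols; ++c)
                m = std::max(m, std::fabs(w[r * cols + c]));
            stats[r].max_abs = m;
        }
        return stats;
    }

    // Runtime matrix-vector product that skips rows whose output is provably
    // below a tolerance (assuming the inputs are bounded by in_max).
    void matvec_skipping(const std::vector<float>& w, int rows, int cols,
                         const std::vector<RowStats>& stats, const float* x,
                         float in_max, float tolerance, float* out)
    {
        for (int r = 0; r < rows; ++r) {
            // Conservative bound: |row . x| <= max_abs * in_max * cols
            if (stats[r].max_abs * in_max * cols < tolerance) {
                out[r] = 0.0f;  // skip the dot product entirely
                continue;
            }
            float sum = 0.0f;
            for (int c = 0; c < cols; ++c)
                sum += w[r * cols + c] * x[c];
            out[r] = sum;
        }
    }

    int main()
    {
        const int rows = 2, cols = 3;
        std::vector<float> w = { 0.8f, -0.5f, 0.2f,       // row 0: significant
                                 1e-6f, -1e-6f, 1e-6f };  // row 1: negligible
        std::vector<RowStats> stats = precompute_row_stats(w, rows, cols); // offline
        float x[cols] = { 1.0f, 2.0f, -1.0f };
        float out[rows];
        matvec_skipping(w, rows, cols, stats, x,
                        /*in_max=*/2.0f, /*tolerance=*/1e-4f, out);
        printf("out = %g, %g\n", out[0], out[1]);  // row 1 is skipped
        return 0;
    }

Whether a bound like this pays off depends on the weight distribution; the papers below explore related preprocessing ideas for the attention mechanism.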

Research papers on weight precomputation:

  1. T. J. Ham, S. J. Jung, S. Kim, et al., 2020, A3: Accelerating attention mechanisms in neural networks with approximation, Proc. of HPCA, IEEE, pp. 328–341, https://arxiv.org/abs/2002.10941 (Preprocessing of the key matrix in attention, with focus on large positive and negative values.)
  2. Q. Chen, C. Sun, Z. Lu, and C. Gao, 2022, Enabling energy-efficient inference for self-attention mechanisms in neural networks, IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 25–28, https://ieeexplore.ieee.org/document/9869924
  3. T. J. Ham, Y. Lee, S. H. Seo, S. Kim, H. Choi, S. J. Jung, and J. W. Lee, 2021, ELSA: Hardware-software co-design for efficient, lightweight self-attention mechanism in neural networks, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), https://ieeexplore.ieee.org/abstract/document/9499860/, https://taejunham.github.io/data/elsa_isca21.pdf (Precomputations involve the key and value matrices, including dot products, hashing, and similarity checking.)

For research on weight precomputations, see also https://www.aussieai.com/research/weight-precomputations.

 
