Weight Precomputations

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Weights are static during inference, so why not fiddle with them before we start? Of course, that's exactly the underlying idea behind quantization and static pruning. Quantization precomputes new versions of the weights, converted to integers or lower-precision floating-point. Pruning removes weights by setting some of them to zero.
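
As a concrete illustration, here is a minimal sketch of the quantization case: the weights are converted to int8 offline, with a single per-tensor scale factor. The function names and the simple symmetric scaling scheme are illustrative assumptions, not code from the book or any particular library.

    // Sketch: offline symmetric per-tensor int8 quantization of weights.
    // All names here are illustrative assumptions.
    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Offline precomputation: convert float weights to int8 plus one scale.
    void quantize_weights_int8(const std::vector<float>& w,
                               std::vector<int8_t>& q, float& scale)
    {
        float maxabs = 0.0f;
        for (float x : w) maxabs = std::max(maxabs, std::fabs(x));
        scale = (maxabs > 0.0f) ? maxabs / 127.0f : 1.0f;
        q.resize(w.size());
        for (size_t i = 0; i < w.size(); ++i)
            q[i] = (int8_t)std::lround(w[i] / scale);  // in [-127, 127]
    }

    // Runtime: cheap dequantization (or use integer arithmetic directly).
    inline float dequantize(int8_t qv, float scale) { return qv * scale; }

    int main()
    {
        std::vector<float> weights = { 0.5f, -1.25f, 0.03f, 2.0f };
        std::vector<int8_t> q;
        float scale = 0.0f;
        quantize_weights_int8(weights, q, scale);  // once, before inference
        for (size_t i = 0; i < q.size(); ++i)
            printf("%g -> %d -> %g\n",
                   weights[i], (int)q[i], dequantize(q[i], scale));
        return 0;
    }

The key point is that the conversion runs once, offline, and the runtime only ever sees the smaller integer weights.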

However, this section looks at other precomputation ideas more generally. What useful information can we discern by preprocessing the weights? Since the weight data is available after training, we can do extra weight calculations “offline” without affecting inference speed, and then use the precomputed data to speed up runtime inference.
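
As one illustrative sketch of this idea (an assumption for demonstration, not a method taken from the papers below), an offline pass could record each weight row's maximum absolute value, and the runtime could then use that precomputed bound to skip dot products whose result must be negligible. The RowStats structure, the threshold scheme, and all names are hypothetical.

    // Sketch: precompute per-row weight statistics offline, then use them
    // at runtime to skip negligible dot products. Names are hypothetical.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct RowStats { float max_abs; };  // precomputed offline, stored with weights

    // Offline pass over the static weight matrix (row-major, rows x cols).
    std::vector<RowStats> precompute_row_stats(const std::vector<float>& w,
                                               int rows, int cols)
    {
        std::vector<RowStats> stats(rows);
        for (int r = 0; r < rows; ++r) {
            float m = 0.0f;
            for (int c = 0; c < cols; ++c)
                m = std::max(m, std::fabs(w[r * cols + c]));
            stats[r].max_abs = m;
        }
        return stats;
    }

    // Runtime matrix-vector product that skips rows whose output is provably
    // below a tolerance (assuming the inputs are bounded by in_max).
    void matvec_skipping(const std::vector<float>& w, int rows, int cols,
                         const std::vector<RowStats>& stats, const float* x,
                         float in_max, float tolerance, float* out)
    {
        for (int r = 0; r < rows; ++r) {
            // Conservative bound: |row . x| <= max_abs * in_max * cols
            if (stats[r].max_abs * in_max * cols < tolerance) {
                out[r] = 0.0f;  // skip the dot product entirely
                continue;
            }
            float sum = 0.0f;
            for (int c = 0; c < cols; ++c)
                sum += w[r * cols + c] * x[c];
            out[r] = sum;
        }
    }

    int main()
    {
        const int rows = 2, cols = 3;
        std::vector<float> w = { 0.8f, -0.5f, 0.2f,       // row 0: significant
                                 1e-6f, -1e-6f, 1e-6f };  // row 1: negligible
        std::vector<RowStats> stats = precompute_row_stats(w, rows, cols); // offline
        float x[cols] = { 1.0f, 2.0f, -1.0f };
        float out[rows];
        matvec_skipping(w, rows, cols, stats, x,
                        /*in_max=*/2.0f, /*tolerance=*/1e-4f, out);
        printf("out = %g, %g\n", out[0], out[1]);  // row 1 is skipped
        return 0;
    }

Whether a bound like this pays off depends on the weight distribution; the papers below explore related preprocessing ideas for the attention mechanism.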

Research papers on weight precomputation:

  1. T. J. Ham, S. J. Jung, S. Kim, et al., 2020, A3: Accelerating attention mechanisms in neural networks with approximation, Proc. of HPCA, IEEE, pp. 328–341, https://arxiv.org/abs/2002.10941 (Preprocessing of the key matrix in attention, with focus on large positive and negative values.)
  2. Q. Chen, C. Sun, Z. Lu, and C. Gao, 2022, Enabling energy-efficient inference for self-attention mechanisms in neural networks, IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 25–28, https://ieeexplore.ieee.org/document/9869924
  3. T. J. Ham, Y. Lee, S. H. Seo, S. Kim, H. Choi, S. J. Jung, and J. W. Lee, 2021, ELSA: Hardware-software co-design for efficient, lightweight self-attention mechanism in neural networks, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), https://ieeexplore.ieee.org/abstract/document/9499860/, https://taejunham.github.io/data/elsa_isca21.pdf (Precomputations involve the key and value matrices, including dot products, hashing, and similarity checking.)

For research on weight precomputations, see also https://www.aussieai.com/research/weight-precomputations.

 
