Aussie AI
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Multiplication Optimizations
Multiplication is the foremost bottleneck in both training and inference of neural networks and Transformer architectures. Most models rely on matrix multiplications, whether expressed as tensor operations or convolutions; these break down into vector dot products, which in turn are sequences of "multiply-and-add" operations (called "multiply-accumulate" or MAC). The multiplication part is more expensive than the addition.
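To make the MAC pattern concrete, here is a minimal C++ sketch of a vector dot product, the inner kernel of matrix multiplication. The function name and the plain float loop are illustrative only, not the optimized kernels a real inference engine would use.

    #include <cstddef>

    // Dot product of two float vectors: each loop iteration is one
    // multiply-accumulate (MAC) -- a multiply followed by an add.
    float dot_product(const float* a, const float* b, std::size_t n) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < n; ++i) {
            sum += a[i] * b[i];   // the multiply dominates the cost of each MAC
        }
        return sum;
    }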
Over the years of AI research, various ideas have been proposed for optimizing multiplication, including:
- Hardware-accelerated multiplication (lately, this is a GPU's bread-and-butter)
- Advanced floating-point formats (e.g. bfloat16)
- Faster multiplication arithmetic algorithms
- Approximate multiplication arithmetic algorithms
- Integer multiplication instead of floating-point (see quantization)
- Faster matrix multiplication algorithms (e.g. low-rank matrices, tensor decomposition)
- Avoiding or reducing multiplications (e.g. zero-multiplication models, pruning, zero skipping, sparsity, etc.); see the zero-skipping sketch after this list
- Advanced mathematical number systems (e.g. logarithmic number systems, where multiplication becomes addition)
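As an illustration of the "avoiding multiplications" group above, the following is a hedged sketch of zero skipping inside a dot product: whenever a weight is zero, the multiplication is skipped entirely. The function name is hypothetical, and real sparse kernels typically use compressed storage formats rather than testing every element.

    #include <cstddef>

    // Zero-skipping dot product: avoid the multiply whenever the weight is zero.
    // Only worthwhile when the weights are sparse enough that the branch
    // is cheaper than the multiplications it saves.
    float dot_product_zero_skip(const float* weights, const float* activations,
                                std::size_t n) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < n; ++i) {
            if (weights[i] != 0.0f) {      // skip the MAC for zero weights
                sum += weights[i] * activations[i];
            }
        }
        return sum;
    }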
Although modern programmers take multiplying two integers for granted, there are complicated algorithms at work behind the scenes (i.e., inside the chips). Early algorithms include Karatsuba multiplication (1962), Toom-Cook multiplication, the Schönhage–Strassen algorithm, and contributions by Knuth. The improvement and parallelization of such algorithms is fundamental to GPU and hardware accelerator design. Using such algorithms for software acceleration of model inference seems unlikely to beat hardware acceleration.
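To show the flavor of these algorithms, here is a small, illustrative C++ sketch of Karatsuba's trick: multiplying two 32-bit integers using three half-width multiplications instead of the schoolbook four. This is only a one-level demonstration, not code from the book; real big-integer libraries apply the split recursively.

    #include <cstdint>
    #include <cstdio>

    // Karatsuba: split x and y into 16-bit halves and form the 64-bit
    // product from three smaller multiplications instead of four.
    uint64_t karatsuba32(uint32_t x, uint32_t y) {
        uint64_t x1 = x >> 16, x0 = x & 0xFFFF;
        uint64_t y1 = y >> 16, y0 = y & 0xFFFF;
        uint64_t z2 = x1 * y1;                            // high halves
        uint64_t z0 = x0 * y0;                            // low halves
        uint64_t z1 = (x1 + x0) * (y1 + y0) - z2 - z0;    // cross terms via one multiply
        return (z2 << 32) + (z1 << 16) + z0;              // recombine the partial products
    }

    int main() {
        uint32_t a = 123456789u, b = 987654321u;
        printf("%llu\n", (unsigned long long)karatsuba32(a, b));
        printf("%llu\n", (unsigned long long)a * b);      // check against a direct 64-bit multiply
        return 0;
    }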