Aussie AI
Book Excerpt from "Generative AI in C++" by David Spuler, Ph.D.
Softmax Optimization Research
The Softmax function is a significant cost in Transformer inference because it is part of the attention mechanism, whereas it was less of a bottleneck in earlier neural network architectures. A vanilla Softmax implementation is expensive because it computes the exponential of every element of the logits vector. Various attempts have been made to optimize and approximate the Softmax calculation, including the techniques below (a baseline C++ sketch follows the list):
- Softmax code optimizations (sequential)
- Vectorized Softmax (parallelization)
- Softmax approximations
- Integer-only Softmax
- Pruned Softmax (removal)
- Fused Softmax (kernel fusion)
- Softmax replacements (use different functions)
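As a baseline for these optimizations, here is a minimal sketch of a vanilla Softmax in C++. The function name and the use of std::vector are illustrative choices, not code from the book; the sketch simply makes explicit that one exponential is computed per logit, followed by a normalization pass over the denominator.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Minimal vanilla Softmax over a logits vector (illustrative sketch).
    // Note: no max-subtraction here, so very large logits can overflow exp().
    std::vector<float> softmax_basic(const std::vector<float>& logits) {
        std::vector<float> probs(logits.size());
        float denom = 0.0f;
        for (std::size_t i = 0; i < logits.size(); ++i) {
            probs[i] = std::exp(logits[i]);  // one expensive exponential per element
            denom += probs[i];               // denominator is the sum of exponentials
        }
        for (std::size_t i = 0; i < logits.size(); ++i) {
            probs[i] /= denom;               // normalize so the outputs sum to 1
        }
        return probs;
    }

The two loops are the usual optimization targets: the exponential loop can be vectorized or replaced with an approximation, and the passes can be fused with adjacent kernels to avoid extra memory traffic.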
Related Research Areas: Note that several other areas of theory are relevant to Softmax optimization and approximation. The denominator of the Softmax formula is a “sum of exponentials,” and the same type of calculation appears in Logarithmic Number System (LNS) addition. The sum-of-exponentials calculation also appears in “log-sum-exp networks,” which are somewhat related to “tropical algebra.” The area of “max-plus networks” may also be relevant to Softmax approximation research.
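To make the sum-of-exponentials connection concrete, here is a hedged sketch of the log-sum-exp formulation in C++ (function names are illustrative, and the input is assumed to be non-empty). It also shows the standard max-subtraction trick that keeps exp() from overflowing, which is one reason the log-sum-exp view appears so often in Softmax implementations.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // log-sum-exp: logsumexp(x) = m + log(sum_i exp(x_i - m)), with m = max_i x_i.
    float log_sum_exp(const std::vector<float>& x) {
        float m = *std::max_element(x.begin(), x.end());
        float sum = 0.0f;
        for (float v : x) {
            sum += std::exp(v - m);   // each term is <= 1, so exp() cannot overflow
        }
        return m + std::log(sum);     // logarithm of the Softmax denominator
    }

    // Softmax expressed via log-sum-exp: softmax(x)_i = exp(x_i - logsumexp(x)).
    std::vector<float> softmax_lse(const std::vector<float>& x) {
        float lse = log_sum_exp(x);
        std::vector<float> probs(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) {
            probs[i] = std::exp(x[i] - lse);
        }
        return probs;
    }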