Softmax Optimization Research

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

The Softmax function is a significant cost in Transformer inference because it is part of the attention mechanism, whereas it was less of a bottleneck in earlier neural network architectures. A vanilla Softmax implementation is expensive because it computes the exponential of every element of the logits vector (a baseline C++ version is sketched after the list below). Various attempts have been made to optimize and approximate Softmax calculations, including:

  • Softmax code optimizations (sequential)
  • Vectorized Softmax (parallelization)
  • Softmax approximations
  • Integer-only Softmax
  • Pruned Softmax (removal)
  • Fused Softmax (kernel fusion)
  • Softmax replacements (use different functions)
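
As a baseline for these techniques, here is a minimal sketch of a vanilla Softmax in C++ (the function name softmax_basic and the use of float vectors are illustrative choices, not code from the book). It subtracts the maximum logit for numerical stability, exponentiates every element, and then normalizes by the sum of exponentials, so the exponential calls dominate the cost for long vectors.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Vanilla Softmax: one exponential per element, then normalize by the
    // sum of exponentials. The std::exp calls are the main cost.
    std::vector<float> softmax_basic(const std::vector<float>& logits) {
        float maxval = *std::max_element(logits.begin(), logits.end());
        std::vector<float> out(logits.size());
        float sum = 0.0f;
        for (size_t i = 0; i < logits.size(); ++i) {
            out[i] = std::exp(logits[i] - maxval);  // subtract max for numerical stability
            sum += out[i];
        }
        float recip = 1.0f / sum;  // one division, then cheaper multiplications
        for (size_t i = 0; i < logits.size(); ++i) {
            out[i] *= recip;
        }
        return out;
    }

Replacing a per-element division with a single reciprocal multiplication is an example of the sequential code optimizations in the first bullet above; the vectorized, approximate, and integer-only variants in the list all target the expensive exponential loop.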

Related Research Areas: Several other areas of theory are relevant to Softmax optimization and approximation. The denominator of the Softmax formula is a “sum of exponentials,” and this type of calculation also appears in Logarithmic Number System (LNS) addition. The same sum-of-exponentials calculation appears in “log-sum-exp networks,” which are somewhat related to “tropical algebra.” The area of “max-plus networks” may also be relevant to Softmax approximation research.
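
To make the connection explicit, the standard Softmax and log-sum-exp (LSE) definitions share the same sum-of-exponentials term (these are the standard textbook formulas, not formulas quoted from elsewhere in the book):

    \mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}},
    \qquad
    \mathrm{LSE}(x) = \log \sum_{j=1}^{n} e^{x_j},
    \qquad
    \mathrm{softmax}(x)_i = e^{\,x_i - \mathrm{LSE}(x)}

Hence any fast method for evaluating or approximating the sum of exponentials, as in LNS addition or log-sum-exp networks, translates directly into a faster Softmax denominator.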

 

