Softmax Normalization

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

The Softmax component is used as part of the attention head. It normalizes the raw output scores (logits) into proper probabilities by scaling them with a “sum-of-exponentials” method: each value is exponentiated and then divided by the sum of all the exponentials. This guarantees that every output lies between zero and one, and that the whole distribution sums to one, as probabilities should. See Chapter 25 for more about Softmax.
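To make this concrete, here is a minimal C++ sketch of the standard Softmax computation (not necessarily the exact version in Chapter 25). It subtracts the maximum score before exponentiating, a common numerical-stability trick that prevents floating-point overflow without changing the result, since the shift cancels out in the division.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Softmax: convert a vector of raw scores (logits) into probabilities.
    // Subtracting the maximum first keeps std::exp from overflowing; the
    // shift cancels in the normalization, so the output is unchanged.
    // Assumes a non-empty input vector.
    std::vector<float> softmax(const std::vector<float>& logits) {
        std::vector<float> probs(logits.size());
        float maxval = *std::max_element(logits.begin(), logits.end());
        float sum = 0.0f;
        for (size_t i = 0; i < logits.size(); ++i) {
            probs[i] = std::exp(logits[i] - maxval);  // each term is in (0, 1]
            sum += probs[i];
        }
        for (size_t i = 0; i < logits.size(); ++i) {
            probs[i] /= sum;  // normalize so the probabilities sum to 1
        }
        return probs;
    }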

Some research papers use a different normalization in the attention heads. For example, the “Hardmax” function can be used instead of Softmax; it concentrates all of the weight on the largest score, producing a one-hot result rather than a smooth range of probabilities (see the sketch below). Another possibility is the “Sparsemax” function, which produces a sparse distribution in which many of the probabilities are exactly zero. However, only Softmax has mainstream acceptance in Transformer architectures.
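As an illustration, here is a C++ sketch of one common interpretation of Hardmax: assign probability one to the largest score and zero to everything else. Note that this simple argmax formulation is an assumption; definitions of Hardmax vary across papers.

    #include <algorithm>
    #include <iterator>
    #include <vector>

    // Hardmax (simple argmax form; an assumption, as definitions vary):
    // all of the weight goes to the single largest score.
    std::vector<float> hardmax(const std::vector<float>& logits) {
        std::vector<float> onehot(logits.size(), 0.0f);
        size_t argmax = std::distance(logits.begin(),
            std::max_element(logits.begin(), logits.end()));
        onehot[argmax] = 1.0f;  // one-hot: 1 at the maximum, 0 elsewhere
        return onehot;
    }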

