Aussie AI

Vectorized Softmax

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.


The Softmax code has two loops that run sequentially: the first exponentiates each element and sums the results, and the second scales each element by the reciprocal of that sum. Both loops are candidates for vectorization. The only real obstacle is that the two loops cannot be fused into one, because the second loop needs the final result of the first loop as its scaling factor.
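For reference, the two-loop structure described above might look like the following minimal sketch. The function name softmax_basic is an assumption made for illustration, and the sketch leaves out refinements such as overflow protection for large inputs.

```cpp
#include <assert.h>
#include <math.h>

// Hypothetical baseline: two-pass Softmax computed in place on v[0..n-1].
void softmax_basic(float v[], int n)
{
    assert(n > 0);
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {   // Loop 1: exponentiate and accumulate the sum
        v[i] = expf(v[i]);
        sum += v[i];
    }
    float recip = 1.0f / sum;       // Only known once loop 1 has finished
    for (int i = 0; i < n; i++) {   // Loop 2: scale by the reciprocal
        v[i] *= recip;
    }
}
```

The dependency on recip is what prevents loop fusion: no element can be scaled until every element has contributed to the sum.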

Second things first. The second loop is easy to vectorize because it just multiplies a vector by a scalar. It needs no exponentiation, because the first loop has already stored the exponentiated values in the vector, leaving only a multiplication by the reciprocal.

    for (int i = 0; i < n; i++) {
        v[i] *= recip;  // NOTE: v[i] is already expf'd
    }
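One possible AVX version of this scaling loop is sketched below, handling 8 floats per iteration. The function name scale_by_recip is hypothetical, the unaligned load/store intrinsics are used so that no alignment of v is assumed, and the code falls back to the plain scalar loop when the compiler is not targeting AVX.

```cpp
#include <assert.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: multiply v[0..n-1] by recip, 8 floats at a time under AVX.
void scale_by_recip(float v[], int n, float recip)
{
    assert(n >= 0);
    int i = 0;
#if defined(__AVX__)
    const __m256 r = _mm256_set1_ps(recip);          // Broadcast the scalar to all 8 lanes
    for (; i + 8 <= n; i += 8) {
        __m256 x = _mm256_loadu_ps(&v[i]);           // Unaligned load of 8 floats
        _mm256_storeu_ps(&v[i], _mm256_mul_ps(x, r)); // Multiply and store back
    }
#endif
    for (; i < n; i++) {                             // Leftover elements (or all, without AVX)
        v[i] *= recip;                               // NOTE: v[i] is already expf'd
    }
}
```

The trailing scalar loop covers sizes that are not a multiple of 8, so callers do not need to pad the vector.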

Vectorizing exponentials. The first loop does both exponentiation and summation of the results. That sounds like it's going to be expensive, but vectorized versions of “exp” and “expf” have been available on x86 for years. They can be accessed via AVX-style C++ intrinsic functions such as _mm256_exp_ps. Note that these particular intrinsics come from a vendor math library (Intel's SVML) and expand to a sequence of instructions rather than a single opcode, so their availability depends on the compiler.
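A hedged sketch of the first loop follows. The names exp8, exp_and_sum, and the USE_SVML_EXP macro are all assumptions for illustration: define USE_SVML_EXP only if your compiler actually provides _mm256_exp_ps, otherwise the helper falls back to calling expf on each lane, which vectorizes only the summation. Without AVX, the whole function reduces to the scalar loop.

```cpp
#include <assert.h>
#include <math.h>
#if defined(__AVX__)
#include <immintrin.h>

// Hypothetical helper: exponentiate all 8 lanes of x.
static inline __m256 exp8(__m256 x)
{
#if defined(USE_SVML_EXP)   // Hypothetical macro: compiler supplies _mm256_exp_ps
    return _mm256_exp_ps(x);
#else
    float tmp[8];                                   // Fallback: scalar expf per lane
    _mm256_storeu_ps(tmp, x);
    for (int k = 0; k < 8; k++) tmp[k] = expf(tmp[k]);
    return _mm256_loadu_ps(tmp);
#endif
}
#endif

// Hypothetical helper: replace each v[i] with expf(v[i]) and return the sum.
float exp_and_sum(float v[], int n)
{
    assert(n >= 0);
    int i = 0;
    float sum = 0.0f;
#if defined(__AVX__)
    __m256 acc = _mm256_setzero_ps();               // 8 partial sums, one per lane
    for (; i + 8 <= n; i += 8) {
        __m256 e = exp8(_mm256_loadu_ps(&v[i]));
        _mm256_storeu_ps(&v[i], e);                 // Keep exponentials for the scaling loop
        acc = _mm256_add_ps(acc, e);
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);                   // Horizontal add of the partial sums
    for (int k = 0; k < 8; k++) sum += lanes[k];
#endif
    for (; i < n; i++) {                            // Leftover elements (or all, without AVX)
        v[i] = expf(v[i]);
        sum += v[i];
    }
    return sum;
}
```

Because the AVX path adds the elements in a different order than the scalar loop, the returned sum can differ from the sequential result in the last few bits.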

 
