Aussie AI

Vectorized Softmax with AVX

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.


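The AVX intrinsics use x86 SIMD instructions to operate on multiple float values or integers at once (e.g. 4 float values in 128-bit registers, as in the "AVX1" code below, 8 float values in 256-bit registers, as in the AVX2 code, and 16 float values with AVX-512). Surprisingly, there are also AVX SIMD exponential function intrinsics that apply “expf” to multiple elements of a vector in parallel. These come from Intel's Short Vector Math Library (SVML), e.g. _mm256_exp_ps, and are supported by the Intel compilers and MSVC, but not by GCC or Clang.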
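For comparison, here is a minimal scalar sketch of the sequential softmax whose loops the AVX versions replace. The function name softmax_scalar_reference is hypothetical. Like the AVX code in this section, it omits the usual subtraction of the maximum element, so very large inputs can overflow expf.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar reference: the loops that the AVX versions vectorize.
// No max-subtraction, matching the AVX code in this section.
void softmax_scalar_reference(float v[], int n)
{
    for (int i = 0; i < n; i++) {   // Loop 1: element-wise exponential
        v[i] = std::exp(v[i]);
    }
    float denom = 0.0f;
    for (int i = 0; i < n; i++) {   // Loop 2: summation
        denom += v[i];
    }
    assert(denom != 0.0f);          // should not occur, since expf > 0
    float recip = 1.0f / denom;
    for (int i = 0; i < n; i++) {   // Loop 3: multiply by scalar reciprocal
        v[i] *= recip;
    }
}
```

Each loop is a simple element-wise or reduction pass, which is what makes all of them candidates for SIMD.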

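Example: Softmax with AVX exponential and summation. Softmax has three loops: exponentiate each element, sum the results, and multiply each element by the reciprocal of that sum. The first two loops can each be vectorized separately using AVX intrinsics; vectorized versions of expf and summation were examined in the hardware acceleration chapter. The version for AVX1 becomes: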

    void aussie_vector_softmax_exponentiate_and_sum_AVX1(float v[], int n)
    {
        yassert(n % 4 == 0);
        aussie_vector_expf_AVX1(v, n);  // AVX1-accelerated expf...
        float denom = aussie_vector_sum_AVX1(v, n);  // AVX1-accelerated sum
        if (denom == 0.0) {
            yassert(denom != 0.0);
            return;  // fail (should not occur)
        }
        float recip = 1.0f / denom;
        for (int i = 0; i < n; i++) {
            v[i] *= recip;  // NOTE: v[i] is already expf'd
        }
    }
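The summation helper is where the SIMD reduction happens. A minimal sketch of a 4-float summation, assuming the hypothetical name sketch_vector_sum_128 rather than the book's aussie_vector_sum_AVX1 (whose actual code is in the hardware acceleration chapter), keeps four partial sums in one 128-bit register and adds them together at the end. It uses the 128-bit _mm_ intrinsics from <xmmintrin.h> and falls back to a plain loop on non-x86 targets.

```cpp
#include <cassert>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#endif

// Hypothetical 4-float summation, similar in spirit to aussie_vector_sum_AVX1.
float sketch_vector_sum_128(const float v[], int n)
{
    assert(n % 4 == 0);
#if defined(__SSE__) || defined(_M_X64)
    __m128 acc = _mm_setzero_ps();                    // 4 running partial sums
    for (int i = 0; i < n; i += 4) {
        acc = _mm_add_ps(acc, _mm_loadu_ps(&v[i]));   // add 4 elements at once
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                        // spill the 4 lanes to memory
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);  // horizontal sum
#else
    float sum = 0.0f;                                 // scalar fallback (non-x86)
    for (int i = 0; i < n; i++) sum += v[i];
    return sum;
#endif
}
```

Because the additions happen in a different order than a sequential loop, the result can differ from the scalar sum in the last few bits.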

Actually, that version vectorizes only two of the three loops; the final multiply-by-scalar loop still runs sequentially. The AVX2 version below also vectorizes the third loop with AVX, as shown in the vectorization chapter, and fuses the expf and summation steps into a single pass over the vector:

    void aussie_vector_softmax_fused_exp_sum_mult_AVX2(float v[], int n) 
    {
        // Softmax with EXP and SUM and MULT in AVX2
        yassert(n % 8 == 0);
        float denom = aussie_vector_fused_expf_sum_AVX2(v, n);  // Fused expf and sum in one pass
        if (denom == 0.0) {
            yassert(denom != 0.0);
            return;  // fail (should not occur)
        }
        float recip = 1.0f / denom;
        aussie_vector_multiply_scalar_AVX2(v, n, recip);
    }
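A minimal sketch of the 8-float pieces, assuming hypothetical helper names rather than the book's aussie_vector_fused_expf_sum_AVX2 and aussie_vector_multiply_scalar_AVX2. The intrinsics used (_mm256_loadu_ps, _mm256_add_ps, _mm256_mul_ps, _mm256_set1_ps) are 256-bit AVX operations. The SIMD path is compiled only when __AVX__ is defined (e.g. -mavx or -mavx2 on GCC/Clang, /arch:AVX2 on MSVC); otherwise a scalar fallback runs. Since GCC and Clang have no exp intrinsic, the fused loop here computes expf per element and vectorizes only the accumulation.

```cpp
#include <cassert>
#include <cmath>
#ifdef __AVX__
#include <immintrin.h>
#endif

// Hypothetical fused pass: exponentiate and accumulate in the same loop,
// 8 floats at a time. expf is scalar; only the summation is SIMD.
float sketch_fused_expf_sum_256(float v[], int n)
{
    assert(n % 8 == 0);
#ifdef __AVX__
    __m256 acc = _mm256_setzero_ps();                 // 8 running partial sums
    for (int i = 0; i < n; i += 8) {
        for (int j = i; j < i + 8; j++) v[j] = std::exp(v[j]);
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(&v[i]));  // add while in cache
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; k++) sum += lanes[k];      // horizontal sum
    return sum;
#else
    float sum = 0.0f;                                 // scalar fallback
    for (int i = 0; i < n; i++) { v[i] = std::exp(v[i]); sum += v[i]; }
    return sum;
#endif
}

// Hypothetical 8-float multiply-by-scalar: the third softmax loop.
void sketch_multiply_scalar_256(float v[], int n, float c)
{
    assert(n % 8 == 0);
#ifdef __AVX__
    const __m256 cv = _mm256_set1_ps(c);              // broadcast c to all 8 lanes
    for (int i = 0; i < n; i += 8) {
        _mm256_storeu_ps(&v[i], _mm256_mul_ps(_mm256_loadu_ps(&v[i]), cv));
    }
#else
    for (int i = 0; i < n; i++) v[i] *= c;
#endif
}

// Softmax assembled from the two hypothetical helpers.
void sketch_softmax_256(float v[], int n)
{
    float denom = sketch_fused_expf_sum_256(v, n);
    assert(denom != 0.0f);
    sketch_multiply_scalar_256(v, n, 1.0f / denom);
}
```

Computing the reciprocal once and multiplying, rather than dividing each element, keeps the inner loop to a single _mm256_mul_ps per 8 elements.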

 
