Aussie AI
Vectorized Softmax with AVX
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
The AVX intrinsics use x86 SIMD instructions to operate on multiple float values or integers at once (e.g. 4 float values for AVX-1, 8 float values for AVX-2, 16 float values for AVX-512).
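As a quick illustration of this load-compute-store pattern, here is a minimal sketch (not from the book's code) of adding two vectors eight floats at a time with AVX-2; the function name is hypothetical:

#include <immintrin.h>  // AVX/AVX2 intrinsics

void add_vectors_AVX2(float vout[], const float v1[], const float v2[], int n)
{
    // Assumes n is a multiple of 8 (one 256-bit AVX2 register holds 8 floats)
    for (int i = 0; i < n; i += 8) {
        __m256 a = _mm256_loadu_ps(&v1[i]);   // load 8 floats (unaligned)
        __m256 b = _mm256_loadu_ps(&v2[i]);   // load 8 floats
        __m256 sum = _mm256_add_ps(a, b);     // 8 additions in one instruction
        _mm256_storeu_ps(&vout[i], sum);      // store 8 results
    }
}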
Surprisingly, there are AVX SIMD exponential function intrinsics that apply "expf" to multiple elements of a vector in parallel.
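Note that these exponential intrinsics (e.g. _mm256_exp_ps for AVX-2) come from Intel's SVML (Short Vector Math Library), so they are typically available with the Intel compilers and recent Microsoft Visual C++, but not standard in GCC or Clang. A minimal sketch of exponentiating a whole vector this way, under those assumptions:

#include <immintrin.h>

void vector_expf_AVX2_sketch(float v[], int n)  // hypothetical helper; assumes n % 8 == 0
{
    for (int i = 0; i < n; i += 8) {
        __m256 x = _mm256_loadu_ps(&v[i]);  // load 8 floats
        __m256 e = _mm256_exp_ps(x);        // SVML: 8 expf computations at once
        _mm256_storeu_ps(&v[i], e);         // store 8 results
    }
}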
Example: Softmax with AVX exponential and summation.
We can vectorize both of these loops (the exponentiation and the summation) separately using AVX intrinsics. Vectorized versions of expf and summation were examined in the hardware acceleration chapter. The version for AVX1 becomes:
void aussie_vector_softmax_exponentiate_and_sum_AVX1(float v[], int n)
{
    yassert(n % 4 == 0);
    aussie_vector_expf_AVX1(v, n);   // AVX1-accelerated expf...
    float denom = aussie_vector_sum_AVX1(v, n);   // AVX1-accelerated sum
    if (denom == 0.0) {
        yassert(denom != 0.0);
        return;  // fail (should not occur)
    }
    float recip = 1.0f / denom;
    for (int i = 0; i < n; i++) {
        v[i] *= recip;   // NOTE: v[i] is already expf'd
    }
}
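The two helper functions are from the hardware acceleration chapter and are not reproduced in this excerpt. As a rough idea of the summation half, here is a minimal sketch of an AVX1-style sum with a horizontal reduction at the end (an assumption, not the book's exact code):

#include <immintrin.h>

float vector_sum_AVX1_sketch(const float v[], int n)  // hypothetical; assumes n % 4 == 0
{
    __m128 sums = _mm_setzero_ps();  // 4 running partial sums
    for (int i = 0; i < n; i += 4) {
        sums = _mm_add_ps(sums, _mm_loadu_ps(&v[i]));  // 4 additions at once
    }
    // Horizontal reduction of the 4 partial sums (SSE3 hadd)
    sums = _mm_hadd_ps(sums, sums);
    sums = _mm_hadd_ps(sums, sums);
    return _mm_cvtss_f32(sums);  // extract the final scalar sum
}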
Actually, the softmax version above has only vectorized two out of three loops. Here's the code with the third loop, the multiply-by-scalar, also done with AVX, as shown in the vectorization chapter. This code is the AVX2 version:
void aussie_vector_softmax_fused_exp_sum_mult_AVX2(float v[], int n)
{
    // Softmax with EXP and SUM and MULT in AVX2
    yassert(n % 8 == 0);
    float denom = aussie_vector_fused_expf_sum_AVX2(v, n);   // Element-wise expf...
    if (denom == 0.0) {
        yassert(denom != 0.0);
        return;  // fail (should not occur)
    }
    float recip = 1.0f / denom;
    aussie_vector_multiply_scalar_AVX2(v, n, recip);
}
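The final normalization step broadcasts the reciprocal into all lanes and multiplies eight elements per iteration. A minimal sketch of what such a scalar-multiply helper might look like (again an assumption, not the book's exact code):

#include <immintrin.h>

void vector_multiply_scalar_AVX2_sketch(float v[], int n, float c)  // hypothetical; assumes n % 8 == 0
{
    __m256 cvec = _mm256_set1_ps(c);  // broadcast the scalar into all 8 lanes
    for (int i = 0; i < n; i += 8) {
        __m256 x = _mm256_loadu_ps(&v[i]);                 // load 8 floats
        _mm256_storeu_ps(&v[i], _mm256_mul_ps(x, cvec));   // 8 multiplications at once
    }
}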