Aussie AI
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Vectorized RELU with Max Intrinsics
The RELU activation function simply converts negatives to zero, leaving positives unchanged. This is algebraically equivalent to max(x,0), which can be implemented in AVX as a "max-scalar" operation.
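For comparison, a plain scalar version of RELU over a vector is just a loop that clamps each negative element to zero. The helper name below follows the naming style of the other functions in this book, but it is a hypothetical sketch rather than code from the original source:

// Hypothetical scalar baseline (illustrative only)
void aussie_vector_reluize_basic(float v[], int n)
{
    for (int i = 0; i < n; i++) {
        if (v[i] < 0.0f) v[i] = 0.0f;   // RELU: max(v[i], 0.0)
    }
}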
To vectorize RELU applied to a whole vector of float elements, we are effectively doing a SIMD max operation against a scalar zero (i.e., 0.0). Hence, the code is very similar to the vectorization of add-scalar, but uses the "_mm_max_ps" intrinsic.
The AVX1 version of vectorized RELU looks like:
void aussie_vector_reluize_AVX1(float v[], int n)   // Apply RELU to each element (sets negatives to zero)
{
    if (n % 4 != 0) {
        yassert(n % 4 == 0);
        return;  // fail
    }
    const __m128 rzeros = _mm_set1_ps(0.0f);     // Set up vector full of zeros...
    for (int i = 0; i < n; i += 4) {
        __m128 r1 = _mm_loadu_ps(&v[i]);         // Load floats into 128-bits
        __m128 dst = _mm_max_ps(r1, rzeros);     // MAX(r1, 0)
        _mm_storeu_ps(&v[i], dst);               // Store back to floats (unaligned store to match the unaligned load)
    }
}
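Here is a minimal usage sketch, assuming the function above is compiled in the same translation unit with AVX support enabled and <immintrin.h> included; the only requirement is that the array length is a multiple of 4:

#include <immintrin.h>
#include <cstdio>

int main()
{
    float v[8] = { -1.5f, 2.0f, -0.25f, 3.0f, 0.0f, -7.0f, 4.5f, -0.5f };
    aussie_vector_reluize_AVX1(v, 8);                  // 8 is a multiple of 4
    for (int i = 0; i < 8; i++) printf("%g ", v[i]);   // prints: 0 2 0 3 0 0 4.5 0
    printf("\n");
    return 0;
}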
And here is the AVX2 version, which processes 8 float elements at a time using the "_mm256_max_ps" intrinsic:
void aussie_vector_reluize_AVX2(float v[], int n)   // Apply RELU to each element (sets negatives to zero)
{
    if (n % 8 != 0) {
        yassert(n % 8 == 0);
        return;  // fail
    }
    const __m256 rzeros = _mm256_set1_ps(0.0f);     // Vector full of zeros...
    for (int i = 0; i < n; i += 8) {
        __m256 r1 = _mm256_loadu_ps(&v[i]);         // Load floats into 256-bits
        __m256 dst = _mm256_max_ps(r1, rzeros);     // MAX(r1, 0)
        _mm256_storeu_ps(&v[i], dst);               // Store back to floats (unaligned store to match the unaligned load)
    }
}
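Both versions simply fail via the assertion when n is not a multiple of the SIMD width. One way to remove that restriction, sketched below as a hypothetical variant rather than code from the book, is to follow the AVX2 body with a scalar tail loop for the leftover elements:

#include <immintrin.h>

// Hypothetical variant: AVX2 body plus a scalar tail, so n need not be a multiple of 8
void aussie_vector_reluize_AVX2_anylen(float v[], int n)
{
    const __m256 rzeros = _mm256_set1_ps(0.0f);   // Vector full of zeros
    int i = 0;
    for (; i + 8 <= n; i += 8) {                  // Vectorized body: 8 floats per iteration
        __m256 r1 = _mm256_loadu_ps(&v[i]);
        _mm256_storeu_ps(&v[i], _mm256_max_ps(r1, rzeros));
    }
    for (; i < n; i++) {                          // Scalar tail: remaining 0..7 elements
        if (v[i] < 0.0f) v[i] = 0.0f;
    }
}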