Aussie AI

Vectorized Add Scalar

Book Excerpt from "Generative AI in C++"

by David Spuler, Ph.D.

The code to vectorize an “add-scalar” operation is almost identical to the “multiply-scalar” code, except that the “add” intrinsics are used. Here is the 128-bit AVX-1 version using the “_mm_add_ps” intrinsic, which processes 4 floats per iteration:

    #include <immintrin.h>

    void aussie_vector_add_scalar_AVX1(float v[], int n, float c)
    {
        // Add scalar constant to all vector elements
        // Note: assumes n is a multiple of 4 (no handling of leftover elements)
        const __m128 rscalar = _mm_set1_ps(c);  // Set up vector full of scalars...
        for (int i = 0; i < n; i += 4) {
            __m128 r1 = _mm_loadu_ps(&v[i]);   // Load 4 floats into 128 bits (unaligned)
            __m128 dst = _mm_add_ps(r1, rscalar);   // Add scalar to all 4 floats
            _mm_storeu_ps(&v[i], dst);  // Store back to floats (unaligned, matching the load)
        }
    }

This is the analogous AVX-2 version using the 256-bit “_mm256_add_ps” intrinsic, which processes 8 floats per iteration:

    void aussie_vector_add_scalar_AVX2(float v[], int n, float c)
    {
        // Add scalar constant to all vector elements
        // Note: assumes n is a multiple of 8 (no handling of leftover elements)
        const __m256 rscalar = _mm256_set1_ps(c);  // Vector full of scalars...
        for (int i = 0; i < n; i += 8) {
            __m256 r1 = _mm256_loadu_ps(&v[i]);   // Load 8 floats into 256 bits (unaligned)
            __m256 dst = _mm256_add_ps(r1, rscalar);   // Add scalar to all 8 floats
            _mm256_storeu_ps(&v[i], dst);  // Store back to floats (unaligned, matching the load)
        }
    }
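Both loops above step through the array in whole vector widths, so they only cover the case where n is a multiple of 4 or 8. One common way to handle other sizes is to vectorize the largest multiple of the vector width and finish the leftover elements with an ordinary scalar loop. The following is a minimal sketch of that idea for the 128-bit case. The function name “aussie_vector_add_scalar_any_n” is hypothetical and not from the book, and the unaligned load and store intrinsics are used so that no alignment of v is assumed.

```cpp
#include <cassert>
#include <immintrin.h>

// Hypothetical helper (not from the book): add c to every element of v
// for any n >= 0, not only multiples of 4.
void aussie_vector_add_scalar_any_n(float v[], int n, float c)
{
    assert(n >= 0);
    const __m128 rscalar = _mm_set1_ps(c);     // c copied into all 4 lanes
    const int nvec = n - (n % 4);              // largest multiple of 4 that is <= n
    int i = 0;
    for (; i < nvec; i += 4) {
        __m128 r1 = _mm_loadu_ps(&v[i]);       // unaligned load of 4 floats
        __m128 dst = _mm_add_ps(r1, rscalar);  // add c to all 4 floats
        _mm_storeu_ps(&v[i], dst);             // unaligned store of 4 floats
    }
    for (; i < n; i++) {
        v[i] += c;                             // scalar loop for the last 0..3 elements
    }
}
```

The same pattern should carry over to the 256-bit version by stepping 8 elements at a time and using the “_mm256” intrinsics, with the scalar loop then covering up to 7 leftover elements.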
