Aussie AI

Vectorized Add Scalar

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

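The code to vectorize an “add-scalar” operation is almost identical to the “multiply-scalar” code, except that it uses the “add” intrinsics instead of the “multiply” ones. Here is the AVX-1 version, which adds the scalar to four floats at a time in a 128-bit register using “_mm_add_ps”: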

    #include <immintrin.h>  // SSE/AVX intrinsics

    void aussie_vector_add_scalar_AVX1(float v[], int n, float c)
    {
        // Add scalar constant to all vector elements
        const __m128 rscalar = _mm_set1_ps(c);  // Set up vector full of scalars...
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            __m128 r1 = _mm_loadu_ps(&v[i]);   // Load 4 floats into 128 bits
            __m128 dst = _mm_add_ps(r1, rscalar);   // Add scalars
            _mm_storeu_ps(&v[i], dst);  // Store back (unaligned store matches the unaligned load)
        }
        for (; i < n; i++) {  // Leftover elements when n is not a multiple of 4
            v[i] += c;
        }
    }
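When the caller controls how the array is allocated, the aligned intrinsics `_mm_load_ps` and `_mm_store_ps` can be used instead of the unaligned `_mm_loadu_ps` form. A minimal sketch follows, assuming the caller guarantees that `v` is 16-byte aligned (for example with `alignas(16)`); the function name `add_scalar_aligned_SSE` is hypothetical. The aligned forms can fault on a misaligned address, so the sketch checks alignment with `assert` in debug builds.

```cpp
#include <immintrin.h>  // SSE intrinsics (baseline on x86-64)
#include <cassert>
#include <cstdint>

// Hypothetical helper: assumes v is 16-byte aligned.
void add_scalar_aligned_SSE(float* v, int n, float c)
{
    // Aligned intrinsics can fault on misaligned addresses, so check in debug builds.
    assert(reinterpret_cast<std::uintptr_t>(v) % 16 == 0);
    const __m128 rscalar = _mm_set1_ps(c);  // {c, c, c, c}
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        // v is 16-byte aligned and i is a multiple of 4, so &v[i] stays aligned.
        __m128 r1 = _mm_load_ps(&v[i]);                // Aligned load of 4 floats
        _mm_store_ps(&v[i], _mm_add_ps(r1, rscalar));  // Aligned store of 4 sums
    }
    for (; i < n; i++) {  // Scalar tail when n is not a multiple of 4
        v[i] += c;
    }
}
```

For example, `alignas(16) float data[6] = {1, 2, 3, 4, 5, 6};` followed by `add_scalar_aligned_SSE(data, 6, 0.5f);` leaves `{1.5, 2.5, 3.5, 4.5, 5.5, 6.5}` in `data`. On modern x86 CPUs the unaligned forms typically run as fast as the aligned ones when the data happens to be aligned, so the main gain here is the explicit alignment contract.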

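And this is the analogous AVX-2 version, which processes eight floats at a time in a 256-bit register using the “_mm256_add_ps” intrinsic (strictly, “_mm256_add_ps” requires only the original AVX instruction set):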

    #include <immintrin.h>  // AVX intrinsics

    void aussie_vector_add_scalar_AVX2(float v[], int n, float c)
    {
        // Add scalar constant to all vector elements
        const __m256 rscalar = _mm256_set1_ps(c);  // Vector full of scalars...
        int i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256 r1 = _mm256_loadu_ps(&v[i]);   // Load 8 floats into 256 bits
            __m256 dst = _mm256_add_ps(r1, rscalar);   // Add scalars
            _mm256_storeu_ps(&v[i], dst);  // Store back (unaligned store matches the unaligned load)
        }
        for (; i < n; i++) {  // Leftover elements when n is not a multiple of 8
            v[i] += c;
        }
    }
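When `n` is not a multiple of 8, the last `n % 8` elements need separate handling. A scalar cleanup loop is the simplest option; another is AVX's masked load and store (`_mm256_maskload_ps` and `_mm256_maskstore_ps`), which access only the lanes whose mask element has its sign bit set. The sketch below assumes GCC or Clang on x86-64: `__attribute__((target("avx")))` compiles this one function for AVX without a global `-mavx` flag, but the CPU must still support AVX at run time. The names `add_scalar_AVX_masked` and `mask_table` are hypothetical.

```cpp
#include <immintrin.h>  // AVX intrinsics

// Hypothetical variant: handles the last n % 8 elements with AVX masked
// load/store instead of a scalar loop. GCC/Clang-specific target attribute.
__attribute__((target("avx")))
void add_scalar_AVX_masked(float* v, int n, float c)
{
    const __m256 rscalar = _mm256_set1_ps(c);  // 8 copies of c
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 r1 = _mm256_loadu_ps(&v[i]);
        _mm256_storeu_ps(&v[i], _mm256_add_ps(r1, rscalar));
    }
    const int rem = n - i;  // 0..7 leftover elements
    if (rem > 0) {
        // Sliding window over eight -1 entries then eight 0 entries: starting
        // at index 8 - rem gives a mask whose first rem lanes are all-ones.
        static const int mask_table[16] = {-1, -1, -1, -1, -1, -1, -1, -1,
                                            0,  0,  0,  0,  0,  0,  0,  0};
        const __m256i mask = _mm256_loadu_si256(
            reinterpret_cast<const __m256i*>(mask_table + 8 - rem));
        // Masked-off lanes are not read (they load as 0) and are not written,
        // so nothing past v[n - 1] is touched.
        __m256 r1 = _mm256_maskload_ps(&v[i], mask);
        _mm256_maskstore_ps(&v[i], mask, _mm256_add_ps(r1, rscalar));
    }
}
```

A caller can check `__builtin_cpu_supports("avx")` (a GCC/Clang builtin) before calling it. Whether the masked tail beats a scalar loop depends on the CPU, so it is worth measuring both.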
