Aussie AI
Vectorized Add Scalar
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
The code to vectorize an “add-scalar” operation is almost identical to “multiply-scalar” operations,
except that “add” intrinsics are used.
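For comparison, the operation being vectorized can be written as a plain C++ loop with no intrinsics. This is only a minimal reference sketch: the function name aussie_vector_add_scalar_basic is a hypothetical one chosen to match the naming style used here, not a function taken from the book.

```cpp
// Reference (non-vectorized) add-scalar: adds c to every element of v.
// Hypothetical baseline for checking the results of the SIMD versions.
void aussie_vector_add_scalar_basic(float v[], int n, float c)
{
    for (int i = 0; i < n; i++) {
        v[i] += c;   // one float addition per loop iteration
    }
}
```

A baseline like this is useful mainly as a correctness check: run it and a vectorized version on copies of the same array and compare the results element by element.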
Here is the AVX-1 version with “_mm_add_ps”:
#include <immintrin.h>  // SIMD intrinsics

void aussie_vector_add_scalar_AVX1(float v[], int n, float c)
{
    // Add scalar constant to all vector elements (assumes n is a multiple of 4)
    const __m128 rscalar = _mm_set1_ps(c);  // Set up vector full of scalars...
    for (int i = 0; i < n; i += 4) {
        __m128 r1 = _mm_loadu_ps(&v[i]);   // Load floats into 128-bits
        __m128 dst = _mm_add_ps(r1, rscalar);   // Add scalars
        _mm_storeu_ps(&v[i], dst);  // Store back to floats (unaligned, matching the load)
    }
}
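The 128-bit loop processes 4 floats per iteration, so it implicitly assumes that n is a multiple of 4. One common way to lift that restriction is to vectorize the full groups of 4 and finish the leftover elements with a scalar loop. The sketch below illustrates that idea; the function name and the HAVE_SIMD_128 macro are hypothetical, and the preprocessor guard is an added assumption so that the code still compiles (as a plain loop) on a target without these intrinsics.

```cpp
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   // _mm_set1_ps, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#define HAVE_SIMD_128 1  // hypothetical macro used only in this sketch
#endif

// Hypothetical variant that accepts any n >= 0 and any alignment of v.
void aussie_vector_add_scalar_128_any_n(float v[], int n, float c)
{
    int i = 0;
#ifdef HAVE_SIMD_128
    const __m128 rscalar = _mm_set1_ps(c);       // 4 copies of the scalar
    for (; i + 4 <= n; i += 4) {                 // only full groups of 4 floats
        __m128 r1 = _mm_loadu_ps(&v[i]);         // unaligned load
        __m128 dst = _mm_add_ps(r1, rscalar);    // 4 additions at once
        _mm_storeu_ps(&v[i], dst);               // unaligned store
    }
#endif
    for (; i < n; i++) {                         // leftover 0..3 elements
        v[i] += c;                               // (or all of them without SIMD)
    }
}
```

With n = 7, for example, the SIMD loop handles elements 0 to 3 and the scalar loop handles elements 4 to 6.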
And this is the analogous AVX-2 version using the “_mm256_add_ps” intrinsic:
#include <immintrin.h>  // SIMD intrinsics

void aussie_vector_add_scalar_AVX2(float v[], int n, float c)
{
    // Add scalar constant to all vector elements (assumes n is a multiple of 8)
    const __m256 rscalar = _mm256_set1_ps(c);  // Vector full of scalars...
    for (int i = 0; i < n; i += 8) {
        __m256 r1 = _mm256_loadu_ps(&v[i]);   // Load floats into 256-bits
        __m256 dst = _mm256_add_ps(r1, rscalar);   // Add scalars
        _mm256_storeu_ps(&v[i], dst);  // Store back to floats (unaligned, matching the load)
    }
}
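The aligned intrinsics “_mm256_load_ps” and “_mm256_store_ps” require the address to be on a 32-byte boundary, whereas the “u” variants accept any address. The following is a hedged sketch of an aligned 256-bit version that checks the alignment at runtime and otherwise falls back to a scalar loop. The function names are hypothetical, and the __AVX__ guard is an assumption about the build: the 256-bit path is compiled only when AVX is enabled (for example, -mavx on GCC/Clang or /arch:AVX on MSVC).

```cpp
#include <cstdint>       // std::uintptr_t
#if defined(__AVX__)
#include <immintrin.h>   // _mm256_set1_ps, _mm256_load_ps, _mm256_add_ps, _mm256_store_ps
#endif

// Hypothetical helper: true if p lies on a 32-byte boundary.
bool aussie_is_aligned_32(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) % 32) == 0;
}

// Hypothetical aligned variant: uses aligned 256-bit loads and stores only
// when v is 32-byte aligned; any remaining elements use a scalar loop.
void aussie_vector_add_scalar_256_aligned(float v[], int n, float c)
{
    int i = 0;
#if defined(__AVX__)
    if (aussie_is_aligned_32(v)) {
        const __m256 rscalar = _mm256_set1_ps(c);     // 8 copies of the scalar
        for (; i + 8 <= n; i += 8) {                  // &v[i] stays 32-byte aligned
            __m256 r1 = _mm256_load_ps(&v[i]);        // aligned load
            __m256 dst = _mm256_add_ps(r1, rscalar);  // 8 additions at once
            _mm256_store_ps(&v[i], dst);              // aligned store
        }
    }
#endif
    for (; i < n; i++) {   // remainder, or everything if unaligned or no AVX
        v[i] += c;
    }
}
```

A caller can guarantee the alignment by declaring the array with alignas(32), for example: alignas(32) float v[64];. Because each iteration advances by 8 floats (32 bytes), an aligned starting address keeps every subsequent load and store aligned as well.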