Aussie AI
Vectorized Add Scalar
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
The code to vectorize an “add-scalar” operation is almost identical to the “multiply-scalar” version, except that it uses the “add” intrinsics in place of the “multiply” intrinsics.
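As a point of comparison, the non-vectorized version is just a simple loop. The following is a minimal sketch of that scalar baseline, which is also handy for checking the intrinsic versions; the function name aussie_vector_add_scalar_basic is hypothetical and does not come from the original text.

```cpp
// Hypothetical scalar baseline (not from the original text):
// add the constant c to each of the n elements of v, in place.
void aussie_vector_add_scalar_basic(float v[], int n, float c)
{
    for (int i = 0; i < n; i++) {
        v[i] += c;  // one addition per element, no SIMD
    }
}
```

The vectorized versions do the same work, but on 4 floats (128 bits) or 8 floats (256 bits) per instruction.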
Here is the AVX-1 version with “_mm_add_ps”:
#include <immintrin.h>  // x86 SIMD intrinsics

void aussie_vector_add_scalar_AVX1(float v[], int n, float c)
{
    // Add scalar constant to all vector elements
    const __m128 rscalar = _mm_set1_ps(c);  // Set up vector full of scalars...
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 r1 = _mm_loadu_ps(&v[i]);  // Load 4 floats into 128 bits (unaligned)
        __m128 dst = _mm_add_ps(r1, rscalar);  // Add scalars
        _mm_storeu_ps(&v[i], dst);  // Store back to floats (unaligned, matching the load)
    }
    for (; i < n; i++) {
        v[i] += c;  // Leftover 1 to 3 elements when n is not a multiple of 4
    }
}
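To make the “almost identical” point concrete, the shared structure can be factored into one template that takes the intrinsic as a parameter. This is a minimal sketch, not the book's code: the names aussie_vector_scalar_op_AVX1, aussie_vector_add_scalar_AVX1_generic and aussie_vector_multiply_scalar_AVX1_generic are hypothetical, and it assumes an x86 compiler with <immintrin.h> (the 128-bit intrinsics used here are available by default on x86-64). It also shows an alternative way to handle the leftover tail: copying it through a zero-padded temporary so the full-width load and store never go past the end of the array.

```cpp
#include <immintrin.h>  // x86 SIMD intrinsics (assumes an x86 target)
#include <cstring>      // std::memcpy

// Hypothetical helper: apply a 128-bit "vector op scalar" operation to every
// element of v in place. VecOp is any callable taking (__m128, __m128) and
// returning __m128, such as a lambda wrapping _mm_add_ps or _mm_mul_ps.
template <typename VecOp>
void aussie_vector_scalar_op_AVX1(float v[], int n, float c, VecOp op)
{
    const __m128 rscalar = _mm_set1_ps(c);  // 4 copies of the scalar
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 r1 = _mm_loadu_ps(&v[i]);        // unaligned load of 4 floats
        _mm_storeu_ps(&v[i], op(r1, rscalar));  // unaligned store of the result
    }
    if (i < n) {
        // Tail of 1 to 3 elements: go through a zero-padded temporary so the
        // 128-bit load and store never touch memory past v[n-1].
        float tmp[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
        const int rem = n - i;
        std::memcpy(tmp, &v[i], rem * sizeof(float));
        _mm_storeu_ps(tmp, op(_mm_loadu_ps(tmp), rscalar));
        std::memcpy(&v[i], tmp, rem * sizeof(float));
    }
}

// Add-scalar and multiply-scalar differ only in the intrinsic passed in.
void aussie_vector_add_scalar_AVX1_generic(float v[], int n, float c)
{
    aussie_vector_scalar_op_AVX1(v, n, c,
        [](__m128 a, __m128 b) { return _mm_add_ps(a, b); });
}

void aussie_vector_multiply_scalar_AVX1_generic(float v[], int n, float c)
{
    aussie_vector_scalar_op_AVX1(v, n, c,
        [](__m128 a, __m128 b) { return _mm_mul_ps(a, b); });
}
```

Passing the intrinsic as a lambda gives each instantiation its own distinct type, which generally makes it easy for the compiler to inline the operation into the loop.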
And this is the analogous AVX-2 version using the “_mm256_add_ps” intrinsic:
void aussie_vector_add_scalar_AVX2(float v[], int n, float c)
{
    // Add scalar constant to all vector elements
    const __m256 rscalar = _mm256_set1_ps(c);  // Vector full of scalars...
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 r1 = _mm256_loadu_ps(&v[i]);  // Load 8 floats into 256 bits (unaligned)
        __m256 dst = _mm256_add_ps(r1, rscalar);  // Add scalars
        _mm256_storeu_ps(&v[i], dst);  // Store back to floats (unaligned, matching the load)
    }
    for (; i < n; i++) {
        v[i] += c;  // Leftover 1 to 7 elements when n is not a multiple of 8
    }
}
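One way to call a 256-bit add-scalar safely on an arbitrary float array is sketched below, assuming GCC or Clang on x86: the target("avx") attribute enables AVX code generation for a single function even when the file is built without -mavx, and __builtin_cpu_supports checks the CPU at run time. Strictly, _mm256_set1_ps, _mm256_loadu_ps, _mm256_add_ps and _mm256_storeu_ps need only AVX rather than AVX-2, so the sketch tests for "avx". The names add_scalar_256_kernel and aussie_vector_add_scalar_dispatch are hypothetical.

```cpp
#include <immintrin.h>  // x86 SIMD intrinsics (assumes GCC or Clang on x86)

// Hypothetical 256-bit kernel. The GCC/Clang target attribute lets this one
// function use AVX instructions even if the file is compiled without -mavx.
__attribute__((target("avx")))
static void add_scalar_256_kernel(float v[], int n, float c)
{
    const __m256 rscalar = _mm256_set1_ps(c);  // 8 copies of the scalar
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 r1 = _mm256_loadu_ps(&v[i]);                   // unaligned load
        _mm256_storeu_ps(&v[i], _mm256_add_ps(r1, rscalar));  // unaligned store
    }
    for (; i < n; i++) {
        v[i] += c;  // scalar tail for the last n % 8 elements
    }
}

// Hypothetical wrapper: use the 256-bit kernel only if the CPU reports AVX
// support at run time (GCC/Clang builtin), otherwise fall back to a plain loop.
void aussie_vector_add_scalar_dispatch(float v[], int n, float c)
{
    if (__builtin_cpu_supports("avx")) {
        add_scalar_256_kernel(v, n, c);
    } else {
        for (int i = 0; i < n; i++) {
            v[i] += c;
        }
    }
}
```

For example, calling aussie_vector_add_scalar_dispatch(buf + 1, 13, 0.25f) on a float buffer exercises both the unaligned 256-bit path (one block of 8) and the 5-element scalar tail.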
The new AI programming book by Aussie AI co-founders:
Get your copy from Amazon: Generative AI in C++