Aussie AI
AVX-2 SIMD Multiplication
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
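Here is the AVX-2 version of the pairwise SIMD multiply, using intrinsics
for 256-bit registers,
each of which holds eight 32-bit float values.
Strictly speaking, these 256-bit float intrinsics belong to the original AVX instruction set,
and AVX-2 mainly added 256-bit integer operations, so this code also runs on CPUs that support only AVX.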
#include <immintrin.h>  // AVX intrinsics and the __m256 type

void aussie_avx2_multiply_8_floats(
    float v1[8], float v2[8], float vresult[8])
{
    // Multiply 8x32-bit floats in 256-bit AVX2 registers
    __m256 r1 = _mm256_loadu_ps(v1);     // Load 8 floats (unaligned) into a register
    __m256 r2 = _mm256_loadu_ps(v2);
    __m256 dst = _mm256_mul_ps(r1, r2);  // Multiply pairwise (SIMD)
    _mm256_storeu_ps(vresult, dst);      // Store the 8 results back to memory
}
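One way to sanity-check the routine is to compare its output against a plain scalar loop. The following is a minimal sketch, not part of the book's code: the helper `check_multiply_8_floats` and the macro `AUSSIE_TARGET_AVX` are hypothetical names introduced here for illustration. The macro assumes GCC or Clang, where a `target("avx")` function attribute lets the intrinsics compile without passing `-mavx` on the command line (the float intrinsics used here only need AVX support). The test inputs are chosen so that every product is exactly representable as a float.

```cpp
#include <immintrin.h>
#include <cassert>

// Assumption: GCC/Clang-style attribute so the AVX intrinsics compile
// even when -mavx is not given on the command line. Empty elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define AUSSIE_TARGET_AVX __attribute__((target("avx")))
#else
#define AUSSIE_TARGET_AVX
#endif

AUSSIE_TARGET_AVX
void aussie_avx2_multiply_8_floats(
    float v1[8], float v2[8], float vresult[8])
{
    __m256 r1 = _mm256_loadu_ps(v1);     // Load 8 floats (unaligned)
    __m256 r2 = _mm256_loadu_ps(v2);
    __m256 dst = _mm256_mul_ps(r1, r2);  // 8 multiplications at once
    _mm256_storeu_ps(vresult, dst);      // Store 8 results to memory
}

// Hypothetical helper: returns true if the SIMD results match a scalar loop.
bool check_multiply_8_floats()
{
    float a[8] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f };
    float b[8] = { 0.5f, 1.5f, 2.0f, 0.25f, 3.0f, -1.0f, 4.0f, 0.125f };
    float simd[8];
    aussie_avx2_multiply_8_floats(a, b, simd);
    for (int i = 0; i < 8; i++) {
        if (simd[i] != a[i] * b[i]) return false;  // Exact match expected for these inputs
    }
    return true;
}
```

Calling `check_multiply_8_floats()` from `main` on an AVX-capable CPU should return true. The code still requires a CPU that supports AVX at runtime; the attribute only affects compilation.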
This is similar to the basic AVX 128-bit version, with some differences:
- The type for 256-bit registers is “__m256”.
- The AVX-2 loading intrinsic is “_mm256_loadu_ps”.
- The AVX-2 multiplication intrinsic is “_mm256_mul_ps”.
- The conversion back to float uses AVX-2 intrinsic “_mm256_storeu_ps”.
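For a side-by-side comparison with the differences listed above, a 128-bit counterpart might look like the following. This is a sketch under stated assumptions rather than the book's own 128-bit listing: the names `multiply_4_floats_128` and `check_multiply_4_floats_128` are hypothetical, while `__m128`, `_mm_loadu_ps`, `_mm_mul_ps`, and `_mm_storeu_ps` are the standard 128-bit intrinsics. Each call processes four floats rather than eight.

```cpp
#include <immintrin.h>
#include <cassert>

// Hypothetical 128-bit counterpart: 4 floats per register instead of 8.
void multiply_4_floats_128(
    const float v1[4], const float v2[4], float vresult[4])
{
    __m128 r1 = _mm_loadu_ps(v1);     // __m128 / _mm_loadu_ps  vs  __m256 / _mm256_loadu_ps
    __m128 r2 = _mm_loadu_ps(v2);
    __m128 dst = _mm_mul_ps(r1, r2);  // _mm_mul_ps  vs  _mm256_mul_ps
    _mm_storeu_ps(vresult, dst);      // _mm_storeu_ps  vs  _mm256_storeu_ps
}

// Hypothetical helper: returns true if the SIMD results match a scalar loop.
bool check_multiply_4_floats_128()
{
    const float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    const float b[4] = { 0.5f, 1.5f, 2.0f, 0.25f };
    float simd[4];
    multiply_4_floats_128(a, b, simd);
    for (int i = 0; i < 4; i++) {
        if (simd[i] != a[i] * b[i]) return false;  // Products are exactly representable
    }
    return true;
}
```

The structure is the same load, multiply, store sequence in both versions; only the register type and the intrinsic prefix change, along with the number of floats handled per call.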