Aussie AI
Example: AVX-2 256-Bit Dot Product
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
Example: AVX-2 256-Bit Dot Product
Here is my attempt at the 256-bit version of a vector dot product of 8 float values using AVX-2 instructions, which seems like it should work:
float aussie_avx2_vecdot_8_floats_buggy(float v1[8], float v2[8])
{
    // AVX2 dot product: 2 vectors, 8x32-bit floats
    __m256 r1 = _mm256_loadu_ps(v1);            // Load 8 floats from v1
    __m256 r2 = _mm256_loadu_ps(v2);            // Load 8 floats from v2
    __m256 dst = _mm256_dp_ps(r1, r2, 0xf1);    // Bug!
    float fret = _mm256_cvtss_f32(dst);         // Extract lowest 32-bit float
    return fret;
}
But it doesn't! Instead of working on 8 pairs of float numbers, it does the vector dot product of only 4 pairs of float values, just like the first AVX code.
The problem wasn't related to alignment to 256-bit blocks, because I added "alignas(32)" to the arrays passed in.
It seems that the "_mm256_dp_ps" intrinsic doesn't actually do 256-bit dot products, but is similar to the 128-bit "_mm_dp_ps" intrinsic that handles only four float values (128 bits).
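One way to see this behavior is to store the full 256-bit result and inspect its elements. Here is a minimal test sketch (the input values and output layout shown in the comments are purely illustrative, not from the book's benchmark code):

#include <immintrin.h>
#include <cstdio>

int main()
{
    alignas(32) float v1[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    alignas(32) float v2[8] = { 1, 1, 1, 1, 1, 1, 1, 1 };
    __m256 r1 = _mm256_loadu_ps(v1);
    __m256 r2 = _mm256_loadu_ps(v2);
    __m256 dst = _mm256_dp_ps(r1, r2, 0xf1);
    float out[8];
    _mm256_storeu_ps(out, dst);
    // out[0] is 10.0 (1+2+3+4): dot product of the lower 4 floats only
    // out[4] is 26.0 (5+6+7+8): dot product of the upper 4 floats only
    printf("low lane = %f, high lane = %f\n", out[0], out[4]);
    return 0;
}

Each 128-bit lane receives its own separate 4-float dot product, which is why extracting only the lowest element returns the result for just the first four floats.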
These intrinsics are based on the VDPPS opcode in the x86 instruction set for 32-bit float values, and there is VDPPD for 64-bit double values.
However, although "_mm256_dp_ps" does operate on 256-bit registers, it computes two independent 4-float dot products, one in each 128-bit lane, rather than a single 8-float dot product, so extracting only the lowest element returns just the dot product of the first four floats.
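One way to repair the function, if you still want to use the dot product intrinsic, is to add the two per-lane partial results together. Here is a sketch of that approach (the function name and the exact intrinsic sequence are just one possible way to do it):

float aussie_avx2_vecdot_8_floats_fixed(const float v1[8], const float v2[8])
{
    // AVX2 dot product of 8x32-bit floats via two per-lane dot products
    __m256 r1 = _mm256_loadu_ps(v1);            // Load 8 floats from v1
    __m256 r2 = _mm256_loadu_ps(v2);            // Load 8 floats from v2
    __m256 dp = _mm256_dp_ps(r1, r2, 0xf1);     // 4-float dot product in each 128-bit lane
    __m128 lo = _mm256_castps256_ps128(dp);     // Lower lane: dot of elements 0..3
    __m128 hi = _mm256_extractf128_ps(dp, 1);   // Upper lane: dot of elements 4..7
    __m128 sum = _mm_add_ss(lo, hi);            // Add the two partial dot products
    return _mm_cvtss_f32(sum);                  // Extract the final scalar result
}

An alternative is to skip the dot product intrinsic entirely: multiply the two vectors with "_mm256_mul_ps" and then do a horizontal reduction of the eight products.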