Aussie AI
What are AVX Intrinsics?
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
AVX intrinsics are C++ functions that give direct access to the SIMD parallel instructions of x86 and x64 architectures. Each intrinsic corresponds to a machine opcode supported by the x86/x64 CPU, wrapped in a function prototype for easy access from a C++ program.
The main advantage of SIMD instructions is that they are CPU-supported parallel optimizations. Hence, they do not require a GPU, and can even be used on a basic Windows laptop. The main downside is that their level of parallelism is nowhere near that of a high-end GPU.
There are multiple generations of SIMD intrinsics based on x86/x64 CPU instructions. Different CPUs support different features, and exactly which intrinsic calls can be used will depend on the CPU on which your C++ is running. The main generations are:
- SSE: 128-bit registers = 4 x 32-bit float values (the older _mm_ intrinsics, often grouped with AVX)
- AVX and AVX-2: 256-bit registers = 8 x 32-bit float values (AVX-2 extends the 256-bit registers to integer operations)
- AVX-512: 512-bit registers = 16 x 32-bit float values
- AVX-10: registers of up to 512 bits (a newer consolidation of the AVX-512 features)
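Since the usable intrinsics depend on the CPU, a program can query the processor at runtime before choosing a code path. The following is a minimal sketch, assuming GCC or Clang on x86/x64, where the compiler builtins __builtin_cpu_init and __builtin_cpu_supports are available. The names SimdSupport and detect_simd_support are hypothetical and not from the book. On other compilers this sketch simply reports every feature as unsupported.

```cpp
// Hypothetical helper for illustration; the names are assumptions.
struct SimdSupport {
    bool sse2 = false;
    bool avx = false;
    bool avx2 = false;
    bool avx512f = false;
};

// Asks the running CPU which SIMD generations it supports.
// Assumes GCC or Clang on x86/x64; elsewhere all flags stay false.
inline SimdSupport detect_simd_support() {
    SimdSupport s;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // initialize the compiler's CPU feature data
    s.sse2 = __builtin_cpu_supports("sse2") != 0;
    s.avx = __builtin_cpu_supports("avx") != 0;
    s.avx2 = __builtin_cpu_supports("avx2") != 0;
    s.avx512f = __builtin_cpu_supports("avx512f") != 0;
#endif
    return s;
}
```

A caller might check detect_simd_support().avx2 once at startup and then dispatch to either a SIMD or a plain scalar version of a kernel. Microsoft Visual C++ does not provide these builtins; there, the __cpuid intrinsic from <intrin.h> would be the likely starting point.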
The AVX intrinsics use C++ type names to declare variables for their registers. The float types used to declare the registers all have a double-underscore prefix: “__m128” for 128-bit registers (4 floats), “__m256” for 256-bit registers (8 floats), and “__m512” for 512-bit registers (16 floats). Similarly, there are register type names for int types (__m128i, __m256i, and __m512i), and types for “double” registers (__m128d, __m256d, and __m512d).
AVX intrinsic functions and their types are declared as ordinary function prototypes in header files. The header files that you may need to include for these intrinsics are <intrin.h> (Microsoft Visual C++), <emmintrin.h> (SSE2), and <immintrin.h> (AVX and later).
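The relationship between the register types and their element counts can be checked at compile time. This is a small sketch, assuming a compiler where <immintrin.h> declares all of the register types (as GCC, Clang, and Microsoft Visual C++ do on x86/x64). The helper lane_count is a hypothetical name introduced only for this illustration.

```cpp
#include <immintrin.h>
#include <cstddef>

// Hypothetical helper: how many elements of type Elem fit in register type Reg.
template <typename Reg, typename Elem>
constexpr std::size_t lane_count() {
    return sizeof(Reg) / sizeof(Elem);
}

// float registers: 4, 8, and 16 lanes
static_assert(lane_count<__m128, float>() == 4, "128 bits hold 4 floats");
static_assert(lane_count<__m256, float>() == 8, "256 bits hold 8 floats");
static_assert(lane_count<__m512, float>() == 16, "512 bits hold 16 floats");

// double registers: half as many lanes, since each element is 64 bits
static_assert(lane_count<__m128d, double>() == 2, "128 bits hold 2 doubles");
static_assert(lane_count<__m256d, double>() == 4, "256 bits hold 4 doubles");
static_assert(lane_count<__m512d, double>() == 8, "512 bits hold 8 doubles");

// integer registers are untyped bit containers; 32-bit ints are one possible view
static_assert(sizeof(__m128i) == 16, "__m128i is 16 bytes");
static_assert(sizeof(__m256i) == 32, "__m256i is 32 bytes");
static_assert(sizeof(__m512i) == 64, "__m512i is 64 bytes");
```

Only the type sizes are inspected here, so no AVX instruction is executed. Actually using the 256-bit or 512-bit types in computations normally requires enabling the matching instruction set in the compiler options.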
Useful AVX SIMD vector intrinsics for float types include:
- Initialize to all-zeros: _mm_setzero_ps, _mm256_setzero_ps
- Set all values to a single float: _mm_set1_ps, _mm256_set1_ps
- Set to 4 or 8 values: _mm_set_ps, _mm256_set_ps
- Load from arrays to AVX registers: _mm_loadu_ps, _mm256_loadu_ps
- Store registers back to float arrays: _mm_storeu_ps, _mm256_storeu_ps
- Addition: _mm_add_ps, _mm256_add_ps
- Multiplication: _mm_mul_ps (SSE), _mm256_mul_ps (AVX)
- Vector dot product: _mm_dp_ps, _mm256_dp_ps
- Fused Multiply-Add (FMA): _mm_fmadd_ps, _mm256_fmadd_ps
- Horizontal addition (pairwise): _mm_hadd_ps, _mm256_hadd_ps
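Several of the intrinsics listed above can be combined into simple vector kernels. The sketch below uses only the 128-bit versions (_mm_setzero_ps, _mm_loadu_ps, _mm_storeu_ps, _mm_add_ps, _mm_mul_ps), which should compile on any x64 compiler without extra options. The function names add_arrays_sse and dot_product_sse are hypothetical, and the dot product deliberately uses multiply-then-add rather than _mm_dp_ps or FMA, which need newer instruction sets.

```cpp
#include <immintrin.h>
#include <cstddef>

// Element-wise addition: c[i] = a[i] + b[i], four floats per iteration.
inline void add_arrays_sse(const float* a, const float* b, float* c,
                           std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);           // load 4 floats (unaligned)
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));  // add and store 4 results
    }
    for (; i < n; ++i) {                           // scalar tail for leftovers
        c[i] = a[i] + b[i];
    }
}

// Dot product: accumulate 4 partial sums in a register, then combine them.
inline float dot_product_sse(const float* a, const float* b, std::size_t n) {
    __m128 acc = _mm_setzero_ps();                 // 4 partial sums, all zero
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 prod = _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        acc = _mm_add_ps(acc, prod);
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                     // copy partial sums to memory
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) {                           // scalar tail for leftovers
        sum += a[i] * b[i];
    }
    return sum;
}
```

A 256-bit version would follow the same pattern with __m256, the _mm256_ intrinsics, and a stride of 8, compiled with AVX enabled. Note that the SIMD dot product adds the terms in a different order from a plain scalar loop, so the floating-point result can differ slightly in the last bits.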
Note that the names of the intrinsic functions have meaningful suffixes. The “_ps” suffix means “packed-single-precision” (i.e. float), whereas the “_pd” suffix means “packed-double-precision” (i.e. double).
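The suffix convention can be seen by writing the same operation in both precisions. This is a minimal sketch using 128-bit intrinsics only; the function names scale_floats4 and scale_doubles2 are hypothetical. The same 128-bit register width holds 4 floats with “_ps” but only 2 doubles with “_pd”.

```cpp
#include <immintrin.h>

// "_ps" = packed single precision: one 128-bit register holds 4 floats.
inline void scale_floats4(const float* in, float factor, float* out) {
    __m128 v = _mm_loadu_ps(in);           // load 4 floats
    __m128 f = _mm_set1_ps(factor);        // broadcast factor to all 4 lanes
    _mm_storeu_ps(out, _mm_mul_ps(v, f));  // multiply and store 4 floats
}

// "_pd" = packed double precision: one 128-bit register holds 2 doubles.
inline void scale_doubles2(const double* in, double factor, double* out) {
    __m128d v = _mm_loadu_pd(in);          // load 2 doubles
    __m128d f = _mm_set1_pd(factor);       // broadcast factor to both lanes
    _mm_storeu_pd(out, _mm_mul_pd(v, f));  // multiply and store 2 doubles
}
```

The register type changes along with the suffix: __m128 for the “_ps” calls and __m128d for the “_pd” calls.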