Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Vectorization with AVX Intrinsics
The AVX intrinsics are C++ built-in functions that wrap around SIMD instruction codes in the x86 instruction set. The basic AVX intrinsics are 128 bits wide (four 32-bit float values), AVX-2 is 256 bits (8 float values), and AVX-512 is 512 bits (surprise!), which is 16 float values. The upcoming AVX-10 (announced in July 2023) is also 512 bits, but with extra capabilities.
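For example, here is a minimal sketch of a single 128-bit addition (four float values in parallel), using the 128-bit load, add, and store intrinsics from the <immintrin.h> header (the variable names are illustrative):

#include <immintrin.h>   // x86 SIMD intrinsics
#include <cstdio>

int main()
{
    float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float b[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
    float c[4];

    __m128 va = _mm_loadu_ps(a);      // load 4 floats (128 bits)
    __m128 vb = _mm_loadu_ps(b);      // load 4 floats (128 bits)
    __m128 vc = _mm_add_ps(va, vb);   // 4 additions in one instruction
    _mm_storeu_ps(c, vc);             // store 4 results back to memory

    printf("%f %f %f %f\n", c[0], c[1], c[2], c[3]);
    return 0;
}

These 128-bit intrinsics compile on any modern x86-64 compiler without special options; the 256-bit and 512-bit versions typically require flags such as -mavx2 or -mavx512f on GCC and Clang.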
Obviously, since the largest number of floating-point values that can be parallelized is 16, the AVX intrinsics cannot fully vectorize a longer vector of float values, such as an AI model with dimension 1024. Instead, we can use AVX intrinsics on segments of vectors, and thereby vectorize chunks of the right size to get a speedup.
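As a sketch of this chunked approach, here is an element-wise vector addition using the 256-bit intrinsics, processing 8 float values per loop iteration (the function name is illustrative, and the vector size n is assumed to be a multiple of 8, as it would be for a dimension like 1024):

#include <immintrin.h>   // x86 SIMD intrinsics

// Add two float vectors element-wise, 8 floats (256 bits) per iteration.
// Assumes n is a multiple of 8; a scalar loop would be needed for any leftover tail.
void vector_add_avx2(const float* a, const float* b, float* result, int n)
{
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);    // load 8 floats from a
        __m256 vb = _mm256_loadu_ps(&b[i]);    // load 8 floats from b
        __m256 vsum = _mm256_add_ps(va, vb);   // 8 additions in one instruction
        _mm256_storeu_ps(&result[i], vsum);    // store 8 results
    }
}

The same pattern extends to the 512-bit AVX-512 intrinsics (e.g. _mm512_add_ps) with 16 floats per chunk, halving the number of loop iterations again.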