Aussie AI

Vectorization with AVX Intrinsics

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.


The AVX intrinsics are C++ built-in functions that wrap the SIMD instructions in the x86 instruction set. The basic AVX intrinsics operate on 128 bits (four 32-bit float values), AVX-2 widens this to 256 bits (8 float values), and AVX-512 to 512 bits (surprise!), which is 16 float values. The upcoming AVX-10, announced in July 2023, is also 512 bits, but adds extra capabilities.

Obviously, since at most 16 floating-point values can be processed in parallel, the AVX intrinsics cannot vectorize an entire large vector in one operation, such as an AI model vector with dimension 1024. Instead, we apply the AVX intrinsics to successive segments of the vector, vectorizing it in chunks of the SIMD width to get a speedup.

