Aussie AI

Example: AVX 128-Bit Dot Product

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

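The x86 SIMD instruction sets include a vector dot product intrinsic that wraps a single dot product instruction. The 128-bit version, “_mm_dp_ps”, was introduced with SSE4.1 and is available on any AVX-capable CPU. The 256-bit AVX version, “_mm256_dp_ps”, computes a separate 4-element dot product in each 128-bit half of the register rather than a single 8-element result. AVX-512 has no direct single-precision equivalent of this instruction, so 512-bit dot products are usually built from multiply or fused multiply-add intrinsics followed by a horizontal sum.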
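To make the difference between register widths concrete, here is a minimal sketch of an 8-float dot product using the 256-bit “_mm256_dp_ps” intrinsic. It assumes an AVX-capable CPU and a GCC or Clang compiler (for the target attribute); the function name and the SKETCH_AVX macro are hypothetical and not from the book. Because the 256-bit instruction produces one partial dot product per 128-bit half, the two halves have to be added together at the end.

```cpp
#include <immintrin.h>  // x86 SIMD intrinsics

// Assumption: GCC/Clang target attribute so the sketch builds without -mavx.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_AVX __attribute__((target("avx")))
#else
#define SKETCH_AVX
#endif

// Hypothetical helper: dot product of two vectors of 8x32-bit floats.
SKETCH_AVX
float vecdot_8_floats_sketch(const float v1[8], const float v2[8])
{
    __m256 r1 = _mm256_loadu_ps(v1);            // Load 8 floats (unaligned)
    __m256 r2 = _mm256_loadu_ps(v2);
    // One 4-element dot product per 128-bit half; each sum lands in
    // the lowest element of its own half.
    __m256 dst = _mm256_dp_ps(r1, r2, 0xf1);
    __m128 lo = _mm256_castps256_ps128(dst);    // Lower half
    __m128 hi = _mm256_extractf128_ps(dst, 1);  // Upper half
    return _mm_cvtss_f32(_mm_add_ss(lo, hi));   // Add the two partial sums
}
```

Whether this is faster than two 128-bit dot products depends on the CPU, so it is worth benchmarking before relying on it.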

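For the 128-bit version, this is a full vector dot product of two vectors, each holding 4 x 32-bit float numbers. One oddity is that although the result is a floating-point scalar (i.e. a single 32-bit float), it's still stored in a 128-bit register, and must be extracted using the “_mm_cvtss_f32” intrinsic. The third argument is a mask: in 0xf1, the high four bits (0xf) select all four element pairs for multiplication, and the low four bits (0x1) store the sum in the lowest element of the result. The example code looks like: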

    #include <immintrin.h>  // x86 SIMD intrinsics

    float aussie_avx_vecdot_4_floats(const float v1[4], const float v2[4])
    {
        // AVX dot product: 2 vectors of 4x32-bit floats
        __m128 r1 = _mm_loadu_ps(v1);   // Load floats (unaligned load)
        __m128 r2 = _mm_loadu_ps(v2);
        // Mask 0xf1: multiply all 4 pairs, store sum in lowest element
        __m128 dst = _mm_dp_ps(r1, r2, 0xf1); // Dot product
        float fret = _mm_cvtss_f32(dst);  // Extract float
        return fret;
    }
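The same intrinsic can be applied to longer vectors by processing four floats at a time and accumulating the partial results. The following is a minimal sketch of that idea, not code from the book: the function name and the SKETCH_SSE41 macro are hypothetical, the length is assumed to be a multiple of 4, and the target attribute assumes a GCC or Clang compiler.

```cpp
#include <immintrin.h>  // x86 SIMD intrinsics
#include <cassert>
#include <cstddef>

// Assumption: GCC/Clang target attribute so the sketch builds without -msse4.1.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_SSE41 __attribute__((target("sse4.1")))
#else
#define SKETCH_SSE41
#endif

// Hypothetical helper: dot product of two n-element float arrays,
// where n is assumed to be a multiple of 4.
SKETCH_SSE41
float vecdot_chunked_sketch(const float* v1, const float* v2, std::size_t n)
{
    assert(n % 4 == 0);  // No handling of leftover elements in this sketch
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 r1 = _mm_loadu_ps(v1 + i);      // Load 4 floats from each vector
        __m128 r2 = _mm_loadu_ps(v2 + i);
        __m128 dst = _mm_dp_ps(r1, r2, 0xf1);  // 4-element dot product
        sum += _mm_cvtss_f32(dst);             // Accumulate the partial result
    }
    return sum;
}
```

Extracting a scalar on every iteration keeps the sketch simple but is unlikely to be the fastest approach. A common alternative is to keep the running sum in a vector register with multiply and add intrinsics and do a single horizontal sum after the loop.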

 
