Aussie AI

AVX-2 SIMD Multiplication

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

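Here is the AVX-2 version of pairwise SIMD multiplication, using intrinsics that operate on 256-bit registers, each of which holds eight 32-bit float values. Note that these particular 256-bit float intrinsics were introduced with the original AVX instruction set, so the code also runs on CPUs that support AVX but not AVX-2.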

    #include <immintrin.h>  // AVX/AVX-2 intrinsics

    void aussie_avx2_multiply_8_floats(
        const float v1[8], const float v2[8], float vresult[8])
    {
        // Multiply 8x32-bit floats in 256-bit registers
        __m256 r1 = _mm256_loadu_ps(v1);     // Load 8 floats (unaligned)
        __m256 r2 = _mm256_loadu_ps(v2);
        __m256 dst = _mm256_mul_ps(r1, r2);  // Multiply pairwise (SIMD)
        _mm256_storeu_ps(vresult, dst);      // Store 8 floats (unaligned)
    }
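As a quick sanity check, the SIMD result can be compared against ordinary scalar multiplication. The following is a minimal sketch, not part of the book's code: the names `sketch_multiply_8_floats`, `sketch_check_multiply_8_floats`, and `SKETCH_AVX_TARGET` are hypothetical, and the `target("avx")` attribute is an assumption that lets GCC or Clang compile the function without a `-mavx` flag.

```cpp
#include <immintrin.h>  // AVX intrinsics

// Assumption: on GCC/Clang, enable AVX for one function without -mavx.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_AVX_TARGET __attribute__((target("avx")))
#else
#define SKETCH_AVX_TARGET
#endif

// Same body as the function above, repeated so this sketch compiles alone.
SKETCH_AVX_TARGET
void sketch_multiply_8_floats(
    const float v1[8], const float v2[8], float vresult[8])
{
    __m256 r1 = _mm256_loadu_ps(v1);     // Load 8 floats (unaligned)
    __m256 r2 = _mm256_loadu_ps(v2);
    __m256 dst = _mm256_mul_ps(r1, r2);  // Multiply pairwise
    _mm256_storeu_ps(vresult, dst);      // Store 8 floats (unaligned)
}

// Hypothetical helper: true if every SIMD product equals the scalar product.
bool sketch_check_multiply_8_floats()
{
    const float a[8] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f };
    const float b[8] = { 0.5f, 2.0f, -1.0f, 4.0f, 0.25f, 10.0f, 0.0f, 3.0f };
    float result[8] = { 0.0f };
    sketch_multiply_8_floats(a, b, result);
    for (int i = 0; i < 8; i++) {
        if (result[i] != a[i] * b[i]) return false;  // Mismatch
    }
    return true;
}
```

The code must run on a CPU that supports AVX. Because each element is multiplied independently, the comparison with the scalar product can be exact rather than tolerance-based.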

This is similar to the basic AVX 128-bit version, with some differences:

  • The type for 256-bit float registers is “__m256”.
  • The 256-bit loading intrinsic is “_mm256_loadu_ps” (the “u” means the address need not be aligned).
  • The 256-bit multiplication intrinsic is “_mm256_mul_ps”.
  • Storing the result back to a float array uses the 256-bit intrinsic “_mm256_storeu_ps”.
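The differences listed above are easiest to see side by side. The sketch below assumes the 128-bit version uses the standard `__m128` counterparts (`_mm_loadu_ps`, `_mm_mul_ps`, `_mm_storeu_ps`). The function names and the `SKETCH_AVX_TARGET` macro are hypothetical, and the `target("avx")` attribute is an assumption that lets GCC or Clang compile the 256-bit function without `-mavx`.

```cpp
#include <immintrin.h>  // SSE and AVX intrinsics

// Assumption: on GCC/Clang, enable AVX for one function without -mavx.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_AVX_TARGET __attribute__((target("avx")))
#else
#define SKETCH_AVX_TARGET
#endif

// 128-bit version: 4 floats per register, "_mm_" prefix.
void sketch_multiply_128bit(
    const float v1[4], const float v2[4], float vresult[4])
{
    __m128 r1 = _mm_loadu_ps(v1);      // Load 4 floats
    __m128 r2 = _mm_loadu_ps(v2);
    __m128 dst = _mm_mul_ps(r1, r2);   // Multiply 4 pairs
    _mm_storeu_ps(vresult, dst);       // Store 4 floats
}

// 256-bit version: 8 floats per register, "_mm256_" prefix.
SKETCH_AVX_TARGET
void sketch_multiply_256bit(
    const float v1[8], const float v2[8], float vresult[8])
{
    __m256 r1 = _mm256_loadu_ps(v1);     // Load 8 floats
    __m256 r2 = _mm256_loadu_ps(v2);
    __m256 dst = _mm256_mul_ps(r1, r2);  // Multiply 8 pairs
    _mm256_storeu_ps(vresult, dst);      // Store 8 floats
}
```

The two functions have the same structure. Only the register type, the intrinsic prefix, and the number of floats processed per call change.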

 
