RELU AVX SIMD Vectorization

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

The RELU function is applied element-wise to a vector of computed values. We can compute RELU in parallel on 4 float values (AVX-1) or 8 float values (AVX-2) using a SIMD “max” computation with a register full of zeros as the other operand. This implementation is shown in Chapter 17.
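As a reference point, here is a minimal sketch of the 256-bit (AVX-2) variant using standard x86 intrinsics. This is not the Chapter 17 code; the function name is illustrative, and it assumes the vector length n is a multiple of 8 and that the CPU supports 256-bit AVX (a scalar cleanup loop would handle any remainder).

    #include <immintrin.h>

    // RELU applied in-place to n floats, 8 at a time,
    // via a 256-bit SIMD "max" against a register of zeros.
    // Assumes n is a multiple of 8. (Illustrative sketch only.)
    void relu_avx2_sketch(float v[], int n)
    {
        const __m256 zeros = _mm256_setzero_ps();  // register full of 0.0f
        for (int i = 0; i < n; i += 8) {
            __m256 x = _mm256_loadu_ps(&v[i]);     // load 8 floats (unaligned load)
            __m256 r = _mm256_max_ps(x, zeros);    // RELU: max(x, 0.0f)
            _mm256_storeu_ps(&v[i], r);            // store 8 results back
        }
    }

The 128-bit (AVX-1) version is identical in structure, using the __m128 type with _mm_setzero_ps, _mm_loadu_ps, _mm_max_ps, and _mm_storeu_ps on 4 floats per iteration.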

I'm not aware of a single-instruction hardware RELU implementation on x86 or other CPUs, although there may well be one. There are certainly various research papers on computing RELU in hardware. It's a simple computation: if the sign bit is set, clear every bit to zero; otherwise, do nothing. For example, replicate the inverted sign bit across all 32 bits and bitwise-AND that mask against the value.
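That bit trick can also be written in software. Here is a branch-free scalar sketch in portable C++ for IEEE 754 floats (an illustration of the idea, not a hardware design; the function name is hypothetical):

    #include <cstring>

    // Branch-free scalar RELU via the sign-bit trick:
    // build a mask of all-zeros (negative input) or all-ones
    // (non-negative input) and AND it with the value's bits.
    float relu_signbit_sketch(float f)
    {
        unsigned int u;
        std::memcpy(&u, &f, sizeof u);       // type-pun the float's bit pattern
        unsigned int sign = u >> 31;         // 1 if sign bit set, else 0
        unsigned int mask = sign - 1u;       // 0x00000000 if negative, 0xFFFFFFFF otherwise
        u &= mask;                           // negatives become +0.0f, others unchanged
        std::memcpy(&f, &u, sizeof f);
        return f;
    }

Note that -0.0f has its sign bit set, so it maps to +0.0f, which is still a correct RELU result.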
