Book Excerpt from "Generative AI in C++" by David Spuler, Ph.D.
RELU AVX SIMD Vectorization
The RELU function is applied element-wise to a vector of computed values.
Since RELU(x) = max(x, 0), we can compute RELU in parallel on 4 float values (AVX-1) or 8 float values (AVX-2)
using a SIMD “max” computation with a register full of zeros as the other operand.
This implementation is shown in Chapter 17.
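As a rough sketch of the idea (not the Chapter 17 listing; the function names and the in-place, aligned, multiple-of-width API are illustrative assumptions), the 4-float and 8-float versions look something like this:

    #include <immintrin.h>

    // Sketch: RELU in place on 4 floats at a time (128-bit registers).
    // Assumes n is a multiple of 4 and v is 16-byte aligned.
    void relu_simd_128(float* v, int n)
    {
        const __m128 zeros = _mm_setzero_ps();
        for (int i = 0; i < n; i += 4) {
            __m128 x = _mm_load_ps(&v[i]);              // load 4 floats
            _mm_store_ps(&v[i], _mm_max_ps(x, zeros));  // element-wise max(0, x)
        }
    }

    // Sketch: RELU in place on 8 floats at a time (256-bit registers).
    // Assumes n is a multiple of 8 and v is 32-byte aligned.
    void relu_simd_256(float* v, int n)
    {
        const __m256 zeros = _mm256_setzero_ps();
        for (int i = 0; i < n; i += 8) {
            __m256 x = _mm256_load_ps(&v[i]);                  // load 8 floats
            _mm256_store_ps(&v[i], _mm256_max_ps(x, zeros));   // element-wise max(0, x)
        }
    }

Compile with AVX enabled (e.g. -mavx with GCC or Clang); a scalar loop would handle any leftover tail elements when n is not a multiple of the register width.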
I'm not aware of a single-instruction hardware RELU implementation on x86 or other mainstream CPUs, although there may well be one, and there are certainly various research papers on computing RELU in hardware. It's a simple computation: if the sign bit is set, clear every bit to zero; otherwise, do nothing. For example, broadcast the inverted sign bit into a 32-bit mask and bitwise-AND it against all 32 bits of the value.
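Here is a minimal scalar sketch of that bit trick in software (illustrative code, not from the book): an arithmetic right shift by 31 broadcasts the sign bit across the word, and inverting that gives the AND mask.

    #include <cstdint>
    #include <cstring>

    // Sketch: branchless RELU via the sign-bit mask trick.
    float relu_bitwise(float x)
    {
        uint32_t bits;
        std::memcpy(&bits, &x, sizeof bits);  // reinterpret float as 32 bits
        // Arithmetic shift broadcasts the sign bit: all-ones if negative, all-zeros if not
        // (guaranteed arithmetic since C++20; true in practice on mainstream compilers).
        uint32_t mask = ~static_cast<uint32_t>(static_cast<int32_t>(bits) >> 31);
        bits &= mask;                         // clear every bit if negative, keep otherwise
        std::memcpy(&x, &bits, sizeof bits);
        return x;
    }

Note that this zeroes -0.0f and any NaN with the sign bit set, which is harmless for RELU's purposes.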