Aussie AI
RELU AVX SIMD Vectorization
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
The RELU function is applied element-wise to a vector of computed values.
We can compute RELU in parallel on 4 float values (AVX-1) or 8 float values (AVX-2) using a SIMD "max" computation with a register full of zeros as the other operand.
This implementation is shown in Chapter 17.
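As a minimal sketch of the SIMD-max idea (this is an illustrative version, not the book's Chapter 17 code; the function name `relu_simd` is my own), the 128-bit form processes 4 floats per iteration by taking the element-wise maximum against a register of zeros:

```cpp
#include <immintrin.h>  // x86 SSE/AVX intrinsics
#include <cstddef>

// Illustrative sketch: RELU applied in-place to a vector of floats,
// 4 elements per iteration via a SIMD "max" against a zero register.
void relu_simd(float* v, size_t n) {
    size_t i = 0;
    const __m128 zeros = _mm_setzero_ps();          // register of 4 zeros
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(&v[i]);             // load 4 floats (unaligned OK)
        _mm_storeu_ps(&v[i], _mm_max_ps(x, zeros)); // element-wise max(x, 0)
    }
    for (; i < n; ++i)                              // scalar tail for leftover elements
        v[i] = (v[i] > 0.0f) ? v[i] : 0.0f;
}
```

The AVX-2 variant is structurally identical: use the 256-bit `__m256` type with `_mm256_loadu_ps`, `_mm256_max_ps`, and `_mm256_storeu_ps` to process 8 floats per iteration.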
I'm not aware of a single-instruction hardware RELU implementation on x86 or other CPUs, although there may well be one. There are certainly various research papers on computing RELU in hardware. It's a simple computation: if the sign bit is set, clear every bit to zero; otherwise, do nothing. For example, broadcast the sign bit across all 32 bit positions, invert that mask, and bitwise-AND it against the float's bits.
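A scalar sketch of that sign-bit trick in software (the function name `relu_bitwise` is my own; it assumes IEEE-754 32-bit floats and an arithmetic right shift on signed integers, which holds on mainstream compilers):

```cpp
#include <cstdint>
#include <cstring>

// Illustrative sketch: RELU via bit manipulation of the float's representation.
// Assumes IEEE-754 32-bit float and arithmetic right shift of signed integers.
float relu_bitwise(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);               // type-pun float to integer bits
    // Arithmetic shift broadcasts the sign bit: all-ones if negative, zero if not.
    uint32_t mask = ~(uint32_t)((int32_t)bits >> 31);  // invert: zero if negative
    bits &= mask;                                      // clears every bit when sign bit was set
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}
```

One edge case: `-0.0f` maps to `+0.0f`, which matches the `max(x, 0)` formulation for all ordinary values.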