
GELU AVX SIMD Vectorization

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

The GELU function is an element-wise activation function applied to an input vector, so it is a good candidate for vectorization. However, GELU is awkward to compute in parallel, even though AVX has SIMD support for the error function ("erf"). Computing GELU directly, as 0.5 * x * (1 + erf(x / sqrt(2))), requires a multiply-by-scalar, an erf computation, a scalar addition, a scalar multiplication, and finally an element-wise multiplication by the original input. We could chain all of these with sequential AVX intrinsics, but with so many operations, that doesn't seem like a good plan. Hence, our best option is to use precomputation into a lookup table (LUT), in combination with AVX "gather" intrinsics to vectorize the table lookups.
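As a rough illustration of the LUT-plus-gather idea, here is a minimal sketch that precomputes GELU over a clamped input range and then uses the AVX2 _mm256_i32gather_ps intrinsic to look up eight values per iteration. The table size, the [-8, +8] clamping range, and the helper names are illustrative assumptions, not the book's actual code.

    // Sketch: GELU via a precomputed lookup table plus AVX2 gather.
    // Assumptions (illustrative only): inputs clamped to [-8, +8],
    // 65,536-entry table, AVX2 available (compile with -mavx2 or /arch:AVX2).
    #include <immintrin.h>
    #include <cmath>

    static const int   GELU_LUT_SIZE = 65536;            // table resolution (assumption)
    static const float GELU_MIN = -8.0f, GELU_MAX = 8.0f; // clamp range (assumption)
    static float g_gelu_lut[GELU_LUT_SIZE];

    // Reference scalar GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    static inline float gelu_scalar(float x) {
        return 0.5f * x * (1.0f + std::erf(x * 0.70710678f));
    }

    // Precompute the table once (e.g., at startup).
    void gelu_lut_init() {
        const float step = (GELU_MAX - GELU_MIN) / (GELU_LUT_SIZE - 1);
        for (int i = 0; i < GELU_LUT_SIZE; ++i)
            g_gelu_lut[i] = gelu_scalar(GELU_MIN + step * i);
    }

    // Apply GELU element-wise to v[0..n-1]; n assumed to be a multiple of 8.
    void gelu_avx2_lut(float* v, int n) {
        const __m256 vmin   = _mm256_set1_ps(GELU_MIN);
        const __m256 vmax   = _mm256_set1_ps(GELU_MAX);
        const __m256 vscale = _mm256_set1_ps((GELU_LUT_SIZE - 1) / (GELU_MAX - GELU_MIN));
        for (int i = 0; i < n; i += 8) {
            __m256 x = _mm256_loadu_ps(v + i);
            // Clamp to the table's range, then map to an integer table index.
            __m256  xc  = _mm256_min_ps(_mm256_max_ps(x, vmin), vmax);
            __m256i idx = _mm256_cvtps_epi32(
                              _mm256_mul_ps(_mm256_sub_ps(xc, vmin), vscale));
            // Gather 8 precomputed GELU values in one instruction.
            __m256 y = _mm256_i32gather_ps(g_gelu_lut, idx, 4);
            _mm256_storeu_ps(v + i, y);
        }
    }

Note that gelu_lut_init must be called once before the vectorized routine is used; the gather itself costs one instruction per eight elements, which is the whole point of trading the multi-step GELU arithmetic for a table lookup.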
