Aussie AI

Optimization of Activation Functions

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

In order to optimize the speed of the various activation function computations, several techniques are available:

  • Choose a fast activation function (e.g. RELU)
  • Choose an activation function without trainable parameters
  • Algebraic approximations of the activation function
  • Precomputed lookup tables (sequential)
  • Basic vectorization (e.g. with AVX operation sequences; see the sketch after this list)
  • Vectorization of the precomputed lookup tables (i.e. parallel LUTs)
  • Kernel fusion (e.g. fuse the activation function computations back into a MatMul kernel)
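
To make the vectorization idea concrete, here is a minimal sketch of a RELU pass over a float array using AVX intrinsics. The function name relu_avx is an illustrative assumption, and the sketch assumes the array length is a multiple of 8 and the build targets AVX; a production kernel would also handle any leftover tail elements.

    #include <immintrin.h>

    // Apply RELU in place to n floats, 8 elements at a time with AVX.
    // Assumes n is a multiple of 8 (an illustrative simplification).
    void relu_avx(float* v, int n)
    {
        const __m256 zeros = _mm256_setzero_ps();  // eight packed 0.0f values
        for (int i = 0; i < n; i += 8) {
            __m256 x = _mm256_loadu_ps(v + i);     // load 8 floats (unaligned is fine)
            __m256 r = _mm256_max_ps(x, zeros);    // elementwise max(x, 0) is exactly RELU
            _mm256_storeu_ps(v + i, r);            // store the results back in place
        }
    }

The same max-against-zero pattern scales to 16 floats per operation with AVX-512.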

Which activation function is the fastest? Why, it's RELU, of course. Its code looks more like a typo than real programming. Does RELU even deserve to be called a “function”?

The logic of RELU is simply to convert all negatives to zero, but leave positive values unchanged. This can be as fast as a sign bit test, making RELU the fastest activation to compute.
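
In code, that is a single comparison (or sign-bit check) per element. Here is a minimal sketch; the function names are illustrative, not from the book:

    #include <cmath>

    // RELU: negative inputs clamp to zero, positive inputs pass through.
    inline float relu_scalar(float x)
    {
        return (x < 0.0f) ? 0.0f : x;
    }

    // Equivalent variant phrased as an explicit sign-bit test.
    inline float relu_signbit(float x)
    {
        return std::signbit(x) ? 0.0f : x;  // true when the IEEE 754 sign bit is set
    }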

The other functions need “non-linear” math (exponentials and the like), which is a cryptic way of saying “slooow.” GELU and SwiGLU usually need to be approximated to be efficient, or, even better, pre-calculated into a lookup table, assuming you're not using 32-bit float values (or maybe you're happy to run with a 16GB precomputed LUT covering all 32-bit patterns).
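
To show how the approximation and lookup-table tricks fit together, here is a hedged sketch: the widely used tanh-based algebraic approximation of GELU, precomputed into a 64K-entry table over a clamped input range. The clamp range [-8, 8], the table size, and the names gelu_approx and GeluLUT are illustrative assumptions, not code from the book.

    #include <cmath>
    #include <vector>

    // Standard tanh-based algebraic approximation of GELU.
    static float gelu_approx(float x)
    {
        const float c = 0.7978845608f;  // sqrt(2/pi)
        return 0.5f * x * (1.0f + std::tanh(c * (x + 0.044715f * x * x * x)));
    }

    // GELU precomputed into a lookup table over the clamped range [-8, 8].
    struct GeluLUT {
        static constexpr int N = 1 << 16;  // 65,536 entries (about 256KB of floats)
        std::vector<float> table;
        GeluLUT() : table(N) {
            for (int i = 0; i < N; ++i) {
                float x = -8.0f + 16.0f * i / (N - 1);  // map index to [-8, 8]
                table[i] = gelu_approx(x);
            }
        }
        float operator()(float x) const {
            if (x <= -8.0f) return 0.0f;  // GELU is essentially zero far below zero
            if (x >= 8.0f) return x;      // and essentially the identity far above
            int i = (int)((x + 8.0f) * (N - 1) / 16.0f);
            return table[i];              // nearest-entry lookup, no interpolation
        }
    };

With 16-bit inputs the table can instead be indexed directly by the raw bit pattern, which is the same trick that balloons into the 16GB monster at 32 bits.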

