Optimization of Activation Functions
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
To optimize the speed of the various activation function computations, several techniques are available:
- Choose a fast activation function (e.g. RELU).
- Choose an activation function without trainable parameters.
- Algebraic approximations of the activation function.
- Precomputed lookup tables (sequential).
- Basic vectorization (e.g. with AVX operation sequences; see the sketch after this list).
- Vectorization of the precomputed lookup tables (i.e. parallel LUTs).
- Kernel fusion (e.g. fusing the activation function computation back into a MatMul kernel).
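As an example of basic vectorization, here is a minimal sketch of a RELU kernel using AVX intrinsics, processing eight float values per iteration. It assumes a CPU and compiler with AVX support, and the function name relu_avx is illustrative rather than from the book's code; RELU itself (clamp negatives to zero) is discussed below.

    #include <immintrin.h>   // AVX intrinsics
    #include <cstddef>

    // Vectorized RELU: clamp negatives to zero, 8 floats per AVX iteration.
    void relu_avx(float v[], std::size_t n)
    {
        const __m256 zero = _mm256_setzero_ps();
        std::size_t i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256 x = _mm256_loadu_ps(&v[i]);               // load 8 floats (unaligned)
            _mm256_storeu_ps(&v[i], _mm256_max_ps(x, zero)); // elementwise max(x, 0)
        }
        for (; i < n; ++i) {                                 // scalar tail for leftover elements
            if (v[i] < 0.0f) v[i] = 0.0f;
        }
    }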
Which activation function is the fastest? Why, it's RELU, of course. Its code is so trivial that it looks more like a typo than real programming. Does RELU even deserve to be called a “function”?
The logic of RELU is simply to convert all negatives to zero, but leave positive values unchanged. This can be as fast as a sign bit test, making RELU the fastest activation to compute.
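Here is a minimal sketch of scalar RELU in C++, first as a plain comparison and then via an explicit sign-bit test on the IEEE 754 representation (the function names are illustrative):

    #include <cstdint>
    #include <cstring>

    // Basic RELU: negatives become zero, positives pass through unchanged.
    float relu_basic(float x)
    {
        return (x > 0.0f) ? x : 0.0f;
    }

    // RELU via a sign-bit test: if the IEEE 754 sign bit is set, the value
    // is negative (or negative zero), so the result is zero; otherwise x.
    float relu_signbit(float x)
    {
        std::uint32_t bits;
        std::memcpy(&bits, &x, sizeof(bits));   // safe type-pun via memcpy
        return (bits & 0x80000000u) ? 0.0f : x;
    }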
The other activation functions are “non-linear” in a more expensive way, needing exponentials or other transcendental math, which is a cryptic way of saying “slooow.”
GELU and SwiGLU usually need to be approximated to be efficient, or, even better, pre-calculated into a lookup table, assuming you're not using 32-bit float values (or maybe you're running with a 16GB precomputed LUT for all 32 bits).
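With 16-bit inputs, a full GELU table has only 2^16 = 65,536 entries (256KB of float outputs). Here is a minimal sketch of precomputing and using such a table, assuming the _Float16 half-precision type available as a GCC/Clang extension; the function names are illustrative, and a real kernel would use whatever half-float type the framework provides.

    #include <cmath>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Reference GELU using the erf formulation.
    static float gelu_exact(float x)
    {
        return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
    }

    // Precompute one GELU output for every possible FP16 bit pattern.
    std::vector<float> build_gelu_lut()
    {
        std::vector<float> lut(1u << 16);
        for (std::uint32_t i = 0; i < (1u << 16); ++i) {
            std::uint16_t bits = static_cast<std::uint16_t>(i);
            _Float16 h;
            std::memcpy(&h, &bits, sizeof(h));   // reinterpret the bit pattern as FP16
            lut[i] = gelu_exact(static_cast<float>(h));
        }
        return lut;
    }

    // Lookup at inference time: one array index, no transcendental math.
    inline float gelu_lut(const std::vector<float>& lut, std::uint16_t input_bits)
    {
        return lut[input_bits];
    }

The “parallel LUTs” item in the list above is the same idea, but gathering several table entries per instruction instead of looking them up one at a time.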