Optimization of Activation Functions
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
To optimize the speed of the various activation function computations, several techniques are available:
- Choose a fast activation function (e.g. RELU).
- Choose an activation function without trainable parameters.
- Algebraic approximations of the activation function.
- Precomputed lookup tables (sequential).
- Basic vectorization (e.g. with AVX operation sequences; see the sketch after this list).
- Vectorization of the precomputed lookup tables (i.e. parallel LUTs).
- Kernel fusion (e.g. fusing the activation function computation back into a MatMul kernel).
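As an example of basic vectorization, here is a minimal sketch of a RELU kernel using AVX intrinsics, processing eight float values per iteration. It assumes a CPU and compiler with AVX support, and the function name relu_avx is illustrative rather than from the book's code; RELU itself (clamp negatives to zero) is discussed below.

    #include <immintrin.h>   // AVX intrinsics
    #include <cstddef>

    // Vectorized RELU: clamp negatives to zero, 8 floats per AVX iteration.
    void relu_avx(float v[], std::size_t n)
    {
        const __m256 zero = _mm256_setzero_ps();
        std::size_t i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256 x = _mm256_loadu_ps(&v[i]);               // load 8 floats (unaligned)
            _mm256_storeu_ps(&v[i], _mm256_max_ps(x, zero)); // elementwise max(x, 0)
        }
        for (; i < n; ++i) {                                 // scalar tail for leftover elements
            if (v[i] < 0.0f) v[i] = 0.0f;
        }
    }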
Which activation function is the fastest? Why, it's RELU, of course. Its code is so trivial that it looks more like a typo than real programming. Does RELU even deserve to be called a “function”?
The logic of RELU is simply to convert all negatives to zero, but leave positive values unchanged. This can be as fast as a sign bit test, making RELU the fastest activation to compute.
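Here is a minimal sketch of scalar RELU in C++, first as a plain comparison and then via an explicit sign-bit test on the IEEE 754 representation (the function names are illustrative):

    #include <cstdint>
    #include <cstring>

    // Basic RELU: negatives become zero, positives pass through unchanged.
    float relu_basic(float x)
    {
        return (x > 0.0f) ? x : 0.0f;
    }

    // RELU via a sign-bit test: if the IEEE 754 sign bit is set, the value
    // is negative (or negative zero), so the result is zero; otherwise x.
    float relu_signbit(float x)
    {
        std::uint32_t bits;
        std::memcpy(&bits, &x, sizeof(bits));   // safe type-pun via memcpy
        return (bits & 0x80000000u) ? 0.0f : x;
    }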
The other activation functions are “non-linear” in a more expensive way, needing exponentials or other transcendental math, which is a cryptic way of saying “slooow.”
GELU and SwiGLU usually need to be approximated to be efficient, or, even better, pre-calculated into a lookup table, assuming you're not using 32-bit float values (or maybe you're running with a 16GB precomputed LUT for all 32 bits).
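With 16-bit inputs, a full GELU table has only 2^16 = 65,536 entries (256KB of float outputs). Here is a minimal sketch of precomputing and using such a table, assuming the _Float16 half-precision type available as a GCC/Clang extension; the function names are illustrative, and a real kernel would use whatever half-float type the framework provides.

    #include <cmath>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Reference GELU using the erf formulation.
    static float gelu_exact(float x)
    {
        return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
    }

    // Precompute one GELU output for every possible FP16 bit pattern.
    std::vector<float> build_gelu_lut()
    {
        std::vector<float> lut(1u << 16);
        for (std::uint32_t i = 0; i < (1u << 16); ++i) {
            std::uint16_t bits = static_cast<std::uint16_t>(i);
            _Float16 h;
            std::memcpy(&h, &bits, sizeof(h));   // reinterpret the bit pattern as FP16
            lut[i] = gelu_exact(static_cast<float>(h));
        }
        return lut;
    }

    // Lookup at inference time: one array index, no transcendental math.
    inline float gelu_lut(const std::vector<float>& lut, std::uint16_t input_bits)
    {
        return lut[input_bits];
    }

The “parallel LUTs” item in the list above is the same idea, but gathering several table entries per instruction instead of looking them up one at a time.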