GELU Activation Function
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
GELU Activation Function
The term “GELU” usually means the “GELU-New” function, which is the product of x times the Gaussian Phi function of x (i.e. GELU(x) = x * Phi(x), where Phi is the cumulative distribution function of the standard Normal distribution), rather than “GELU-Old”, which is simply the Gaussian Phi function.
Here is the basic mathematical version of GELU (i.e. GELU-New), in unoptimized C++ code, according to the original paper:
float aussie_GELU_basic(float x) {  // Basic Gaussian GELU (inefficient)
    float phival = 0.5 * (1.0 + erff(x / sqrt(2.0)));
    return x * phival;
}
Note that erff is the float version of the “error function” erf from statistics, declared in &lt;math.h&gt; (or &lt;cmath&gt;); the Gaussian Phi function is computed as Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
The basic GELU arithmetic can be optimized by precomputing the reciprocal of sqrt(2.0), so as to multiply rather than divide, and by avoiding the use of a temporary variable.
Here's a slightly improved version:
float aussie_GELU_basic2(float x) {  // Basic Gaussian GELU (still inefficient)
    // Once-only initialization
    static float s_reciprocal_sqrt_2_0 = 1.0f / sqrtf(2.0f);
    return x * (0.5 * (1.0 + erff(x * s_reciprocal_sqrt_2_0)));
}
To further optimize GELU, there are two approximations given in the original paper, which are examined later in this chapter. And the code can then be further optimized via calculation changes and table lookups, as shown in the GELU approximations example code.
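As a preview, here is a minimal sketch of the first of those two approximations, the tanh-based formula from the original GELU paper (the second uses a sigmoid: x * sigmoid(1.702 * x)). The function name aussie_GELU_approx_tanh1 is illustrative, not the book's later example code:

float aussie_GELU_approx_tanh1(float x) {  // Tanh GELU approximation (sketch, unoptimized)
    // GELU(x) ~ 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    static float s_sqrt_2_over_pi = sqrtf(2.0f / 3.14159265f);
    return 0.5f * x * (1.0f + tanhf(s_sqrt_2_over_pi * (x + 0.044715f * x * x * x)));
}

As with the reciprocal of sqrt(2.0) above, the constant sqrt(2/pi) is precomputed once, so each call performs only multiplications, additions, and a single tanhf call.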