GELU Activation Function

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

The term “GELU” usually means the “GELU-New” function, which is x multiplied by the Gaussian Phi function of x (the cumulative distribution function of the standard normal distribution), rather than “GELU-Old”, which is simply the Gaussian Phi function on its own. Here is the basic mathematical version of GELU (i.e. GELU-New) in unoptimized C++ code, following the original paper:

    float aussie_GELU_basic(float x)
    {
        // Basic Gaussian GELU (inefficient)
        // Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) is the standard normal CDF
        float phival = 0.5 * (1.0 + erff(x / sqrt(2.0)));
        return x * phival;
    }

Note that erff is the float version of erf, the statistical “error function”, which is declared in the C++ <cmath> header (or <math.h> in C).
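
As a quick sanity check, the function can be exercised on a few sample inputs. The following is a minimal test sketch, assuming the aussie_GELU_basic function above is linked in; the expected values follow from the standard normal CDF (e.g. Phi(1) is approximately 0.8413):

    #include <cstdio>

    float aussie_GELU_basic(float x);   // the basic GELU version above

    int main()
    {
        // Expected (approximately): GELU(0) = 0.0, GELU(1) = 0.8413, GELU(-1) = -0.1587
        printf("GELU(0.0)  = %f\n", aussie_GELU_basic(0.0f));
        printf("GELU(1.0)  = %f\n", aussie_GELU_basic(1.0f));
        printf("GELU(-1.0) = %f\n", aussie_GELU_basic(-1.0f));
        return 0;
    }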

The basic GELU arithmetic can be optimized by precomputing the reciprocal of sqrt(2.0), so that the code multiplies rather than divides, and by avoiding the temporary variable. Here's a slightly improved version:

    float aussie_GELU_basic2(float x)  
    {
        // Basic Gaussian GELU (still inefficient)
        // Once-only initialization
        static float s_reciprocal_sqrt_2_0 = 1.0f / sqrtf(2.0f);
        return x * ( 0.5 * (1.0 + erff(x * s_reciprocal_sqrt_2_0)));
    }

To further optimize GELU, there are two approximations given in the original paper, which are examined later in this chapter. The code can then be optimized further via calculation changes and table lookups, as shown in the GELU approximations example code.
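
For reference, the two approximations from the original GELU paper are a tanh-based formula and a sigmoid-based formula. Here is a minimal sketch of straightforward, unoptimized versions of both, using the constants from that paper (0.044715, sqrt(2/pi), and 1.702); the function names are illustrative, and these are previews only, not the optimized versions examined later:

    float aussie_GELU_approx_tanh(float x)
    {
        // Tanh approximation from the original GELU paper:
        // GELU(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
        const float k = 0.7978845608f;  // sqrt(2/pi)
        return 0.5f * x * (1.0f + tanhf(k * (x + 0.044715f * x * x * x)));
    }

    float aussie_GELU_approx_sigmoid(float x)
    {
        // Sigmoid approximation from the original GELU paper:
        // GELU(x) ~= x * sigmoid(1.702 * x)
        return x / (1.0f + expf(-1.702f * x));
    }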
