Aussie AI

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

SwiGLU/Swish Activation Function

The SwiGLU activation function has become widely used in commercial AI models, notably the Llama models from Meta. It is based on the “Swish” function, which has become a popular activation function in its own right. Swish is built on the sigmoid function: it generalizes the SiLU activation function by adding an extra parameter (beta), as in this formula:

    Swish(x) = x * sigmoid(beta * x)

The beta parameter can be a constant or a trainable parameter learned during training. If beta equals 1, then the Swish function is simply the SiLU activation function (x times the sigmoid of x). For other values of beta, Swish is a distinct function in its own right.

Here is the basic C++ to compute a simple Swish function:

    float aussie_swish(float x, float beta)
    {
        // SWISH = x * sigmoid(beta * x)
        // (aussie_sigmoid is the standard sigmoid: 1 / (1 + e^-x))
        return x * aussie_sigmoid(beta * x);
    }
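
Since SiLU is simply Swish with beta equal to 1, a SiLU function can be written as a thin wrapper around the Swish routine above. This is a minimal sketch; the name aussie_silu is illustrative, not from the book:

    float aussie_silu(float x)
    {
        // SILU = x * sigmoid(x), i.e. Swish with beta = 1
        return aussie_swish(x, 1.0f);
    }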

Here is the C++ version with the sigmoid function call flattened:

    float aussie_swish2(float x, float beta)
    {
        // SWISH = x * sigmoid(beta * x)
        // SIGMOID = 1 / ( 1 + e^-x)
        // (expf is declared in <cmath>)
        return x * ( 1.0f / (1.0f + expf(-(beta * x))));
    }
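
The SwiGLU function itself uses Swish as the gating function in a Gated Linear Unit (GLU): one linear projection of the input is passed through Swish and then multiplied element-wise with a second linear projection. Here is a minimal sketch of that element-wise step, assuming the two projections have already been computed into the gate and value arrays (the function name aussie_swiglu_elementwise is illustrative, not from the book):

    void aussie_swiglu_elementwise(
        const float gate[],   // projection that is passed through Swish
        const float value[],  // second (linear) projection
        float out[], int n, float beta)
    {
        // SWIGLU = Swish(gate) * value, element-wise
        for (int i = 0; i < n; i++) {
            out[i] = aussie_swish(gate[i], beta) * value[i];
        }
    }

In a full Transformer feed-forward layer, the gate and value vectors would come from two separate weight matrices applied to the same input vector.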

 
