Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
SwiGLU/Swish Activation Function
The SwiGLU activation function has become widely used in commercial AI models.
It is based on the “Swish” function, a popular activation function notably used in the Llama models from Meta.
Swish is based on the sigmoid function: it generalizes the SiLU activation function by adding an extra parameter (beta), as in this formula:
Swish(x) = x * sigmoid(beta * x)
The beta parameter can be a constant or a trained parameter of the activation function. If beta equals 1, then the Swish function is simply the SiLU activation function (x times the sigmoid of x). For other values of beta, Swish behaves differently from SiLU.
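As a concrete illustration of the beta = 1 special case, here is a minimal SiLU sketch (the function name aussie_silu is hypothetical, assuming the standard definition SiLU(x) = x * sigmoid(x)):

#include <math.h>

float aussie_silu(float x)   // SILU = x * sigmoid(x), i.e. Swish with beta = 1
{
    return x * (1.0f / (1.0f + expf(-x)));
}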
Here is the basic C++ to compute a simple Swish function:
float aussie_swish(float x, float beta)   // SWISH = x * sigmoid(beta * x)
{
    return x * aussie_sigmoid(beta * x);
}
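The aussie_sigmoid helper is not shown in this excerpt; assuming it computes the standard logistic sigmoid, 1 / (1 + e^-x), a minimal sketch would be:

#include <math.h>

float aussie_sigmoid(float x)   // SIGMOID = 1 / (1 + e^-x) (assumed definition)
{
    return 1.0f / (1.0f + expf(-x));
}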
Here is the C++ version with the sigmoid function call flattened:
float aussie_swish2(float x, float beta)
{
    // SWISH = x * sigmoid(beta * x)
    // SIGMOID = 1 / (1 + e^-x)
    return x * (1.0f / (1.0f + expf(-(beta * x))));
}
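As a quick sanity check (a hypothetical test driver, not part of the book's code), both versions should return the same result for the same inputs:

#include <stdio.h>

int main()
{
    float x = 2.0f;
    float beta = 1.0f;
    printf("swish(2.0)  = %f\n", aussie_swish(x, beta));    // expect ~1.761594
    printf("swish2(2.0) = %f\n", aussie_swish2(x, beta));   // expect ~1.761594
    return 0;
}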