Aussie AI
Common Activation Functions
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Common Activation Functions
Many functions, both linear and non-linear, have been tried as activation functions. The main ones that have emerged in practical usage of Transformers for LLMs are listed below (with a brief C++ sketch after the list):
- RELU (Rectified Linear Unit)
- GELU (Gaussian Error Linear Unit)
- SwiGLU (Swish Gated Linear Unit)
- SiLU (Sigmoid Linear Unit)
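As a concrete illustration, here is a minimal C++ sketch of each of these activation functions applied elementwise to a single float. The function names, the scalar interface, and the use of the tanh approximation for GELU (the "GELU-New" variant) are illustrative assumptions for this sketch, not code from the models mentioned below.

#include <cmath>

// RELU: max(0, x)
float relu(float x) {
    return x > 0.0f ? x : 0.0f;
}

// GELU, tanh approximation (the "GELU-New" variant)
float gelu(float x) {
    const float c = 0.7978845608f;  // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(c * (x + 0.044715f * x * x * x)));
}

// Logistic sigmoid, the building block for SiLU and Swish
float sigmoid(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}

// SiLU: x * sigmoid(x), which is Swish with beta = 1
float silu(float x) {
    return x * sigmoid(x);
}

// SwiGLU acts on a pair of pre-activations from two linear projections:
// SwiGLU(a, b) = SiLU(a) * b, applied elementwise
float swiglu(float a, float b) {
    return silu(a) * b;
}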
RELU is one of the earliest activation functions, but it has stood the test of time and remains widely applicable. However, the default activation function in OpenAI's GPT-2 and GPT-3 was called "GELU-New," which is what is usually meant by "GELU" nowadays, although these architectures could also be trained with RELU, Swish, or GELU-Old. InstructGPT uses the sigmoid/SiLU activation function. The Llama and Llama 2 models from Meta's Facebook Research use Swish/SwiGLU as their activation function. Although the details are confidential, GPT-4 reportedly uses the sigmoid activation function (i.e., SiLU) in its loss function. RELU is still common in many open-source models targeting lower-end architectures because of its efficiency.
Various other activation functions have been used in earlier research, or are sometimes still used (a similar C++ sketch follows this list):
- Step function
- tanh (hyperbolic tangent)
- Leaky RELU
- ELU (Exponential Linear Unit)
- Softplus (not to be confused with “Softmax”)
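For reference, here is a scalar C++ sketch of these older activation functions in the same style; the parameter defaults (e.g. alpha = 0.01 for Leaky RELU) are common choices, not values prescribed by this text.

#include <cmath>

// Binary step function: 1 for positive inputs, 0 otherwise
float step_activation(float x) {
    return x > 0.0f ? 1.0f : 0.0f;
}

// Hyperbolic tangent, squashing inputs into (-1, 1)
float tanh_activation(float x) {
    return std::tanh(x);
}

// Leaky RELU: small non-zero slope (alpha) for negative inputs
float leaky_relu(float x, float alpha = 0.01f) {
    return x > 0.0f ? x : alpha * x;
}

// ELU: exponential curve for negative inputs, linear for positive
float elu(float x, float alpha = 1.0f) {
    return x > 0.0f ? x : alpha * (std::exp(x) - 1.0f);
}

// Softplus: log(1 + exp(x)), a smooth approximation of RELU
float softplus(float x) {
    return std::log(1.0f + std::exp(x));
}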