Aussie AI
Book Excerpt from "Generative AI in C++" by David Spuler, Ph.D.
Learned Activation Parameters
Some activation functions have “learned activation parameters” or “trainable activation parameters.” For example, Swish/SwiGLU has “alpha” and “beta” parameters that modify the activation behavior. Not all activation functions have parameters (e.g. basic RELU and GELU don't), but those that do are technically called “adaptable activation functions.” These extra parameters are stored in the model like weights, but they relate to the activation functions, whereas weights are used in matrix multiplications (e.g. the linear layers in FFNs).
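As a minimal sketch in C++ (assuming a single scalar learned beta per layer; the function names and storage details here are illustrative, not taken from any particular model format), a parameterized Swish compares to parameter-free RELU like this:

#include <cmath>

// Swish with a learned "beta" parameter: swish(x) = x * sigmoid(beta * x).
// The beta value is loaded from the model alongside the weights.
float swish_learned(float x, float beta)
{
    return x / (1.0f + expf(-beta * x));  // x * sigmoid(beta*x)
}

// Basic RELU needs no stored activation parameters at all.
float relu(float x)
{
    return x > 0.0f ? x : 0.0f;
}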
The advantage of using an activation function with trainable parameters is that there is greater opportunity for the model to encode intelligence (i.e. better perplexity, accuracy, and finesse). The downside is that these activation parameters require extra computation and also hamper certain optimizations that can be applied to simpler activation functions such as RELU.
Having granular parameters attached to the activation functions is quite limiting when it comes to optimizing the speed of these parameterized activation functions. We cannot, for example, precompute the entire range of an adaptive activation function into a lookup table, because such a precomputation is only valid if the parameters are fixed, and trainable parameters are not.
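For contrast, here is a sketch of the kind of lookup-table precomputation that works for a parameter-free activation such as GELU, and why the same trick breaks down once a learned beta is involved (the table size and input range are illustrative assumptions):

#include <cmath>

// Table-driven precomputation for a parameter-free activation (GELU).
// The table can be built once, offline, because the output depends only on x.
const int GELU_TABLE_SIZE = 65536;
static float g_gelu_table[GELU_TABLE_SIZE];

void build_gelu_table(float xmin, float xmax)
{
    for (int i = 0; i < GELU_TABLE_SIZE; i++) {
        float x = xmin + (xmax - xmin) * (float)i / (float)(GELU_TABLE_SIZE - 1);
        // GELU tanh approximation: 0.5*x*(1 + tanh(sqrt(2/pi)*(x + 0.044715*x^3)))
        g_gelu_table[i] = 0.5f * x
            * (1.0f + tanhf(0.7978845608f * (x + 0.044715f * x * x * x)));
    }
}

// A learned-beta Swish, swish(x) = x * sigmoid(beta*x), cannot use one table
// indexed only by x: every distinct beta value would need its own table, and
// beta is a trained parameter whose value is not known when the table is built.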
Instead, we can precompute parts of the activation function formula, such as the calls to expf (to exponentiate the input numbers), but then we have to apply the activation parameters as separate, extra steps. These extra parameter computations can often be vectorized themselves, but they are still extra steps that simply aren't required for other choices of activation function (did I mention RELU?).
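Here is one hedged sketch of that split for a learned-beta Swish (the three-pass structure and function names are illustrative): the beta scaling becomes its own vectorizable pass, the exp/sigmoid part stays parameter-free (and lookup-table friendly), and a final elementwise multiply finishes the computation. RELU would need only a single pass.

#include <cmath>

void vector_scale(float v[], int n, float beta)   // extra pass caused by beta
{
    for (int i = 0; i < n; i++) v[i] *= beta;
}

void vector_sigmoid(float v[], int n)             // parameter-free; table-friendly
{
    for (int i = 0; i < n; i++) v[i] = 1.0f / (1.0f + expf(-v[i]));
}

void vector_multiply(float v[], const float x[], int n)  // elementwise product
{
    for (int i = 0; i < n; i++) v[i] *= x[i];
}

// swish(x) = x * sigmoid(beta * x), computed as three separate passes.
void swish_learned_vector(const float x[], float out[], int n, float beta)
{
    for (int i = 0; i < n; i++) out[i] = x[i];
    vector_scale(out, n, beta);       // step 1: apply the learned parameter
    vector_sigmoid(out, n);           // step 2: exponentiation/sigmoid part
    vector_multiply(out, x, n);       // step 3: multiply by the original inputs
}

Each loop here would typically be vectorized (e.g. with SIMD intrinsics), but the beta pass remains an additional sweep over the data that a non-parameterized activation avoids.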