Activation Function Research

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Several avenues for speeding up activation functions have received research attention:

  • Approximations
  • Integer-only activation functions
  • Pruning activation function components
  • Reordering activation components

Approximations. Some approximations have been examined earlier in this chapter. There is also a body of research examining approximations for each of the major types of activation function.
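To make this concrete, here is a minimal C++ sketch comparing the exact GELU, defined via the Gaussian error function, against the well-known tanh-based approximation. The function names are illustrative only, not from any particular library.

    #include <cmath>

    // Exact GELU, defined via the Gaussian error function (erf).
    float gelu_exact(float x) {
        return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
    }

    // Well-known tanh-based approximation to GELU, which avoids erf
    // and is cheaper on platforms without a fast erf implementation.
    float gelu_tanh_approx(float x) {
        const float k = 0.7978845608f;  // sqrt(2/pi)
        return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
    }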

Integer-only Activation Functions. One particular type of approximation is the change to integer-only arithmetic. This is trivial for ReLU, but problematic for the smoother non-linear activation functions such as GELU. Integer-only activations are also a prerequisite for end-to-end integer arithmetic in Transformers, which remains an ongoing area of research.
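For example, ReLU maps directly onto integer arithmetic, whereas one common workaround for the smoother functions is a precomputed lookup table indexed by the quantized input. The sketch below illustrates both ideas, assuming symmetric 8-bit quantization with a single scale factor; the names and details are illustrative rather than drawn from any particular integer-only Transformer design.

    #include <algorithm>
    #include <array>
    #include <cmath>
    #include <cstdint>

    // Integer ReLU is trivial: a compare and a select, no floating point.
    inline int32_t relu_int(int32_t x) {
        return std::max(x, int32_t{0});
    }

    // Sketch of an integer-only GELU via a lookup table, assuming
    // symmetric 8-bit quantization with one scale factor.
    struct GeluLut {
        std::array<int8_t, 256> table{};

        // Build the table once, offline, using float math.
        explicit GeluLut(float scale) {
            for (int i = 0; i < 256; ++i) {
                float x = (i - 128) * scale;  // dequantize the 8-bit code
                float g = 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
                int q = static_cast<int>(std::lround(g / scale));  // requantize
                table[i] = static_cast<int8_t>(std::clamp(q, -128, 127));
            }
        }

        // Inference-time lookup: no floating point on the hot path.
        int8_t operator()(int8_t q) const { return table[q + 128]; }
    };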

Pruning Activation Functions. In some cases, the activation function component can be removed, or “pruned,” so that the original calculated values are used without modification. For example, in the vanilla FFN component, removing the interleaved activation function from between the two linear layers creates what is termed a “bilinear layer” (i.e., two matrix multiplications without any activation function). Research has examined the situations in which this is possible.
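As a sketch of the difference, the following C++ compares a vanilla FFN with its pruned “bilinear layer” counterpart, using a naive matrix-vector multiply and ReLU purely for illustration.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    using Vec = std::vector<float>;
    using Mat = std::vector<Vec>;   // row-major: Mat[row][col]

    // Dense matrix-vector product: y = W * x.
    Vec matvec(const Mat& W, const Vec& x) {
        Vec y(W.size(), 0.0f);
        for (std::size_t i = 0; i < W.size(); ++i)
            for (std::size_t j = 0; j < x.size(); ++j)
                y[i] += W[i][j] * x[j];
        return y;
    }

    // Vanilla FFN: linear layer, activation, linear layer.
    Vec ffn_vanilla(const Mat& W1, const Mat& W2, const Vec& x) {
        Vec h = matvec(W1, x);
        for (float& v : h) v = std::max(v, 0.0f);   // ReLU between the layers
        return matvec(W2, h);
    }

    // "Bilinear layer": the activation has been pruned, leaving two
    // back-to-back matrix multiplications with no non-linearity.
    Vec ffn_bilinear(const Mat& W1, const Mat& W2, const Vec& x) {
        return matvec(W2, matvec(W1, x));
    }

Note that once the non-linearity is removed, the two weight matrices compose into a single linear map, so in principle the whole block could be precomputed as one smaller matrix multiplication.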

Activation Function Reordering. The standard Transformer has an activation function in between the two linear layers of the FFN. Early research has examined “pre-activation” versus “post-activation” orderings inside the FFN and their differing effects.
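Placement conventions differ between papers, so the sketch below should be read only as an illustration of the reordering idea. It reuses the Vec, Mat, and matvec helpers from the previous sketch, and contrasts one interpretation of pre-activation (the non-linearity applied before the linear layers) and post-activation (applied after them) with the standard interleaved form above.

    // Reuses Vec, Mat, and matvec() from the bilinear-layer sketch above.

    // Apply ReLU element-wise to a vector.
    Vec relu_vec(Vec v) {
        for (float& x : v) x = std::max(x, 0.0f);
        return v;
    }

    // Pre-activation: the non-linearity is applied to the input,
    // before the two linear layers.
    Vec ffn_pre_activation(const Mat& W1, const Mat& W2, const Vec& x) {
        return matvec(W2, matvec(W1, relu_vec(x)));
    }

    // Post-activation: the two linear layers run first, and the
    // non-linearity is applied to their output.
    Vec ffn_post_activation(const Mat& W1, const Mat& W2, const Vec& x) {
        return relu_vec(matvec(W2, matvec(W1, x)));
    }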

 
