Aussie AI
Activation Function Research
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Activation Function Research
Several avenues for speeding up activation functions have received research attention:
- Approximations
- Integer-only activation functions
- Pruning activation function components
- Reordering activation components
Approximations. Some approximations have been examined earlier in this chapter, and there is research on approximations for each of the main types of activation function.
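As a concrete illustration, here is a small C++ sketch comparing the exact erf-based GELU with its widely used tanh-based approximation; the function names are illustrative, not from the book's own code listings.

// Sketch: the common tanh-based approximation to GELU, compared against
// the exact erf-based definition. Names here are illustrative only.
#include <cmath>
#include <cstdio>

float gelu_exact(float x) {
    // Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5f * x * (1.0f + std::erf(x * 0.70710678f));
}

float gelu_tanh_approx(float x) {
    // Approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    const float k = 0.7978845608f;  // sqrt(2/pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}

int main() {
    for (float x = -3.0f; x <= 3.0f; x += 1.0f) {
        std::printf("x=%5.1f exact=%8.5f approx=%8.5f\n",
                    x, gelu_exact(x), gelu_tanh_approx(x));
    }
    return 0;
}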
Integer-only Activation Functions. One particular type of approximation is switching to integer-only arithmetic. This is trivial for RELU, but problematic for the other, smoother non-linear activation functions. Integer-only activations are also required to achieve end-to-end integer arithmetic in Transformers, which is an ongoing area of research.
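Below is a hedged C++ sketch of the idea: RELU needs only a comparison, while a smoother function such as GELU can be handled at inference time with a precomputed lookup table over the quantized int8 input range. The scale parameters and function names are illustrative assumptions, not a specific published scheme.

// Sketch: two ways an activation can run in integer-only arithmetic.
#include <cmath>
#include <cstdint>

int8_t relu_int8(int8_t x) {
    return x > 0 ? x : 0;  // RELU is trivially integer-only
}

// 256-entry table mapping every possible int8 input to a quantized GELU output.
int8_t g_gelu_table[256];

void build_gelu_table(float input_scale, float output_scale) {
    for (int i = -128; i <= 127; ++i) {
        float x = i * input_scale;                             // dequantize
        float y = 0.5f * x * (1.0f + std::erf(x * 0.70710678f));  // float GELU
        int q = (int)std::lround(y / output_scale);            // requantize
        if (q > 127) q = 127;
        if (q < -128) q = -128;
        g_gelu_table[i + 128] = (int8_t)q;
    }
}

int8_t gelu_int8(int8_t x) {
    return g_gelu_table[x + 128];  // integer-only at inference time
}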
Pruning Activation Functions. In some cases, the activation function component can be removed, or “pruned,” so that the original calculated values are used without modification. For example, in the vanilla FFN component, removing the interleaved activation function from between the two linear layers creates what is termed a “bilinear layer” component (i.e., two matrix multiplications without any activation function). Research has examined the situations in which this is possible.
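The following C++ sketch contrasts a vanilla FFN (linear layer, activation, linear layer) with the bilinear version where the interleaved activation has been pruned; the data types, dimensions, and function names are illustrative.

// Sketch: a vanilla FFN as two matrix-vector products with an interleaved
// activation, versus a "bilinear layer" where the activation is pruned.
#include <vector>
#include <cstddef>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;  // row-major: Mat[out][in]

Vec matvec(const Mat& W, const Vec& x) {
    Vec y(W.size(), 0.0f);
    for (size_t i = 0; i < W.size(); ++i)
        for (size_t j = 0; j < x.size(); ++j)
            y[i] += W[i][j] * x[j];
    return y;
}

void relu_inplace(Vec& v) {
    for (float& f : v) if (f < 0.0f) f = 0.0f;
}

// Standard FFN: linear layer, activation, linear layer.
Vec ffn(const Mat& W1, const Mat& W2, const Vec& x) {
    Vec h = matvec(W1, x);
    relu_inplace(h);          // interleaved activation
    return matvec(W2, h);
}

// Bilinear layer: the activation between the two linear layers is pruned.
Vec ffn_bilinear(const Mat& W1, const Mat& W2, const Vec& x) {
    Vec h = matvec(W1, x);    // no activation applied
    return matvec(W2, h);
}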
Activation Function Reordering. The standard Transformer has an activation function in between the two linear layers of the FFN. Early research has examined “pre-activation” versus “post-activation” orderings inside the FFN and their differing effects.
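As an illustrative sketch only, not a reproduction of any particular paper's architecture, the following C++ code contrasts the standard placement of the activation after the first linear layer with applying it to the input before the first linear layer.

// Sketch: two possible placements of the activation relative to the first
// linear layer inside an FFN. Illustrative interpretation of "pre" vs "post"
// activation ordering; names and structure are assumptions.
#include <vector>
#include <cstddef>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;  // row-major: Mat[out][in]

static Vec matvec(const Mat& W, const Vec& x) {
    Vec y(W.size(), 0.0f);
    for (size_t i = 0; i < W.size(); ++i)
        for (size_t j = 0; j < x.size(); ++j)
            y[i] += W[i][j] * x[j];
    return y;
}

static Vec relu(Vec v) {
    for (float& f : v) if (f < 0.0f) f = 0.0f;
    return v;
}

// Post-activation: activation applied to the output of the first linear layer
// (the standard ordering between the two FFN linear layers).
Vec ffn_post_activation(const Mat& W1, const Mat& W2, const Vec& x) {
    return matvec(W2, relu(matvec(W1, x)));
}

// Pre-activation: activation applied to the input before the first linear layer.
Vec ffn_pre_activation(const Mat& W1, const Mat& W2, const Vec& x) {
    return matvec(W2, matvec(W1, relu(x)));
}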