What is an Activation Function?
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
What is an Activation Function?
The idea of an “activation function” is that it controls whether or not a neuron in a neural network will be “activated.” For example, early neural networks used a “threshold” or “step” activation function to decide whether a neuron would pass its result along to the next layer. However, this idea is largely obscured in modern Transformer architectures, where individual neuron activations are overshadowed by massive tensor structures. Even so, activation functions are intricately linked to Transformers and are calculated billions of times.
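As a simple illustration (not code from any particular library), a step activation in C++ is just a comparison against a threshold; the function name and default threshold here are our own choices:

    // Step ("threshold") activation: the neuron fires (outputs 1.0)
    // only if its input exceeds the threshold; otherwise it outputs 0.0.
    float step_activation(float x, float threshold = 0.0f)
    {
        return (x > threshold) ? 1.0f : 0.0f;
    }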
The choice of activation function is an important design decision for a model, because you're stuck with it: inference and training must use the same activation function. There are many opinions as to which activation function makes a model the smartest, but there's no disagreement on this: RELU is the fastest.
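To see why, compare minimal sketches of the two functions in C++. RELU is a single comparison per element, whereas even the common tanh-based approximation of GELU needs several multiplications plus a tanhf call (these are the textbook formulas, not any specific library's implementation):

    #include <math.h>

    // RELU: one comparison, no transcendental math.
    float relu(float x)
    {
        return (x > 0.0f) ? x : 0.0f;
    }

    // GELU via the common tanh approximation:
    // GELU(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    float gelu(float x)
    {
        return 0.5f * x * (1.0f + tanhf(0.7978845608f * (x + 0.044715f * x * x * x)));
    }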
Activation functions in neural networks can be optimized in various ways. Piecewise-linear activation functions (e.g., RELU) are more efficient than non-linear functions (e.g., GELU). Non-linear activation functions can be optimized through precalculated lookup tables and approximations. Even cheap activation functions like RELU can gain further efficiency by fusing them into a preceding MatMul using kernel operator fusion.
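Here is a hedged sketch of the lookup-table idea applied to GELU: precompute the function over a clamped input range once at startup, then replace each call with an array index. The table size, range, and names are arbitrary choices for illustration:

    #include <math.h>

    const int TABLE_SIZE = 4096;
    const float XMIN = -8.0f, XMAX = 8.0f;
    static float g_gelu_table[TABLE_SIZE];

    // Precompute GELU over [XMIN, XMAX] once, at startup.
    void gelu_table_init()
    {
        for (int i = 0; i < TABLE_SIZE; i++) {
            float x = XMIN + (XMAX - XMIN) * i / (TABLE_SIZE - 1);
            g_gelu_table[i] = 0.5f * x * (1.0f + tanhf(0.7978845608f * (x + 0.044715f * x * x * x)));
        }
    }

    // Fast approximate GELU: clamp, then index into the table.
    float gelu_lookup(float x)
    {
        if (x <= XMIN) return 0.0f;   // GELU(x) -> 0 for very negative x
        if (x >= XMAX) return x;      // GELU(x) -> x for large positive x
        int i = (int)((x - XMIN) * (TABLE_SIZE - 1) / (XMAX - XMIN));
        return g_gelu_table[i];
    }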
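And a minimal sketch of kernel operator fusion, assuming a simple matrix-vector MatMul: the RELU is applied to each output element as it is produced, rather than in a separate pass over the output vector afterwards:

    // Fused MatMul + RELU: each output element is activated inline,
    // avoiding a second loop over the output vector.
    void matmul_vector_relu_fused(const float* m, const float* v,
                                  float* out, int rows, int cols)
    {
        for (int r = 0; r < rows; r++) {
            float sum = 0.0f;
            for (int c = 0; c < cols; c++) {
                sum += m[r * cols + c] * v[c];
            }
            out[r] = (sum > 0.0f) ? sum : 0.0f;  // fused RELU, no extra pass
        }
    }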