Transformer Component Fusion
Book Excerpt from "Generative AI in C++" by David Spuler, Ph.D.
The operations performed by Transformer components can be fused, meaning that the code of two operations is combined into a single operation, avoiding the overhead of storing and reloading temporary data between them. Some examples of what is possible for fused operations (a C++ sketch of one such fusion appears after this list):
- Fused multi-head attention (fused MHA)
- Fused Multiply-Add (FMA)
- Fused normalization (e.g. fused LayerNorm or fused BatchNorm)
- Fused SoftMax
- Fused activations (e.g. fused RELU, fused GELU, fused SwiGLU, etc.)
- Fused Add-Bias
- Fused matrix transpose
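As a concrete illustration, here is a minimal C++ sketch of fusing an add-bias step and a RELU activation into the matrix-vector multiplication that precedes them. The function names and the flat row-major array layout are illustrative assumptions, not code from the book.

    #include <algorithm>  // std::max

    // Unfused version: three passes over the output vector,
    // with y holding temporary data between the steps.
    void matvec(const float* W, const float* x, float* y, int rows, int cols) {
        for (int i = 0; i < rows; ++i) {
            float sum = 0.0f;
            for (int j = 0; j < cols; ++j)
                sum += W[i * cols + j] * x[j];
            y[i] = sum;
        }
    }
    void add_bias(float* y, const float* b, int rows) {
        for (int i = 0; i < rows; ++i) y[i] += b[i];
    }
    void relu(float* y, int rows) {
        for (int i = 0; i < rows; ++i) y[i] = std::max(0.0f, y[i]);
    }

    // Fused version: one pass, with the bias add and RELU applied
    // while each output element is still in a register.
    void matvec_bias_relu_fused(const float* W, const float* x, const float* b,
                                float* y, int rows, int cols) {
        for (int i = 0; i < rows; ++i) {
            float sum = b[i];  // start from the bias (fused add-bias)
            for (int j = 0; j < cols; ++j)
                sum += W[i * cols + j] * x[j];
            y[i] = std::max(0.0f, sum);  // fused RELU activation
        }
    }

The fused version writes each output element exactly once, avoiding two extra passes over the output vector and the associated memory traffic, which is the main benefit of fusion described above.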
Note that all of these kernel fusion examples involve merging two operations into one. Usually, the above fused components would be merged back into the prior component in the sequence, as in the sketch above, where the add-bias and activation are folded into the matrix multiplication that precedes them.
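The simplest case in the list is the fused multiply-add, which C++ exposes directly as std::fma in the standard header <cmath>. The sketch below is only illustrative; a compiler targeting hardware with FMA instructions may also emit them automatically.

    #include <cmath>  // std::fma

    // Dot product using fused multiply-add: each step computes
    // a[i] * b[i] + sum as a single fused operation with one rounding,
    // rather than a separate multiply followed by an add.
    float dot_fma(const float* a, const float* b, int n) {
        float sum = 0.0f;
        for (int i = 0; i < n; ++i)
            sum = std::fma(a[i], b[i], sum);
        return sum;
    }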