
Transformer Layers and Components

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Inside the encoder and decoder blocks there are many sub-structures, most of which are organized into “layers.” For example, GPT-2 is a decoder-only architecture with 12 decoder layers. Collectively, a model's layers are sometimes called its “encoder stack” and “decoder stack.”
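
To make the “stack” idea concrete, here is a minimal sketch of a decoder-only stack in C++. The type and function names are hypothetical placeholders, not a real engine's interfaces; the point is simply that the same layer structure is applied repeatedly, with each layer's output feeding the next layer's input (12 times for GPT-2).

    #include <vector>

    // Hypothetical, simplified sketch of a decoder-only "stack."
    struct LayerWeights {
        // Attention and feed-forward weights for one layer would live here.
    };

    using Vector = std::vector<float>;

    // Placeholder for one decoder layer: attention, FFN, normalization, etc.
    Vector decoder_layer(const Vector& input, const LayerWeights& weights) {
        Vector output = input;  // real code would transform the activations
        return output;
    }

    Vector run_decoder_stack(Vector activations,
                             const std::vector<LayerWeights>& layers) {
        // Each layer's output becomes the next layer's input.
        for (const LayerWeights& w : layers) {
            activations = decoder_layer(activations, w);
        }
        return activations;
    }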

However, layers are not the whole story. Each layer contains many sub-components, and there are also parts of the engine that sit outside the layers entirely. In fact, there are numerous low-level components in an AI engine, such as:

  • Model Loader
  • Tokenizer (input module)
  • Embeddings
  • Positional Encoding
  • Vector Arithmetic (e.g. addition)
  • Matrix Multiplier (MatMul/GEMM)
  • Attention Heads (i.e. Q, K, and V)
  • Feed-Forward Network (FFN)
  • Activation Functions
  • Normalization
  • Softmax
  • Linearization/De-embedding
  • Decoding Algorithm (choosing words)
  • Output module (formatting)

So, that's 14 distinct C++ modules you need to write. If we estimate two weeks for each, your engine will be done in about six months. (I wonder, dear reader, did you check my count in the above list?)
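
To see how most of these modules chain together at inference time, here is a minimal sketch of the call sequence for producing a single output token. Every name is a hypothetical placeholder standing in for one module from the list; the model loader and the low-level vector and matrix kernels are omitted, since they sit underneath these calls rather than in the per-token sequence.

    #include <algorithm>
    #include <string>
    #include <vector>

    using Tokens = std::vector<int>;
    using Vector = std::vector<float>;

    // One trivial stub per module from the list above (hypothetical names).
    Tokens tokenize(const std::string& text) { return Tokens(text.size(), 0); }
    Vector embed(const Tokens& toks) { return Vector(toks.size(), 0.0f); }
    void add_positional_encoding(Vector& v) { (void)v; /* add position info in-place */ }
    Vector run_layers(Vector v) { return v; }      // attention heads + FFN + norm per layer
    Vector unembed(const Vector& v) { return v; }  // linearization/de-embedding to logits
    Vector softmax(Vector logits) { return logits; }
    int choose_token(const Vector& probs) {        // decoding algorithm (greedy here)
        return static_cast<int>(std::max_element(probs.begin(), probs.end())
                                - probs.begin());
    }
    std::string format_output(int token_id) { return std::to_string(token_id); }

    // The modules run in roughly this order to produce one output token.
    std::string generate_one_token(const std::string& prompt) {
        Tokens tokens = tokenize(prompt);              // Tokenizer
        Vector activations = embed(tokens);            // Embeddings
        add_positional_encoding(activations);          // Positional Encoding
        activations = run_layers(activations);         // Layer stack
        Vector probabilities = softmax(unembed(activations));  // De-embedding + Softmax
        int next_token = choose_token(probabilities);  // Decoding algorithm
        return format_output(next_token);              // Output module
    }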

But that's not all. We've forgotten training, and the engine above would be inference-only. All of the components listed so far are used for both inference and training, but training also requires extra algorithms and modules (two of which are sketched after the list), including:

  • Learning algorithms (e.g. supervised vs unsupervised)
  • Training Optimizer (i.e. “gradient descent” method)
  • Loss function
  • Dropout
  • Evaluation metrics
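
As a rough, simplified illustration of where the loss function and the training optimizer fit, here is a sketch of a mean-squared-error loss and a single plain gradient-descent weight update. Real LLM training uses cross-entropy loss, backpropagation through every layer, and optimizers such as Adam, but the shape of the update step is the same idea.

    #include <vector>

    // Hypothetical, simplified sketch of two training-only pieces:
    // a loss function and a gradient-descent weight update.

    // Mean-squared-error loss between predictions and targets.
    float mse_loss(const std::vector<float>& predicted,
                   const std::vector<float>& target) {
        float sum = 0.0f;
        for (size_t i = 0; i < predicted.size(); ++i) {
            float diff = predicted[i] - target[i];
            sum += diff * diff;
        }
        return predicted.empty() ? 0.0f : sum / static_cast<float>(predicted.size());
    }

    // One step of plain gradient descent: move each weight against its gradient.
    void sgd_update(std::vector<float>& weights,
                    const std::vector<float>& gradients,
                    float learning_rate) {
        for (size_t i = 0; i < weights.size(); ++i) {
            weights[i] -= learning_rate * gradients[i];
        }
    }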

 
