
Transformer Layers and Components

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Inside the encoder and decoder blocks there are many sub-structures, most of which are organized into “layers.” For example, GPT-2 is a decoder-only architecture with 12 decoder layers. Collectively, a model's layers are sometimes called its “encoder stack” and “decoder stack.”
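
To make the “stack” idea concrete, here is a minimal sketch of a decoder-only stack in C++. The type and function names are hypothetical placeholders, not a real engine's interfaces; the point is simply that the same layer structure is applied repeatedly, with each layer's output feeding the next layer's input (12 times for GPT-2).

    #include <vector>

    // Hypothetical, simplified sketch of a decoder-only "stack."
    struct LayerWeights {
        // Attention and feed-forward weights for one layer would live here.
    };

    using Vector = std::vector<float>;

    // Placeholder for one decoder layer: attention, FFN, normalization, etc.
    Vector decoder_layer(const Vector& input, const LayerWeights& weights) {
        Vector output = input;  // real code would transform the activations
        return output;
    }

    Vector run_decoder_stack(Vector activations,
                             const std::vector<LayerWeights>& layers) {
        // Each layer's output becomes the next layer's input.
        for (const LayerWeights& w : layers) {
            activations = decoder_layer(activations, w);
        }
        return activations;
    }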

However, layers are not the whole story. Each layer contains many sub-components, and there are also parts of the engine that sit outside the layers entirely. In fact, there are numerous low-level components in an AI engine, such as:

  • Model Loader
  • Tokenizer (input module)
  • Embeddings
  • Positional Encoding
  • Vector Arithmetic (e.g. addition)
  • Matrix Multiplier (MatMul/GEMM)
  • Attention Heads (i.e. Q, K, and V)
  • Feed-Forward Network (FFN)
  • Activation Functions
  • Normalization
  • Softmax
  • Linearization/De-embedding
  • Decoding Algorithm (choosing words)
  • Output module (formatting)

So, that's 14 distinct C++ modules you need to write. If we estimate two weeks for each, your engine will be done in about six months. (I wonder, dear reader, did you check my count in the above list?)
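
To see how most of these modules chain together at inference time, here is a minimal sketch of the call sequence for producing a single output token. Every name is a hypothetical placeholder standing in for one module from the list; the model loader and the low-level vector and matrix kernels are omitted, since they sit underneath these calls rather than in the per-token sequence.

    #include <algorithm>
    #include <string>
    #include <vector>

    using Tokens = std::vector<int>;
    using Vector = std::vector<float>;

    // One trivial stub per module from the list above (hypothetical names).
    Tokens tokenize(const std::string& text) { return Tokens(text.size(), 0); }
    Vector embed(const Tokens& toks) { return Vector(toks.size(), 0.0f); }
    void add_positional_encoding(Vector& v) { (void)v; /* add position info in-place */ }
    Vector run_layers(Vector v) { return v; }      // attention heads + FFN + norm per layer
    Vector unembed(const Vector& v) { return v; }  // linearization/de-embedding to logits
    Vector softmax(Vector logits) { return logits; }
    int choose_token(const Vector& probs) {        // decoding algorithm (greedy here)
        return static_cast<int>(std::max_element(probs.begin(), probs.end())
                                - probs.begin());
    }
    std::string format_output(int token_id) { return std::to_string(token_id); }

    // The modules run in roughly this order to produce one output token.
    std::string generate_one_token(const std::string& prompt) {
        Tokens tokens = tokenize(prompt);              // Tokenizer
        Vector activations = embed(tokens);            // Embeddings
        add_positional_encoding(activations);          // Positional Encoding
        activations = run_layers(activations);         // Layer stack
        Vector probabilities = softmax(unembed(activations));  // De-embedding + Softmax
        int next_token = choose_token(probabilities);  // Decoding algorithm
        return format_output(next_token);              // Output module
    }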

But that's not all. We've forgotten training, and the engine above would be inference-only. All of the components listed so far are used for both inference and training, but training also requires extra algorithms and modules (two of which are sketched after the list), including:

  • Learning algorithms (e.g. supervised vs unsupervised)
  • Training Optimizer (i.e. “gradient descent” method)
  • Loss function
  • Dropout
  • Evaluation metrics
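
As a rough, simplified illustration of where the loss function and the training optimizer fit, here is a sketch of a mean-squared-error loss and a single plain gradient-descent weight update. Real LLM training uses cross-entropy loss, backpropagation through every layer, and optimizers such as Adam, but the shape of the update step is the same idea.

    #include <vector>

    // Hypothetical, simplified sketch of two training-only pieces:
    // a loss function and a gradient-descent weight update.

    // Mean-squared-error loss between predictions and targets.
    float mse_loss(const std::vector<float>& predicted,
                   const std::vector<float>& target) {
        float sum = 0.0f;
        for (size_t i = 0; i < predicted.size(); ++i) {
            float diff = predicted[i] - target[i];
            sum += diff * diff;
        }
        return predicted.empty() ? 0.0f : sum / static_cast<float>(predicted.size());
    }

    // One step of plain gradient descent: move each weight against its gradient.
    void sgd_update(std::vector<float>& weights,
                    const std::vector<float>& gradients,
                    float learning_rate) {
        for (size_t i = 0; i < weights.size(); ++i) {
            weights[i] -= learning_rate * gradients[i];
        }
    }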

 
