Aussie AI

Transformer Architecture Choices

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Several architectural decisions made in the model design phase are not really optimizations of a model, but they can significantly affect its efficiency. Using a more advanced engine architecture is also effectively an optimization that “retains” accuracy, because the model can be fully trained in the better engine. Some important decisions include:

  • Decoder-only versus encoder-decoder architectures
  • Alternative floating-point representations (e.g. brain float)
  • Pre-norm versus post-norm
  • Positional encoding algorithms (embeddings)
  • Context length optimizations
  • Neural Architecture Search (NAS)
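To make one item in the list above concrete, here is a minimal sketch of the brain float (bfloat16) representation. It relies on the fact that bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of a 32-bit IEEE 754 float, so a conversion can work directly on the upper 16 bits. The helper names float_to_bf16 and bf16_to_float are hypothetical and chosen for illustration; they are not taken from the book or from any library.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: convert a 32-bit float to bfloat16 (stored in a uint16_t),
// using round-to-nearest-even on the 16 bits that are discarded.
inline uint16_t float_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));  // safe type-punning
    if (std::isnan(f)) {
        // Keep NaN as NaN: force a mantissa bit on so truncation cannot yield infinity.
        return static_cast<uint16_t>((bits >> 16) | 0x0040u);
    }
    const uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
    bits += rounding_bias;
    return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical helper: widen bfloat16 back to float by zero-filling the low 16 bits.
inline float bf16_to_float(uint16_t h) {
    const uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

The trade-off is that bfloat16 keeps the full exponent range of a 32-bit float but only about two to three decimal digits of precision. For example, 3.14159f survives a round trip through these helpers as 3.140625.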

Data doesn't just magically end up in the GPU. Software has to be written to send the data there, and many optimizations are possible in that software. This software is often called the “kernel” of the AI engine, and the sub-components of the engine are often called the MatMul kernel, Softmax kernel, normalization kernel, and so on. Software techniques that aim to improve parallelization, primarily by increasing throughput and reducing latency, include:

  • Vectorization
  • Multi-threading
  • Kernel fusion
  • Kernel fission
  • Pipelining
  • Scheduling algorithms
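As a small illustration of kernel fusion from the list above, the following sketch shows two separate CPU "kernels" (a bias addition and a ReLU activation) and a fused version that does both in a single pass. The function names and the choice of bias-plus-ReLU are assumptions made for this example, not code from the book; the sketch also assumes the bias vector has the same length as the data vector.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Unfused kernel 1: add a bias vector element-wise (one full pass over v).
void add_bias(std::vector<float>& v, const std::vector<float>& bias) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] += bias[i];
    }
}

// Unfused kernel 2: ReLU activation (a second full pass over v).
void relu(std::vector<float>& v) {
    for (float& x : v) {
        x = std::max(x, 0.0f);
    }
}

// Fused kernel: bias addition and ReLU in a single pass, so each element
// is loaded and stored once rather than twice.
void add_bias_relu_fused(std::vector<float>& v, const std::vector<float>& bias) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = std::max(v[i] + bias[i], 0.0f);
    }
}
```

Both versions compute the same result. The fused loop simply avoids the intermediate write and re-read of the vector, which is the main source of the saving when the data does not fit in cache.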

Memory usage optimizations: Software techniques that aim to improve memory usage, and thereby reduce memory access overhead so that more of the available parallelism can be used, include:

  • Tiling
  • Data locality optimizations
  • Dataflow optimizations
  • Memory management optimizations
  • Cache management
  • Prefetching
  • Offloading
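Tiling, the first item in the list above, can be sketched with a blocked matrix multiplication. This is a minimal CPU example under stated assumptions: square n-by-n matrices stored in row-major order in std::vector<float>, and a hypothetical function name matmul_tiled with a default tile size of 32 that is only a placeholder and would need tuning for a real cache.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tiled (blocked) matrix multiply: C = A * B for n x n row-major matrices.
// Working on tile x tile sub-blocks keeps the data being reused close together
// in memory, which improves cache behavior on large matrices.
void matmul_tiled(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t n, std::size_t tile = 32) {
    C.assign(n * n, 0.0f);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        const std::size_t i_end = std::min(ii + tile, n);
        for (std::size_t kk = 0; kk < n; kk += tile) {
            const std::size_t k_end = std::min(kk + tile, n);
            for (std::size_t jj = 0; jj < n; jj += tile) {
                const std::size_t j_end = std::min(jj + tile, n);
                // Multiply one tile of A by one tile of B, accumulating into C.
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const float a = A[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The std::min calls handle matrices whose size is not a multiple of the tile size. The result matches an untiled triple loop; only the order in which memory is visited changes.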

 

