
What are Encoders and Decoders?

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.


The original 2017 Transformer had two major structures: encoders and decoders. Both are still used in various ways by modern Transformers, and each structure has many sub-components and layers. However, an encoder-decoder architecture is not the only way to fly. In fact, at the top level, the main types of Transformer architectures are (see the code sketch after this list):

  • Encoder-decoder — original vanilla Transformer
  • Encoder-only — BERT (research)
  • Decoder-only — GPT-2, GPT-3
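
To make these three data flows concrete, here is a minimal C++ sketch. The run_encoder and run_decoder functions are hypothetical placeholders for full stacks of attention and feed-forward layers, not any real engine's API:

    // Three top-level Transformer data flows, as stand-in C++ functions.
    #include <vector>

    using Tokens = std::vector<int>;  // sequences of token IDs

    Tokens run_encoder(const Tokens& input) { return input; }     // placeholder
    Tokens run_decoder(const Tokens& context) { return context; } // placeholder

    // Encoder-decoder: original vanilla Transformer (e.g. translation).
    Tokens encoder_decoder(const Tokens& prompt) {
        Tokens encoded = run_encoder(prompt);  // encoder reads the input...
        return run_decoder(encoded);           // ...decoder writes the output
    }

    // Encoder-only: BERT-style; the output feeds a classifier, not generation.
    Tokens encoder_only(const Tokens& prompt) {
        return run_encoder(prompt);
    }

    // Decoder-only: GPT-2/GPT-3 style; the prompt is handled by "prefill".
    Tokens decoder_only(const Tokens& prompt) {
        return run_decoder(prompt);
    }

    int main() {
        Tokens prompt = { 101, 102, 103 };   // token IDs for some input text
        Tokens output = decoder_only(prompt);
        return output.empty();               // trivial use so it compiles and runs
    }

One detail omitted here: in the encoder-decoder flow, the decoder also cross-attends to the encoder's output at every layer, rather than just receiving it once as an argument.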

From this list you can see that although the original 2017 Transformer was based on an encoder-decoder architecture, more recent commercial models such as GPT-2 and GPT-3 use decoder-only engines. Encoder-decoder architectures have since been largely relegated to foreign-language translation use cases, where encoding and decoding are distinct steps, each in a different language. Encoder-only architectures have quite limited use cases, mainly those with no need to produce a long output, such as classification. Anything that involves using an LLM to write a Shakespearean sonnet about your first-born child is running on a decoder-only Transformer architecture.

Why decoder-only? Research found that removing the encoder made models faster by greatly reducing the number of weights: roughly half of an encoder-decoder LLM's parameters could be dropped. This was possible because the encoder was largely redundant for most LLM use cases, since its operation was similar to the decoder's anyway. Instead of an encoder, decoder-only architectures use an initialization phase called “prefill” that runs an encoder-like process in the decoder, as sketched below. Hence, most models changed to decoder-only architectures from GPT-2 onwards, including GPT-3, GPT-3.5, ChatGPT, InstructGPT, and (unofficially) GPT-4.
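
Here is a similarly minimal C++ sketch of that two-phase process. The KVCache structure and the prefill and decode_step functions are hypothetical stand-ins, again not any real engine's API:

    // Decoder-only inference in two phases: "prefill" then the decode loop.
    #include <vector>

    using Tokens = std::vector<int>;

    struct KVCache {
        // A real engine stores per-layer key/value tensors here so that
        // earlier tokens are never re-processed.
    };

    // Prefill: run the decoder stack over ALL prompt tokens in one pass
    // (encoder-like), populating the cache.
    void prefill(const Tokens& prompt, KVCache& cache) {
        (void)prompt; (void)cache;   // placeholder body
    }

    // Decode step: process only the newest token, reusing the cache, and
    // return the next predicted token ID (0 = end-of-sequence in this sketch).
    int decode_step(KVCache& cache) {
        (void)cache;                 // placeholder body
        return 0;
    }

    Tokens generate(const Tokens& prompt, int max_tokens) {
        KVCache cache;
        prefill(prompt, cache);            // encoder-like initialization phase
        Tokens output;
        for (int i = 0; i < max_tokens; ++i) {
            int tok = decode_step(cache);  // autoregressive: one token per step
            if (tok == 0) break;           // stop at end-of-sequence
            output.push_back(tok);
        }
        return output;
    }

    int main() {
        Tokens prompt = { 101, 102, 103 }; // token IDs for some input text
        Tokens out = generate(prompt, 10);
        return out.size() > 10;            // trivial use so it compiles and runs
    }

The asymmetry is the point: prefill is one large parallel pass over the whole prompt, while each decode step is a small sequential pass over a single new token, which is why prompt processing and token generation have quite different performance profiles.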

Meta's Llama and Llama 2 models are also decoder-only, like the GPT-3 series. Google's Gemini, and its earlier Bard and PaLM models, are likewise based on a decoder-only architecture. Although Gemini was rumored to be reverting to a multimodal encoder-decoder architecture, its release information confirmed a decoder-only design. Pity all those poor unwanted encoders.
