Aussie AI
What are Encoders and Decoders?
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
What are Encoders and Decoders?
The original 2017 Transformer had two major structures: encoders and decoders. Both are still used in various ways by modern Transformers, and each of these structures has many sub-components and layers. However, an encoder-decoder architecture is not the only way to fly. In fact, at the top level, the main types of Transformer architectures are (each sketched in code below):
- Encoder-decoder — original vanilla Transformer
- Encoder-only — BERT (research)
- Decoder-only — GPT-2, GPT-3
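To make the structural differences concrete, here is a minimal C++ sketch of the three top-level inference flows. This is not code from the book's engine: every type and function here (Tensor, encode, decode_step, and so on) is a hypothetical stand-in, stubbed out just enough that the control flow compiles and runs.

    #include <vector>

    using Tensor = std::vector<float>;  // stand-in for a real tensor type
    using Tokens = std::vector<int>;
    constexpr int EOS = 0;              // hypothetical end-of-sequence token id

    // Stub for a full encoder stack: one parallel pass over the whole input.
    Tensor encode(const Tokens& input) { return Tensor(input.size(), 0.0f); }

    // Stub for one decoder pass, predicting the next token. A real decoder
    // runs self-attention, plus cross-attention when 'context' is non-null.
    int decode_step(const Tokens& seq, const Tensor* context) {
        (void)context;
        return seq.size() >= 8 ? EOS : 1;  // fake prediction, just to end the loop
    }

    // Encoder-decoder (original Transformer, translation): the encoder runs
    // once over the source; the decoder then generates output tokens one at a
    // time, cross-attending to the encoder's output.
    Tokens run_encoder_decoder(const Tokens& source) {
        Tensor context = encode(source);
        Tokens out;
        while (true) {
            int tok = decode_step(out, &context);
            if (tok == EOS) break;
            out.push_back(tok);
        }
        return out;
    }

    // Decoder-only (GPT-2/GPT-3 style): no separate encoder and no
    // cross-attention; prompt and generated tokens share one sequence.
    Tokens run_decoder_only(const Tokens& prompt) {
        Tokens seq = prompt;
        while (true) {
            int tok = decode_step(seq, nullptr);
            if (tok == EOS) break;
            seq.push_back(tok);
        }
        return seq;
    }

    // Encoder-only (BERT style): a single encoder pass feeds a classifier
    // head; there is no generation loop at all.
    int run_encoder_only(const Tokens& input) {
        Tensor enc = encode(input);
        return enc.empty() ? 0 : 1;  // stand-in for a classification head
    }

The key structural point: the encoder runs once over its whole input in parallel, whereas the decoder loops once per generated token. Encoder-decoder models connect the two with cross-attention, which decoder-only models drop entirely.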
From this list you can see that although the original 2017 Transformer was based on an encoder-decoder architecture, more recent commercial models including GPT-2 and GPT-3 use decoder-only engines. Encoder-decoder architectures have since been largely relegated to foreign language translation use cases, where encoding and decoding are very distinct steps, each working in a different language. Encoder-only architectures have quite limited use cases, mainly tasks where there's no need to produce a long output, such as classification. Anything that involves using an LLM to write a Shakespearean sonnet about your first-born child is running on a decoder-only Transformer architecture.
Why decoder-only? Researchers found that removing the encoder made models faster by greatly reducing the number of weights: roughly half of the LLM's parameters were unnecessary. This was possible because the encoder was largely redundant for most LLM use cases, since its operation is similar to the decoder's anyway. Instead of an encoder, decoder-only architectures use an initialization phase called “prefill” that runs an encoder-like process in the decoder. Hence, most models changed to decoder-only architectures from GPT-2 onwards, including GPT-3, GPT-3.5, ChatGPT, InstructGPT, and (unofficially) GPT-4.
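As a rough illustration of that two-phase flow, here is a hedged C++ sketch of prefill followed by autoregressive decoding. Again, this is not the book's implementation: KVCache, forward, and generate are hypothetical stand-ins, stubbed so the example compiles, and a real KV cache would store per-layer key/value tensors rather than a simple token count.

    #include <vector>
    #include <cstddef>

    using Tokens = std::vector<int>;
    constexpr int EOS = 0;  // hypothetical end-of-sequence token id

    // Hypothetical cache of past attention states; a real one holds the
    // computed keys and values for every layer, one entry per token.
    struct KVCache {
        std::size_t length = 0;
    };

    // Stub for one forward pass over a batch of new tokens, appending their
    // attention state to the cache and returning a predicted next token.
    int forward(const Tokens& new_tokens, KVCache& cache) {
        cache.length += new_tokens.size();     // record the new positions
        return cache.length >= 12 ? EOS : 1;   // fake next-token prediction
    }

    Tokens generate(const Tokens& prompt, int max_new) {
        KVCache cache;
        Tokens out = prompt;

        // Phase 1: prefill -- one big parallel pass over the entire prompt,
        // which is why it resembles an encoder pass.
        int tok = forward(prompt, cache);

        // Phase 2: decoding -- autoregressive, one token per pass, reusing
        // the cached state instead of recomputing the whole prompt each time.
        for (int i = 0; i < max_new && tok != EOS; ++i) {
            out.push_back(tok);
            tok = forward(Tokens{tok}, cache);
        }
        return out;
    }

The design point is that prefill processes all prompt tokens together in one parallel pass, much as an encoder would, while the decoding phase reuses the cached attention state so that each new token needs only a single incremental pass.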
Meta's Llama and Llama 2 models are also decoder-only, similar to the GPT-3 versions. Google's Gemini and its earlier Bard and PaLM models are likewise based on a decoder-only architecture. Although Gemini was rumored to be reverting to a multimodal encoder-decoder architecture, its release information confirmed a decoder-only design. Pity all those poor unwanted encoders.