Masking and Lookahead
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
The above discussion makes it sound like attention only looks backwards at the previously emitted tokens, but the situation is more complicated. When an engine starts processing an input query from a user, it already has an existing sequence of tokens from the prompt, and can “look ahead” at some of the upcoming tokens, too. This idea is implemented in the “encoder” part of the Transformer, which can look at the entirety of the user prompt. However, the decoder is typically not allowed to look ahead, and uses “masked attention,” which blocks the decoder from looking at future tokens (i.e., its view of the future tokens is “masked off”). Hence, the usual decoder is only allowed to look at the output tokens that the engine has already produced.
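For intuition, here is a minimal C++ sketch of a single attention head with an optional causal mask, which is how “masked attention” blocks future positions: masked scores are set to negative infinity so that the softmax gives them zero weight. The function name, the nested-vector Matrix type, and the single-head simplification are illustrative assumptions for this excerpt, not the code of a real engine.

#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // [token position][embedding dims]

// Simplified single-head attention: scores = Q.K^T / sqrt(d), softmax, weighted sum of V.
// If causal_mask is true, each query position q ignores all key positions k > q (the "future").
Matrix attention(const Matrix& Q, const Matrix& K, const Matrix& V, bool causal_mask) {
    const size_t nq = Q.size(), nk = K.size(), d = Q[0].size();
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    Matrix out(nq, std::vector<float>(V[0].size(), 0.0f));
    for (size_t q = 0; q < nq; ++q) {
        // Raw scores for this query position, with future positions masked off.
        std::vector<float> w(nk);
        for (size_t k = 0; k < nk; ++k) {
            float dot = 0.0f;
            for (size_t i = 0; i < d; ++i) dot += Q[q][i] * K[k][i];
            w[k] = (causal_mask && k > q)
                       ? -std::numeric_limits<float>::infinity()  // masked: softmax gives 0
                       : dot * scale;
        }
        // Softmax over the (possibly masked) scores.
        const float maxv = *std::max_element(w.begin(), w.end());
        float sum = 0.0f;
        for (float& x : w) { x = std::exp(x - maxv); sum += x; }
        for (float& x : w) x /= sum;
        // Weighted sum of the value vectors.
        for (size_t k = 0; k < nk; ++k)
            for (size_t i = 0; i < V[k].size(); ++i)
                out[q][i] += w[k] * V[k][i];
    }
    return out;
}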
Thus, in the vanilla encoder-decoder architecture, there are two different types of attention. The encoder pays attention to the token positions of the input text (i.e., the user prompt), and that is a major part of its “encoding” intelligence. The decoder's attention mechanism pays attention only to the output sequence that it has already produced, rather than to the input sequence. The decoder does get attention information about the input prompt indirectly from the encoder via “cross attention” links, but it doesn't directly examine the input text.
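As a rough illustration of that data flow, and reusing the hypothetical attention() helper and Matrix type from the sketch above, the decoder side can be viewed like this: masked self-attention draws its queries, keys, and values from the decoder's own output tokens, while cross attention takes its queries from the decoder but its keys and values from the encoder's representation of the prompt. Residual connections, layer normalization, and the Q/K/V projection matrices are omitted, and the function name is an assumption for this excerpt.

// Decoder-side attention flow (residuals, layer norms, and Q/K/V projections omitted).
// Uses the attention() helper and Matrix type from the sketch above.
Matrix decoder_attention(const Matrix& decoder_states,     // output tokens produced so far
                         const Matrix& encoder_outputs) {  // encoder's view of the user prompt
    // Masked self-attention: Q, K, and V all come from the decoder's own output sequence,
    // with the causal mask blocking any "future" positions.
    Matrix self_attn = attention(decoder_states, decoder_states, decoder_states,
                                 /*causal_mask=*/true);

    // Cross attention: queries come from the decoder, but keys and values come from the
    // encoder outputs, which is how the decoder indirectly sees the input prompt.
    // No causal mask is needed here because the whole prompt is already known.
    Matrix cross_attn = attention(self_attn, encoder_outputs, encoder_outputs,
                                  /*causal_mask=*/false);
    return cross_attn;
}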