Masking and Lookahead
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
The above discussion makes it sound like attention only looks backwards at the previously emitted tokens, but the situation is more complicated. When an engine starts processing an input query from a user, it already has an existing sequence of tokens from the prompt, and can “look ahead” at some of the upcoming tokens, too. This idea is implemented in the “encoder” part of the Transformer, which can look at the entirety of the user prompt. However, the decoder is typically not allowed to look ahead, and uses “masked attention,” which blocks the decoder from looking at future tokens (i.e., its view of the future tokens is “masked off”). Hence, the usual decoder is only allowed to look at the output tokens that the engine has already produced.
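For intuition, here is a minimal C++ sketch of a single attention head with an optional causal mask, which is how “masked attention” blocks future positions: masked scores are set to negative infinity so that the softmax gives them zero weight. The function name, the nested-vector Matrix type, and the single-head simplification are illustrative assumptions for this excerpt, not the code of a real engine.

#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // [token position][embedding dims]

// Simplified single-head attention: scores = Q.K^T / sqrt(d), softmax, weighted sum of V.
// If causal_mask is true, each query position q ignores all key positions k > q (the "future").
Matrix attention(const Matrix& Q, const Matrix& K, const Matrix& V, bool causal_mask) {
    const size_t nq = Q.size(), nk = K.size(), d = Q[0].size();
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    Matrix out(nq, std::vector<float>(V[0].size(), 0.0f));
    for (size_t q = 0; q < nq; ++q) {
        // Raw scores for this query position, with future positions masked off.
        std::vector<float> w(nk);
        for (size_t k = 0; k < nk; ++k) {
            float dot = 0.0f;
            for (size_t i = 0; i < d; ++i) dot += Q[q][i] * K[k][i];
            w[k] = (causal_mask && k > q)
                       ? -std::numeric_limits<float>::infinity()  // masked: softmax gives 0
                       : dot * scale;
        }
        // Softmax over the (possibly masked) scores.
        const float maxv = *std::max_element(w.begin(), w.end());
        float sum = 0.0f;
        for (float& x : w) { x = std::exp(x - maxv); sum += x; }
        for (float& x : w) x /= sum;
        // Weighted sum of the value vectors.
        for (size_t k = 0; k < nk; ++k)
            for (size_t i = 0; i < V[k].size(); ++i)
                out[q][i] += w[k] * V[k][i];
    }
    return out;
}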
Thus, in the vanilla encoder-decoder architecture, there are two different types of attention. The encoder pays attention to the token positions of the input text (i.e., the user prompt), and that is a major part of its “encoding” intelligence. The decoder's attention mechanism pays attention only to the output sequence that it has already produced, rather than to the input sequence. The decoder does get attention information about the input prompt indirectly from the encoder via “cross attention” links, but it doesn't directly examine the input text.
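As a rough illustration of that data flow, and reusing the hypothetical attention() helper and Matrix type from the sketch above, the decoder side can be viewed like this: masked self-attention draws its queries, keys, and values from the decoder's own output tokens, while cross attention takes its queries from the decoder but its keys and values from the encoder's representation of the prompt. Residual connections, layer normalization, and the Q/K/V projection matrices are omitted, and the function name is an assumption for this excerpt.

// Decoder-side attention flow (residuals, layer norms, and Q/K/V projections omitted).
// Uses the attention() helper and Matrix type from the sketch above.
Matrix decoder_attention(const Matrix& decoder_states,     // output tokens produced so far
                         const Matrix& encoder_outputs) {  // encoder's view of the user prompt
    // Masked self-attention: Q, K, and V all come from the decoder's own output sequence,
    // with the causal mask blocking any "future" positions.
    Matrix self_attn = attention(decoder_states, decoder_states, decoder_states,
                                 /*causal_mask=*/true);

    // Cross attention: queries come from the decoder, but keys and values come from the
    // encoder outputs, which is how the decoder indirectly sees the input prompt.
    // No causal mask is needed here because the whole prompt is already known.
    Matrix cross_attn = attention(self_attn, encoder_outputs, encoder_outputs,
                                  /*causal_mask=*/false);
    return cross_attn;
}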