Aussie AI

What is Cross Attention?

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

What is Cross Attention?

Cross attention is a high-level crossover of attention results between encoder and decoder, whereas the QKV computations are the low-level mechanism inside the attention heads. The cross attention mechanism allows the decoder to pay attention to the encoder output in every layer of the decoder.
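
To make the data flow concrete, here is a minimal single-head cross attention sketch in plain C++. It is illustrative only: the Q/K/V projection matrices and multi-head splitting are omitted, and the function names are made up for this example. The key point is where each input comes from: the queries are the decoder's hidden states, while the keys and values are the encoder's output.

// Minimal single-head cross attention sketch (illustrative, no projection weights).
// Queries come from the decoder's states; keys and values come from the encoder's output.
#include <cmath>
#include <cstdio>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;  // rows of equal-length vectors

static float dot_product(const Vec& a, const Vec& b) {
    float sum = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

static void softmax_inplace(Vec& scores) {
    float max_val = scores[0];
    for (float s : scores) if (s > max_val) max_val = s;
    float total = 0.0f;
    for (float& s : scores) { s = std::exp(s - max_val); total += s; }
    for (float& s : scores) s /= total;
}

// Cross attention: each decoder position attends over ALL encoder positions.
// decoder_states supplies Q; encoder_output supplies both K and V.
Mat cross_attention(const Mat& decoder_states, const Mat& encoder_output) {
    const float scale = 1.0f / std::sqrt((float)encoder_output[0].size());
    Mat result;
    for (const Vec& q : decoder_states) {
        Vec scores;
        for (const Vec& k : encoder_output)   // K from the encoder
            scores.push_back(dot_product(q, k) * scale);
        softmax_inplace(scores);              // attention weights
        Vec out(q.size(), 0.0f);
        for (size_t j = 0; j < encoder_output.size(); ++j)  // V from the encoder
            for (size_t d = 0; d < out.size(); ++d)
                out[d] += scores[j] * encoder_output[j][d];
        result.push_back(out);
    }
    return result;
}

int main() {
    Mat encoder_output = {{1.0f, 0.0f}, {0.0f, 1.0f}, {1.0f, 1.0f}};  // 3 input tokens
    Mat decoder_states = {{0.5f, 0.5f}, {1.0f, 0.0f}};                // 2 output tokens
    Mat mixed = cross_attention(decoder_states, encoder_output);
    for (const Vec& row : mixed) printf("%.3f %.3f\n", row[0], row[1]);
    return 0;
}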

In the vanilla encoder-decoder architecture, cross attention is how the decoder gets attention information about the input prompt. The encoder's self-attention analyzes the input prompt with full lookahead, so every input token can attend to every other. The decoder's own self-attention is focused only on the output sequence (using “masked attention”), and it doesn't directly analyze the input prompt. Hence, the decoder pays attention to the input prompt indirectly, via the cross attention results coming across from the encoder.
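
The masking step is what separates the two attention patterns. Here is a small illustrative sketch (again simplified, with made-up names): the decoder's masked self-attention blanks out the scores for later positions before the softmax, whereas the encoder leaves the full score matrix intact so every input token can see every other.

#include <cstdio>
#include <limits>
#include <vector>

using ScoreMatrix = std::vector<std::vector<float>>;

// Causal ("masked") attention: blank out scores for future positions so that
// token i can only attend to tokens 0..i. The encoder skips this step and
// keeps the full matrix, giving it lookahead over the whole input prompt.
void apply_causal_mask(ScoreMatrix& scores) {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    for (size_t i = 0; i < scores.size(); ++i)
        for (size_t j = i + 1; j < scores[i].size(); ++j)
            scores[i][j] = neg_inf;  // softmax turns -infinity into a weight of 0
}

int main() {
    ScoreMatrix scores(3, std::vector<float>(3, 1.0f));  // raw QK scores for 3 tokens
    apply_causal_mask(scores);
    for (const auto& row : scores) {
        for (float s : row) printf("%8.2f ", s);
        printf("\n");
    }
    return 0;
}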

In a decoder-only architecture (e.g., GPT), cross attention is removed entirely, because there isn't an encoder to provide its input. Similarly, in an encoder-only architecture (e.g., BERT), you can code up cross attention if you like, but there'll be no-one listening on the other end.

 
