What is Attention?
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
The attention mechanism is one of the major breakthroughs that allowed advanced AI to take shape. After all, the seminal 2017 Transformer paper was titled "Attention Is All You Need" (Vaswani et al., 2017). It's such an endlessly cited paper that it must be a real downer if your name is in the "et al." part, so here's the full list of names: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin; see https://arxiv.org/abs/1706.03762.
What's so great about that paper? The overall class of attention algorithms used in Transformers is called “self-attention” because the different tokens in a sequence pay attention to each other and their relative positions. The vanilla Transformer used a specific type of self-attention called “scaled dot-product attention.”
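In matrix form, scaled dot-product attention computes Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where Q, K, and V are the query, key, and value matrices and d_k is the key dimension. Since this is a C++ book, here is a minimal, unoptimized single-head sketch of that formula; the Matrix type and row-major layout are illustrative assumptions, and production kernels would use vectorized matrix multiplications instead of these nested loops.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    using Matrix = std::vector<std::vector<float>>;  // assumed row-major [rows][cols]

    // Single-head scaled dot-product attention:
    // out[i] = sum_j softmax(Q[i] . K[j] / sqrt(d_k)) * V[j]
    Matrix scaled_dot_product_attention(const Matrix& Q, const Matrix& K, const Matrix& V)
    {
        const std::size_t n = Q.size();      // sequence length
        const std::size_t dk = K[0].size();  // query/key dimension
        const std::size_t dv = V[0].size();  // value dimension
        const float scale = 1.0f / std::sqrt(static_cast<float>(dk));

        Matrix out(n, std::vector<float>(dv, 0.0f));
        for (std::size_t i = 0; i < n; ++i) {
            // Scores: dot product of query i with every key, scaled by 1/sqrt(d_k).
            std::vector<float> scores(n);
            for (std::size_t j = 0; j < n; ++j) {
                float dot = 0.0f;
                for (std::size_t t = 0; t < dk; ++t) dot += Q[i][t] * K[j][t];
                scores[j] = dot * scale;
            }
            // Softmax over the scores (subtract the max for numerical stability).
            const float maxs = *std::max_element(scores.begin(), scores.end());
            float sum = 0.0f;
            for (float& s : scores) { s = std::exp(s - maxs); sum += s; }
            for (float& s : scores) s /= sum;
            // Output row i is the attention-weighted sum of the value vectors.
            for (std::size_t j = 0; j < n; ++j)
                for (std::size_t t = 0; t < dv; ++t)
                    out[i][t] += scores[j] * V[j][t];
        }
        return out;
    }

In self-attention, Q, K, and V are all derived from the same input token sequence (via three learned projection matrices), which is what lets every token attend to every other token.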
The idea for attention comes from human intelligence. When we are considering something, we tend to pay more attention to certain features than others. This is true when humans examine words in a sentence or parts of an image. Hence, the AI idea is to apply attention to tokens with different weightings, and have the model learn these weights through training.
Attention is a very powerful mechanism in terms of model capability. It allows the model to learn how much "attention" it should pay to the other tokens in a sequence. In effect, attention is a learned mapping between tokens (or words) that relates the presence of one token in the sequence to the probabilities of the next output token. It is thus deeply involved in deciding which token to output next, based on the tokens that have already appeared.
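In decoder-style models, this "already appeared" constraint is enforced by causal masking: the attention scores for future positions are set to negative infinity before the softmax, so each token assigns zero weight to any token that comes after it. A small illustrative sketch of that idea, applied to the score matrix from the function above (the function name and in-place design are assumptions, not the book's code):

    #include <cstddef>
    #include <limits>
    #include <vector>

    // Illustrative causal mask: block token i from attending to any
    // future token j > i by forcing those scores to -infinity.
    void apply_causal_mask(std::vector<std::vector<float>>& scores)
    {
        const float neg_inf = -std::numeric_limits<float>::infinity();
        for (std::size_t i = 0; i < scores.size(); ++i)
            for (std::size_t j = i + 1; j < scores[i].size(); ++j)
                scores[i][j] = neg_inf;  // exp(-inf) == 0 after the softmax
    }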