Length Generalization
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Speed is not the only problem with long contexts. Vanilla Transformers are also not particularly good at generalizing to long context sizes. Although the Transformer's key innovation was its "attention" mechanism, the engine starts to lose track of earlier content as the output grows longer.
This ability to intelligently process long texts is known as "length generalization" (or "length extrapolation"): models trained on sequences of a given length often degrade on longer ones, and improving accuracy on long inputs and long outputs is an area of active research.
One method being explored to improve length generalization is the "scratchpad" or "chain-of-thought" approach. The idea is that the inference engine emits rough summaries to an internal scratchpad at regular intervals and merges them back into subsequent inference, so that the AI helps itself keep track of its own "chain of thought" over a longer output sequence. A sketch of this control flow appears below.
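As a rough illustration, here is a minimal C++ sketch of a scratchpad-style generation loop, assuming a summary is refreshed at a fixed interval. All names are hypothetical: generate_next_chunk and summarize are stubs standing in for real inference-engine calls, and the character-based interval is an arbitrary simplification (a real engine would count tokens).

    #include <iostream>
    #include <string>

    // Hypothetical stand-in for a call into a real inference engine
    // that decodes the next few tokens given the current context.
    std::string generate_next_chunk(const std::string& context)
    {
        (void)context;      // a real engine would condition on this
        return " lorem";    // stub output for illustration
    }

    // Hypothetical stand-in for a summarization call (e.g. a second
    // model pass); here it crudely keeps only the last 80 characters.
    std::string summarize(const std::string& text)
    {
        const size_t kMax = 80;
        return text.size() > kMax ? text.substr(text.size() - kMax) : text;
    }

    // Scratchpad loop: every `interval` characters of new output,
    // condense the recent output into a running summary and fold that
    // summary back into the context for subsequent decoding steps.
    std::string generate_with_scratchpad(const std::string& prompt,
                                         size_t max_len, size_t interval)
    {
        std::string scratchpad;       // running summary of the output so far
        std::string output;
        size_t last_summary_at = 0;   // output size at the last summary

        while (output.size() < max_len) {
            // Context = prompt + scratchpad summary + output so far.
            std::string context =
                prompt + "\n[Summary so far: " + scratchpad + "]\n" + output;
            output += generate_next_chunk(context);

            // Refresh the scratchpad at regular intervals.
            if (output.size() - last_summary_at >= interval) {
                scratchpad = summarize(scratchpad + output.substr(last_summary_at));
                last_summary_at = output.size();
            }
        }
        return output;
    }

    int main()
    {
        std::string story = generate_with_scratchpad("Write a long story.", 500, 100);
        std::cout << "Generated " << story.size() << " characters\n";
        return 0;
    }

In a real engine the scratchpad would hold tokens and the merge would happen during prompt assembly or in the KV cache, but the control flow is the same: generate, periodically summarize, and re-inject the summary into the context.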