Aussie AI

Length Generalization

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Speed is not the only problem with long contexts. Vanilla Transformers are also not particularly good at generalizing their results to longer context sizes. Although a key innovation of the Transformer was its “attention” capability, the engine starts to lose track as the output grows longer.

This ability to intelligently process long texts is known as “length generalization” (or “length extrapolation”), and improving accuracy on long inputs and longer outputs is an area of active research.

One of the methods being explored to improve length generalization is the “scratchpad” or “chain-of-thought” approach. The idea is that the AI inference engine emits rough summaries to an internal scratchpad at regular intervals, and these notes are merged back into subsequent inference, thereby helping the model keep track of its own “chain of thought” over a longer output sequence.
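
As a rough illustration, here is a minimal C++ sketch of that scratchpad control loop. The generate_chunk() and summarize() functions are hypothetical placeholders, stubbed out so the example compiles and runs; they stand in for a bounded decoding pass of the inference engine and a summarization pass, and are not a real engine API.

#include <iostream>
#include <string>

// Hypothetical placeholders for the real engine calls.
// Stubbed out so this sketch compiles and runs standalone.
std::string generate_chunk(const std::string& prompt) {
    (void)prompt;   // a real version would run one bounded decoding pass
    return "";      // stub: an empty string signals "generation finished"
}

std::string summarize(const std::string& text) {
    // A real version would run a summarization pass over the text;
    // this stub just truncates (substr stops at end-of-string if shorter).
    return text.substr(0, 40);
}

// Scratchpad-style generation loop: after each chunk of output,
// summarize it into the scratchpad, then feed the scratchpad back
// into the next prompt so the model can "remind itself" of its
// earlier reasoning over a long output sequence.
std::string generate_with_scratchpad(const std::string& task, int max_chunks) {
    std::string scratchpad;  // running notes on the output so far
    std::string output;
    for (int i = 0; i < max_chunks; ++i) {
        std::string prompt = "Notes so far:\n" + scratchpad
                           + "Task:\n" + task + "\n"
                           + "Continue:\n" + output;
        std::string chunk = generate_chunk(prompt);
        if (chunk.empty()) break;               // model signaled completion
        output += chunk;
        scratchpad += summarize(chunk) + "\n";  // merge summary back in
    }
    return output;
}

int main() {
    std::cout << generate_with_scratchpad("Write a long report.", 8) << "\n";
    return 0;
}

The key design point in this sketch is that the scratchpad accumulates short summaries rather than the full output, so the context fed back into the engine stays bounded even as the output elongates.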
