Aussie AI

Length Generalization

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Speed is not the only problem with long contexts. Vanilla Transformers are also not particularly good at generalizing their results to longer context sizes. Although a key innovation of the Transformer was its “attention” capability, the engine starts to lose track as the output grows longer.

This ability to intelligently process long texts is known as “length generalization” (or “length extrapolation”), and improving accuracy on long inputs and longer outputs is an area of active research.

One of the methods being explored to improve length generalization is the “scratchpad” or “chain-of-thought” approach. The idea is that the AI inference engine emits rough summaries to an internal scratchpad at regular intervals, and these notes are merged back into subsequent inference, thereby helping the model keep track of its own “chain of thought” over a longer output sequence.
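
As a rough illustration, here is a minimal C++ sketch of that scratchpad control loop. The generate_chunk() and summarize() functions are hypothetical placeholders, stubbed out so the example compiles and runs; they stand in for a bounded decoding pass of the inference engine and a summarization pass, and are not a real engine API.

#include <iostream>
#include <string>

// Hypothetical placeholders for the real engine calls.
// Stubbed out so this sketch compiles and runs standalone.
std::string generate_chunk(const std::string& prompt) {
    (void)prompt;   // a real version would run one bounded decoding pass
    return "";      // stub: an empty string signals "generation finished"
}

std::string summarize(const std::string& text) {
    // A real version would run a summarization pass over the text;
    // this stub just truncates (substr stops at end-of-string if shorter).
    return text.substr(0, 40);
}

// Scratchpad-style generation loop: after each chunk of output,
// summarize it into the scratchpad, then feed the scratchpad back
// into the next prompt so the model can "remind itself" of its
// earlier reasoning over a long output sequence.
std::string generate_with_scratchpad(const std::string& task, int max_chunks) {
    std::string scratchpad;  // running notes on the output so far
    std::string output;
    for (int i = 0; i < max_chunks; ++i) {
        std::string prompt = "Notes so far:\n" + scratchpad
                           + "Task:\n" + task + "\n"
                           + "Continue:\n" + output;
        std::string chunk = generate_chunk(prompt);
        if (chunk.empty()) break;               // model signaled completion
        output += chunk;
        scratchpad += summarize(chunk) + "\n";  // merge summary back in
    }
    return output;
}

int main() {
    std::cout << generate_with_scratchpad("Write a long report.", 8) << "\n";
    return 0;
}

The key design point in this sketch is that the scratchpad accumulates short summaries rather than the full output, so the context fed back into the engine stays bounded even as the output elongates.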
