Aussie AI

Input Similarity-Based Caching

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.


When a new input is similar enough to a prior input, the inference results computed for the earlier input can be cached and re-used. This applies to the analysis of continuous feeds, such as audio streams or video frames, where the incremental differences between successive inputs are relatively small. It is a type of incremental algorithm for neural network inference.

The overall idea is to detect situations where an input does not need to be processed because it is similar enough to the previous input. No large cache of previously seen images is required. Keeping such a cache can be done too, but that is a different technique (i.e. the "Inference Cache" idea). The input similarity approach needs only the results from the previous frame: if the new frame is close enough to the previous frame, the new frame can be "skipped" and the prior results retained.

The choice is essentially whether to re-use the inference results from the prior video frame or to re-compute a new set of results. The same results can potentially be re-used across multiple frames if the changes remain minimal, but eventually a re-computation will be required.
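This skip-or-recompute decision can be sketched in C++ as a small cache object that holds the previous frame and its results. The class name `FrameCache`, the placeholder `runInference` function, the `frameDistance` metric, and the threshold value are all illustrative assumptions, not code from the book:

```cpp
#include <cstdlib>
#include <vector>

using Frame   = std::vector<unsigned char>;  // grayscale pixel buffer
using Results = std::vector<float>;          // inference outputs

// Mean absolute pixel difference between two frames (illustrative metric).
static double frameDistance(const Frame& a, const Frame& b)
{
    if (a.size() != b.size()) return 1e9;  // size change: treat as very different
    long total = 0;
    for (size_t i = 0; i < a.size(); ++i)
        total += std::abs(int(a[i]) - int(b[i]));
    return a.empty() ? 0.0 : double(total) / double(a.size());
}

// Placeholder for the real (expensive) model inference; here it just
// returns the mean pixel value so the sketch is self-contained.
static Results runInference(const Frame& f)
{
    long sum = 0;
    for (unsigned char p : f) sum += p;
    return Results{ f.empty() ? 0.0f : float(sum) / float(f.size()) };
}

class FrameCache {
    Frame   prevFrame_;
    Results prevResults_;
    bool    hasPrev_ = false;
    double  threshold_;
public:
    explicit FrameCache(double threshold) : threshold_(threshold) {}

    // Re-use the prior frame's results if the new frame is similar enough;
    // otherwise re-compute and update the cache.
    Results infer(const Frame& frame)
    {
        if (hasPrev_ && frameDistance(frame, prevFrame_) < threshold_)
            return prevResults_;  // skip: incremental change was too small
        prevResults_ = runInference(frame);
        prevFrame_   = frame;
        hasPrev_     = true;
        return prevResults_;
    }
};
```

Note that the cache holds only one prior frame and its results, matching the single-frame comparison described above; re-use can still span several consecutive frames as long as each stays within the threshold of the last computed frame.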

Input similarity could be checked using vector hashing or a vector database. More commonly for images, however, there are simpler non-vector methods that detect when two frames differ only minimally.
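One simple non-vector check of this kind is to count how many pixels have changed noticeably between frames. The function names and the tolerance values below are illustrative assumptions, not a method specified by the book:

```cpp
#include <cstdlib>
#include <vector>

// Fraction of pixels whose intensity changed by more than pixelTol.
double changedFraction(const std::vector<unsigned char>& a,
                       const std::vector<unsigned char>& b,
                       int pixelTol)
{
    size_t n = (a.size() < b.size()) ? a.size() : b.size();
    size_t changed = 0;
    for (size_t i = 0; i < n; ++i)
        if (std::abs(int(a[i]) - int(b[i])) > pixelTol)
            ++changed;
    return (n == 0) ? 0.0 : double(changed) / double(n);
}

// Skip the new frame if fewer than 1% of pixels changed noticeably
// (both tolerances are arbitrary example values to be tuned).
bool canSkipFrame(const std::vector<unsigned char>& prev,
                  const std::vector<unsigned char>& cur)
{
    return changedFraction(prev, cur, /*pixelTol=*/8) < 0.01;
}
```

The per-pixel tolerance absorbs sensor noise and compression artifacts, while the percentage threshold controls how aggressively frames are skipped; both trade accuracy against the number of inferences avoided.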

 
