Aussie AI

Input Similarity-Based Caching

Book Excerpt from "Generative AI in C++"

by David Spuler, Ph.D.


When an input is similar enough to a prior input, the previous inference results can be cached and re-used. This applies to the analysis of continuous feeds, such as audio or video frames, where the differences between successive inputs are relatively small. It is a type of incremental algorithm for neural network inference.

The overall idea is to detect situations where the input does not need to be processed, because it is similar enough to the previous input. This does not require a large cache of previously seen images. That can be done, too, but it's a different algorithm (i.e. it's the “Inference Cache” idea). For the input similarity approach, only the results from the previous frame are needed. If the new frame is close enough to the previous frame, the new frame can be “skipped” and the prior results retained.

The choice is basically whether to re-use the inference results from the prior video frame, or to re-compute a new set of results. Potentially, the same results can be re-used for multiple frames, if there are minimal changes, but eventually a new computation will be required.
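The re-use versus re-compute decision can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the book's implementation: the names `SimilarityCache`, `Frame`, `Result`, and `process` are hypothetical, the model is passed in as a callback, and the similarity test is left pluggable. The `max_skips` cap is also an assumption, added so that a fresh computation is forced after a run of skipped frames.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical types for illustration: a flat pixel buffer and a model output.
using Frame = std::vector<unsigned char>;
using Result = std::vector<float>;

// Keeps only the most recently computed result (not a large cache of
// past inputs) and decides per frame whether to re-use or re-compute.
class SimilarityCache {
public:
    SimilarityCache(std::function<Result(const Frame&)> infer,
                    std::function<bool(const Frame&, const Frame&)> similar,
                    int max_skips)
        : infer_(std::move(infer)),
          similar_(std::move(similar)),
          max_skips_(max_skips) {}

    // Returns the cached result if the frame is similar enough to the
    // reference frame; otherwise runs inference and updates the cache.
    const Result& process(const Frame& frame) {
        const bool can_reuse = has_result_
            && skips_ < max_skips_
            && similar_(frame, ref_frame_);
        if (can_reuse) {
            ++skips_;            // frame is "skipped"; prior result retained
            return result_;
        }
        result_ = infer_(frame); // full inference on this frame
        ref_frame_ = frame;      // this frame becomes the new reference
        has_result_ = true;
        skips_ = 0;
        return result_;
    }

private:
    std::function<Result(const Frame&)> infer_;
    std::function<bool(const Frame&, const Frame&)> similar_;
    int max_skips_;
    Frame ref_frame_;
    Result result_;
    bool has_result_ = false;
    int skips_ = 0;
};
```

In this sketch each new frame is compared against the last frame that was actually run through the model, rather than the immediately preceding frame, so that a sequence of small changes cannot drift arbitrarily far from the frame the cached result was computed for. A caller would construct the cache once with its model callback and similarity predicate, then call `process` on every incoming frame.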

Input similarity could be checked using vector hashing or vector databases. However, for images it is more common to use non-vector methods that detect when two images differ only minimally.
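One simple non-vector check is a direct pixel comparison. The sketch below computes the mean absolute difference between two frames and compares it to a threshold. It is only one possible method and is not taken from the book: the function names `mean_abs_diff` and `minimal_change` are hypothetical, the frames are assumed to be equal-sized 8-bit grayscale buffers, and a suitable threshold would have to be tuned for the application.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Mean absolute per-pixel difference between two 8-bit grayscale frames.
// Returns a negative value if the frames are empty or differ in size.
double mean_abs_diff(const std::vector<std::uint8_t>& a,
                     const std::vector<std::uint8_t>& b) {
    if (a.size() != b.size() || a.empty()) {
        return -1.0;
    }
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int d = static_cast<int>(a[i]) - static_cast<int>(b[i]);
        total += static_cast<std::uint64_t>(std::abs(d));
    }
    return static_cast<double>(total) / static_cast<double>(a.size());
}

// True when the frames are comparable and their mean difference is at
// most the threshold (an assumed, application-tuned parameter).
bool minimal_change(const std::vector<std::uint8_t>& a,
                    const std::vector<std::uint8_t>& b,
                    double threshold) {
    const double d = mean_abs_diff(a, b);
    return d >= 0.0 && d <= threshold;
}
```

A whole-frame average like this is cheap, but it can miss a small region that changes a lot, so a practical check might also look at the maximum difference or compare the frames block by block.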

