Aussie AI
Prompt History and Context
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
The architecture gets more complicated when the use case requires the AI engine to incorporate the user's history of prompts in their current conversation. For example, if it's a chatbot to take your food order, it needs to know what you've already ordered so that it can annoyingly push you to buy more stuff than you need.
The main approach is to reuse your existing AI engine, but simply prepend the prior conversation to the user's new prompt. In other words, the AI engine receives each request in a stateless style, but each request includes all of the necessary prior context.
Implementing this architecture requires that the current session's history of prompts and responses both be stored in a session-specific data store. This might be a temporary store for guest sessions and/or a permanent store for signed-in users. Either way, the main point is that the text of the prompts and engine responses is available for the next incoming request. The new prompt is then appended to the end of the conversation, and the whole conversation can be passed to the AI engine.
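As a minimal sketch of this stateless prepending architecture, the code below keeps an in-memory map from a session ID to its ordered turn history, and rebuilds the full request text on each call. All names here (`Turn`, `buildRequest`, `recordTurn`, the "User:"/"Assistant:" framing) are illustrative assumptions, not from any particular API; a real system would persist the store rather than hold it in a global map.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: one turn of the conversation (prompt + response).
struct Turn {
    std::string prompt;
    std::string response;
};

// Session-specific store: session ID -> ordered turn history.
// A real deployment would persist this (temporary for guest sessions,
// durable for signed-in users); an in-memory map is just for illustration.
std::map<std::string, std::vector<Turn>> g_sessions;

// Build the full text sent to the (stateless) AI engine: every prior
// prompt/response pair, with the new prompt appended at the end.
std::string buildRequest(const std::string& sessionId,
                         const std::string& newPrompt) {
    std::string full;
    for (const Turn& t : g_sessions[sessionId]) {
        full += "User: " + t.prompt + "\n";
        full += "Assistant: " + t.response + "\n";
    }
    full += "User: " + newPrompt + "\n";
    return full;
}

// After the engine replies, record the completed turn for next time.
void recordTurn(const std::string& sessionId,
                const std::string& prompt,
                const std::string& response) {
    g_sessions[sessionId].push_back({prompt, response});
}
```

Note that the whole history is re-sent on every request, which is exactly the token-cost problem discussed next.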
There are some downsides to this simple approach. Firstly, it's not always that effective, and may require some fancier prompt engineering before it works well. Some AI engines are beginning to offer options to explicitly send the conversation history and the new prompt as separate inputs in the same API request, which may improve this situation. Secondly, it's sending a lot of extra tokens to the AI engine, which are expensive to process, whether as extra dollars on the billing statement for a commercial fee-based engine (e.g. OpenAI's API) or as the hidden cost of increased load on your own GPU hosting.
One idea to reduce costs is to store a “summary” of the prior conversation, rather than all of it, so fewer tokens are prepended. Summarization is a whole research area in itself, and there are various approaches. For example, this could be achieved via some types of simple heuristics (e.g. just removing unimportant “stop words”) or via an AI-based summarization algorithm (although that extra expense probably defeats the purpose). The research areas of “prompt compression” and “document summarization” may be relevant here.
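The stop-word heuristic can be sketched in a few lines: split the stored conversation text on whitespace and drop words found in a small exclusion list. The function name and the tiny stop-word list below are illustrative assumptions; a production version would use a proper tokenizer and a much larger list.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical heuristic sketch: shrink stored conversation text by
// dropping common "stop words" before prepending it to the next request.
// The stop-word list is a tiny illustrative sample, not a real one.
std::string compressHistory(const std::string& text) {
    static const std::set<std::string> kStopWords = {
        "a", "an", "the", "of", "to", "and", "is", "that"
    };
    std::istringstream in(text);
    std::string word;
    std::string out;
    while (in >> word) {
        if (kStopWords.count(word)) continue;  // skip unimportant words
        if (!out.empty()) out += ' ';
        out += word;
    }
    return out;
}
```

This is lossy compression: it trades some fidelity of the context for fewer tokens per request, which is the whole point of the optimization.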
More advanced approaches than prepending the prior conversation are possible for handling an incoming request with history. There are various ways to store and handle the “context” of a long user conversation with its prompt and answer history. This area is called “conversational AI” and may require changes to your AI engine architecture.
Finally, this area is changing rapidly and the above may be outdated by the time you read this. It's a fairly obvious extension to a commercial API for the provider to track the context for you, rather than impolitely foisting that requirement onto every programmer. Admittedly, it's also not a cheap capability to add, because the API provider would need to store a huge amount of extra context data, in return for getting paid less because you'd be sending them fewer tokens. Nevertheless, I expect to see commercial APIs having this functionality soon.