Aussie AI

AI Middleware

  • Last Updated 12 December, 2024
  • by David Spuler, Ph.D.

AI middleware is the layer in the tech stack above the LLMs. It can provide services such as prompt extensions, conversational history management, and other relatively low-level functionality. As such, middleware can operate as a wrapper around remote AI API services, or it can run alongside a self-hosted open-source LLM on the same local servers.
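
To make the wrapper idea concrete, here is a minimal sketch of a middleware function that adds two such services, a prompt extension and basic logging, around a single LLM call. The call_model function is a hypothetical placeholder for any vendor API client or local inference endpoint, not a real library call.

    # Minimal middleware wrapper: prompt extension plus logging.
    import logging
    import time

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("ai-middleware")

    def call_model(prompt: str) -> str:
        """Hypothetical backend; replace with a real API or a local LLM."""
        return "(model response)"

    def middleware_query(prompt: str,
                         system_prefix: str = "You are a helpful assistant.") -> str:
        """Wrap a raw LLM call with a global instruction prefix and a timing log."""
        full_prompt = f"{system_prefix}\n\n{prompt}"   # prompt extension
        start = time.time()
        response = call_model(full_prompt)             # the underlying LLM call
        logger.info("LLM call took %.2fs", time.time() - start)
        return response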

Features of LLM Middleware

Some of the features that middleware components typically provide above an individual LLM inference engine include the following (a few are sketched in code after the list):

  • Multi-LLM access (helping to avoid "vendor lock-in")
  • Prompt templating (e.g., adding global instructions)
  • Programmatic prompting (i.e., automatic prompt improvement)
  • Conversational history management (for chatbot and Q&A sessions)
  • Prompt caching (e.g., prefix KV caching)
  • Logging
  • Monitoring and observability
  • Reporting and statistics tracking
  • User identity and security credential management
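
A few of these features fit naturally into one small sketch. The class below combines multi-LLM access (a swappable backend table), prompt templating, and conversational history management; the two vendor functions are hypothetical placeholders rather than real API clients.

    # Sketch of three middleware features: multi-LLM routing,
    # prompt templating, and conversation history management.
    from typing import Callable

    def vendor_a_call(prompt: str) -> str:
        return "(vendor A response)"   # placeholder for a real API client

    def vendor_b_call(prompt: str) -> str:
        return "(vendor B response)"   # placeholder for a real API client

    class MiddlewareSession:
        """Routes requests to a chosen backend and tracks chat history."""

        BACKENDS: dict[str, Callable[[str], str]] = {
            "vendor_a": vendor_a_call,
            "vendor_b": vendor_b_call,
        }

        TEMPLATE = "Instructions: {instructions}\nHistory:\n{history}\nUser: {query}"

        def __init__(self, backend: str, instructions: str):
            self.call = self.BACKENDS[backend]    # multi-LLM access: swap vendors freely
            self.instructions = instructions      # global instructions for the template
            self.history: list[str] = []

        def query(self, user_query: str) -> str:
            prompt = self.TEMPLATE.format(
                instructions=self.instructions,          # prompt templating
                history="\n".join(self.history[-10:]),   # bounded conversation history
                query=user_query,
            )
            answer = self.call(prompt)
            self.history.append(f"User: {user_query}")
            self.history.append(f"Assistant: {answer}")
            return answer

    session = MiddlewareSession("vendor_a", "Answer concisely.")
    print(session.query("What is AI middleware?"))

Swapping the backend name is all it takes to change vendors, which is the sense in which such a layer helps avoid vendor lock-in.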

Generally speaking, most components of the RAG architecture are not considered to fit under the category of "middleware." Components such as vector databases, rerankers, and packers have their own category in the RAG stack.

However, recent advances in Chain-of-Thought and other multi-step, inference-based reasoning algorithms have spawned another use of AI middleware at a much higher level. An AI middleware layer can wrap individual LLM queries into sequences of multiple steps, thereby implementing reasoning algorithms such as:

  • Reflection
  • LLM as Judge
  • Chain-of-Thought
  • Best-of-N
  • Skeleton-of-Thought

And there are many more such multi-step reasoning algorithms.
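
As a concrete illustration, the sketch below implements Best-of-N with an LLM-as-Judge scoring step: the middleware layer issues several sampled generation queries plus a judging query against the same underlying model interface. Both backend functions are hypothetical placeholders for real model invocations.

    # Best-of-N reasoning wrapper with an LLM-as-Judge scoring step.
    def generate(prompt: str, temperature: float) -> str:
        """Placeholder for one sampled LLM completion."""
        return f"(candidate sampled at T={temperature})"

    def judge_score(prompt: str, candidate: str) -> float:
        """Placeholder for an LLM-as-Judge call that rates a candidate answer."""
        return 0.5   # a real judge would return a model-derived score

    def best_of_n(prompt: str, n: int = 5) -> str:
        """Generate N candidate answers and return the judge's favorite."""
        candidates = [generate(prompt, temperature=0.8) for _ in range(n)]
        return max(candidates, key=lambda c: judge_score(prompt, c))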

