Aussie AI

Backend Server Architecture

Book Excerpt from "Generative AI in C++"

by David Spuler, Ph.D.


Although it's wonderful to see your AI engine running on your dev box, there's still a lot to do before your users can see it. Getting it in front of them is called “deployment” of your application, which includes its AI model and whatever application-specific logic you're adding on top. Your deployment architecture typically consists of these main components:

Web server (e.g. Apache, Nginx)
Backend server
Application logic server
AI request server
AI Engine

You can merge several of these types of server components, but it's simpler to do them separately, or at least to think about them conceptually as separate.

The AI engine is not the first part of the backend deployment architecture. There needs to be a simpler request-handling server that receives the user's input from the client. This may involve one or more server processes behind the scenes.

For example, in a simple browser-based Q&A service, the user would input their question or prompt from a web browser. This browser request is then handled by a basic HTTP server such as Apache or Nginx, which then forwards the user's prompt to another application-specific server that processes the request.

The request processing server could be the AI engine directly in a small architecture, but in a more realistic production architecture it would be a simpler server that multiplexes a stream of input requests, farming out the requests to multiple AI servers.
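As a rough illustration of "farming out" requests, here is a minimal sketch of a round-robin dispatcher. The class name `RoundRobinDispatcher`, the method `next_server`, and the server address strings are all hypothetical, and round-robin is only one possible policy (assumed here for simplicity); a production multiplexer would more likely weigh server load and health.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical dispatcher: spreads incoming requests across
// a fixed list of AI servers in round-robin order.
class RoundRobinDispatcher {
public:
    explicit RoundRobinDispatcher(std::vector<std::string> servers)
        : servers_(std::move(servers)) {
        assert(!servers_.empty() && "need at least one AI server");
    }

    // Returns the address of the server that should handle the next request,
    // then advances to the following server (wrapping around at the end).
    const std::string& next_server() {
        const std::string& chosen = servers_[next_];
        next_ = (next_ + 1) % servers_.size();
        return chosen;
    }

private:
    std::vector<std::string> servers_;  // assumed addresses, e.g. "host:port"
    std::size_t next_ = 0;              // index of the next server to use
};
```

Each incoming request would call `next_server()` once and be forwarded to the returned address. This sketch is not thread-safe; a server handling requests on several threads would need to guard the index with a mutex or an atomic.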

The backend server takes the user request and passes it to the application logic server, which performs whatever higher-level services you are providing, decides what AI requests are needed, and then sends them along to the AI request server to handle.
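The hand-off between these stages can be sketched in a few lines of C++. Everything named here is an assumption made for illustration: the `AiRequest` struct, the `AiRequestHandler` callback type standing in for the AI request server, and the example policy of prepending an instruction to the user's text. In a real deployment each stage would usually be a separate process reached over the network rather than a function call.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one request destined for an AI engine.
struct AiRequest {
    std::string prompt;
};

// Stand-in for the AI request server: takes the planned AI requests
// and returns a single reply for the user.
using AiRequestHandler =
    std::function<std::string(const std::vector<AiRequest>&)>;

// Application logic stage: decides what AI requests a user request needs.
// Assumed example policy: one request with an app-specific instruction added.
std::vector<AiRequest> plan_ai_requests(const std::string& user_text) {
    return { AiRequest{"Answer concisely: " + user_text} };
}

// Backend server stage: passes the user request to the application logic,
// then forwards the resulting AI requests to the AI request server.
std::string handle_user_request(const std::string& user_text,
                                const AiRequestHandler& ai_request_server) {
    std::vector<AiRequest> requests = plan_ai_requests(user_text);
    assert(!requests.empty() && "application logic produced no AI requests");
    return ai_request_server(requests);
}
```

Passing the AI request server in as a callback keeps the application logic testable on its own: a stub handler can stand in for the real server during development.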

The AI request server has to multiplex the AI requests across multiple AI engines, and then, for any complex queries or multi-engine requests, collate the results back together. Neither of these components is trivial, but at least they're not as big a C++ project as trying to write a whole AI engine. Various commercial off-the-shelf servers already exist for either of these components, so that's probably your best option. But the application logic server should be your own brilliance expressed in C++ code.
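To make the multiplex-and-collate idea concrete, the following is a minimal sketch that sends several prompts to a pool of engines concurrently and gathers the replies in their original order. The `Engine` callback type and the function `fan_out_and_collate` are hypothetical names, each engine is modeled as a plain function rather than a network call, and the round-robin assignment is an assumption chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a call to one AI engine.
using Engine = std::function<std::string(const std::string&)>;

// Sends each prompt to an engine (assigned round-robin), runs the calls
// concurrently, and returns the replies in the same order as the prompts.
std::vector<std::string> fan_out_and_collate(
        const std::vector<Engine>& engines,
        const std::vector<std::string>& prompts) {
    assert(!engines.empty() && "need at least one AI engine");

    // Multiplex: launch one asynchronous task per prompt.
    std::vector<std::future<std::string>> pending;
    pending.reserve(prompts.size());
    for (std::size_t i = 0; i < prompts.size(); ++i) {
        const Engine& engine = engines[i % engines.size()];
        pending.push_back(std::async(std::launch::async,
            [&engine, &prompts, i] { return engine(prompts[i]); }));
    }

    // Collate: wait for every reply, preserving request order.
    std::vector<std::string> results;
    results.reserve(pending.size());
    for (auto& reply : pending) {
        results.push_back(reply.get());
    }
    return results;
}
```

Because the function waits on every future before returning, the references captured by the tasks stay valid for as long as the tasks run. A real AI request server would also need timeouts, retries, and error handling for engines that fail, all of which this sketch omits.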


