Aussie AI

Load Balancing

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

If you need to serve a high volume of user requests, consider a scalable architecture with load balancing and fault tolerance. Technologies to consider for load balancing include:

  • Round Robin DNS
  • Load Balancer Network Devices
  • Apache Kafka
  • Apache Load Balancer
  • Nginx load balancing

Round robin DNS, or RR DNS, is a simple way to distribute incoming requests across multiple servers, but it isn't true load balancing because it doesn't consider the load or availability of each server. On the upside, it requires no extra server components and can be set up simply by editing your domain's DNS records.
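To make the limitation concrete, here is a minimal C++ sketch of the rotation idea behind round robin. It is an in-process illustration only, not a DNS implementation; the class name `RoundRobinSelector` and the addresses used with it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: cycles through a fixed list of server addresses,
// similar in spirit to how round robin DNS rotates the records it returns.
class RoundRobinSelector {
public:
    explicit RoundRobinSelector(std::vector<std::string> servers)
        : servers_(std::move(servers)) {
        assert(!servers_.empty() && "need at least one server");
    }

    // Returns the next address in rotation. Nothing here checks whether
    // the chosen server is busy or even reachable, which is exactly the
    // weakness of plain round robin.
    const std::string& next() {
        const std::string& chosen = servers_[index_];
        index_ = (index_ + 1) % servers_.size();
        return chosen;
    }

private:
    std::vector<std::string> servers_;
    std::size_t index_ = 0;
};
```

With three addresses, successive calls to `next()` return the first, second, and third address and then wrap around to the first again, regardless of how loaded each server is.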

Kafka is a more scalable production tool with advanced features such as clustering. In a large architecture, its main advantage is that it is a pre-built tool that is purpose-designed for handling a high-volume incoming event stream. It has a highly efficient distributed architecture: you send requests to a Kafka cluster, and multiple listeners (called consumers in Kafka terminology) can be created to process them. For each input prompt, a Kafka listener dequeues the request and then forwards the prompt text to its associated AI engine.
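The dequeue-and-forward pattern can be sketched in plain C++ without any Kafka client library. The code below is an in-process stand-in and does not use the Kafka API; `RequestQueue`, `run_listener`, and the engine callback are all hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// In-process stand-in for a stream of incoming requests. This is NOT the
// Kafka API; it only mimics "many listeners dequeue from one stream".
class RequestQueue {
public:
    // Producer side: enqueue one prompt and wake one waiting listener.
    void push(std::string prompt) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            queue_.push(std::move(prompt));
        }
        ready_.notify_one();
    }

    // Signal that no more requests will arrive, so listeners can exit.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Listener side: block until a prompt is available, or until the queue
    // is closed and drained. Returns false only in the latter case.
    bool pop(std::string& prompt) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        prompt = std::move(queue_.front());
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> queue_;
    bool closed_ = false;
};

// Hypothetical listener loop: dequeue each prompt and forward it to an
// engine callback. Returns how many prompts this listener handled.
template <typename Engine>
std::size_t run_listener(RequestQueue& requests, Engine&& engine) {
    std::size_t handled = 0;
    std::string prompt;
    while (requests.pop(prompt)) {
        engine(prompt);
        ++handled;
    }
    return handled;
}
```

To scale out, each listener would run `run_listener` on its own `std::thread` with its own engine instance. Because every `pop` takes the queue's lock, each prompt is delivered to exactly one listener.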

Apache Load Balancer is a free load-balancing add-on for the Apache web server. For more information, see the mod_proxy and mod_proxy_balancer Apache modules. Nginx also supports several load balancing methods, such as round robin and least connections. Refer to the Nginx documentation for details.
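As a rough illustration of how a least-connections policy differs from round robin, here is a small C++ sketch. It is not how Apache or Nginx implement the method internally; `Backend`, `LeastConnectionsBalancer`, and the addresses are hypothetical, and the class is not thread-safe as written.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One server behind the balancer, with its current in-flight request count.
struct Backend {
    std::string address;
    std::size_t active;
};

// Hypothetical single-threaded sketch of a least-connections policy.
class LeastConnectionsBalancer {
public:
    explicit LeastConnectionsBalancer(const std::vector<std::string>& addresses) {
        assert(!addresses.empty() && "need at least one backend");
        for (const auto& address : addresses) {
            backends_.push_back(Backend{address, 0});
        }
    }

    // Pick the backend with the fewest in-flight requests (ties go to the
    // earliest in the list) and count the new request against it.
    std::size_t acquire() {
        std::size_t best = 0;
        for (std::size_t i = 1; i < backends_.size(); ++i) {
            if (backends_[i].active < backends_[best].active) best = i;
        }
        ++backends_[best].active;
        return best;
    }

    // Call when a request finishes so the backend's load drops again.
    void release(std::size_t index) {
        assert(index < backends_.size() && backends_[index].active > 0);
        --backends_[index].active;
    }

    const std::string& address(std::size_t index) const {
        return backends_.at(index).address;
    }

private:
    std::vector<Backend> backends_;
};
```

Unlike round robin, the choice here depends on current load: a backend that is slow to finish its requests accumulates a higher `active` count and is passed over until it catches up. This matters for AI inference, where response times vary widely with prompt and output length.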

 

