Hosting Server Specs

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Whether you're hosting on your own servers or using a major cloud service provider, you need to consider the specs for an AI backend server. Firstly, note that not all servers are “AI servers”: most of the basic servers don't need GPUs at all (e.g. Apache web servers and other ancillary servers). The main specs for a non-AI server are the usual suspects for an Internet server:

  • CPU
  • RAM
  • Disk speed
  • Network connectivity

Personally, for the basic servers, I recommend going reasonably low-end or mid-tier in terms of specs, but running a lot of them. In particular, you don't need a lot of disk space for many of these basic servers, so focus on getting enough RAM and a CPU with enough cores. If you're running multiple servers, you also don't need high network bandwidth on a per-server basis, because the traffic is spread across the whole fleet. But you do need to consider your method of dispersing traffic across multiple HTTPD servers, such as round-robin DNS or a load balancer, as sketched below.
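
As a rough illustration of the round-robin idea, here is a minimal C++ sketch that rotates requests across a hypothetical pool of identical web servers (the hostnames are placeholders). In practice you would usually get this behavior from round-robin DNS records or a dedicated load balancer rather than writing it yourself, but the underlying logic is this simple.

    // Minimal sketch of application-level round-robin load balancing.
    // The server hostnames below are hypothetical placeholders.
    #include <atomic>
    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <vector>

    class RoundRobinBalancer {
    public:
        explicit RoundRobinBalancer(std::vector<std::string> backends)
            : backends_(std::move(backends)) {}

        // Pick the next backend in rotation (thread-safe via an atomic counter).
        const std::string& next() {
            std::size_t i = counter_.fetch_add(1, std::memory_order_relaxed);
            return backends_[i % backends_.size()];
        }

    private:
        std::vector<std::string> backends_;
        std::atomic<std::size_t> counter_{0};
    };

    int main() {
        // Hypothetical pool of identical HTTPD front-end servers.
        RoundRobinBalancer lb({"web1.example.com", "web2.example.com", "web3.example.com"});
        for (int i = 0; i < 6; ++i) {
            std::cout << "Dispatch request " << i << " to " << lb.next() << "\n";
        }
        return 0;
    }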

Also, try to set up your architecture so that you don't need the gold-plated extras from your hosting provider. You don't need fault tolerance and failover for an architecture with multiple web server boxes. You also don't need backups of these cloned production servers; backups only matter for the servers holding important logs, user management databases, or user document datastores. The idea is to run multiple identical servers, and simply shut down any that start being problematic, which rarely happens anyway. Instead, you need a streamlined process for deploying a new server, whether that means renting a new bare metal server or auto-spinning up a new VM. Hence, for many utility servers, the DevOps software processes are almost more important than the exact choice of server specs.

What you do need is a monitoring system to detect any problems. And you also need a fast network between all the servers so you can copy over an entire server deployment quickly. Note that you can often have faster “private” network connections than the public ones if the servers support multiple network cards.
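
As a sketch of the simplest possible monitoring check, the following C++ program attempts a TCP connection to each server in a hypothetical host list and reports which ones accept it. A real monitoring system would also track HTTP status codes, latency, disk usage, and so on, but a basic “can I even connect?” probe like this catches the worst failures.

    // Minimal TCP "is the port open?" health check using POSIX sockets.
    // This only verifies that the server accepts connections; a real
    // monitoring setup would check far more than this.
    #include <iostream>
    #include <string>

    #include <netdb.h>
    #include <sys/socket.h>
    #include <unistd.h>

    // Returns true if a TCP connection to host:port succeeds.
    bool tcp_health_check(const std::string& host, const std::string& port) {
        struct addrinfo hints {};
        hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
        hints.ai_socktype = SOCK_STREAM;  // TCP

        struct addrinfo* result = nullptr;
        if (getaddrinfo(host.c_str(), port.c_str(), &hints, &result) != 0) {
            return false;  // Name resolution failed
        }

        bool ok = false;
        for (struct addrinfo* rp = result; rp != nullptr; rp = rp->ai_next) {
            int fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
            if (fd == -1) continue;
            if (connect(fd, rp->ai_addr, rp->ai_addrlen) == 0) {
                ok = true;  // Server accepted the connection
            }
            close(fd);
            if (ok) break;
        }
        freeaddrinfo(result);
        return ok;
    }

    int main() {
        // Hypothetical server names; substitute your own host list.
        const char* hosts[] = {"web1.example.com", "web2.example.com"};
        for (const char* host : hosts) {
            std::cout << host << ": "
                      << (tcp_health_check(host, "443") ? "up" : "DOWN") << "\n";
        }
        return 0;
    }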

Disk specs. Although your GPU choice and RAM size are more important, you should also consider your disk speed. This applies both to your storage options when setting up a virtual machine and to the choice of disk hardware for a bare metal server.

You need a large amount of disk storage for model files, which are larger than your average bear. This might tempt you to go for the cheaper and larger HDD disks. On the other hand, SSDs are much faster to read from, and not that expensive any more. If you want fast startup of your engine with its full model, I think SSD storage, such as NVMe disks, is the way to go. An SSD is also a kind of fallback in case you mess up the server process configurations and the machine starts paging, which is much faster on SSD than on HDD.
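
One way to get engine startup close to the raw speed of the disk is to memory-map the model file rather than reading it all into a buffer, so that weight pages are faulted in from the SSD on demand. Here is a minimal sketch using POSIX mmap; the model file name is a hypothetical placeholder, and a real engine would hand the mapped pointer to its weight-loading code.

    // Sketch of loading a large model file quickly via memory mapping (POSIX mmap).
    // Pages are faulted in from disk on demand, so a fast NVMe SSD directly
    // shortens engine startup time. The file path is a hypothetical example.
    #include <cstdio>
    #include <cstdlib>

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main() {
        const char* path = "model.bin";  // hypothetical model weights file

        int fd = open(path, O_RDONLY);
        if (fd == -1) {
            perror("open");
            return EXIT_FAILURE;
        }

        struct stat st {};
        if (fstat(fd, &st) == -1) {
            perror("fstat");
            close(fd);
            return EXIT_FAILURE;
        }

        // Map the whole file read-only; no upfront copy into RAM is needed.
        void* data = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) {
            perror("mmap");
            close(fd);
            return EXIT_FAILURE;
        }
        close(fd);  // the mapping stays valid after closing the descriptor

        printf("Mapped %lld bytes of model weights at %p\n",
               (long long)st.st_size, data);

        // ... hand the mapped pointer to the inference engine's weight loader here ...

        munmap(data, st.st_size);
        return EXIT_SUCCESS;
    }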

 

