GPU Hardware Acceleration

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

For the sticklers, AI GPU chips are not really “GPUs” because the acronym stands for “Graphics Processing Unit,” and they aren't used for graphics in an AI architecture (even when creating an image). In fact, they're really General-Purpose GPUs (GPGPUs), but nothing other than AI matters in the tech industry, so we stole the acronym from the gamers.

GPUs are great big SIMD processors. There is a huge range of vectorized opcodes available for any given GPU. Each GPU isn't just one big vector unit, but has many separate “cores” that process AI workloads (e.g. FMA operations) in parallel. Each core runs a SIMD operation, such as a small matrix multiply or FMA, in a single GPU clock cycle. For example, a V100 “Tensor Core” can do a 4x4x4 half-precision (16-bit) matrix/tensor multiply in a cycle, which is far more advanced than a typical vectorized operation. Hence, it's a parallel-of-parallel architecture with:

    (a) all the GPU cores running in parallel, and

    (b) each core doing vectorized SIMD operations (see the code sketch below).
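
To make (b) concrete, here's a minimal CUDA sketch using the Tensor Core “WMMA” intrinsics (kernel and variable names are illustrative, not from any particular library). Note that CUDA exposes Tensor Cores at the granularity of 16x16x16 half-precision tiles, one warp per tile, with the 4x4x4 multiplies happening inside the hardware; this assumes a Volta-or-later GPU (compute capability 7.0+):

    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    // One warp computes a 16x16 tile of C = A*B on the Tensor Cores.
    __global__ void tile_matmul(const half* a, const half* b, float* c) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;
        wmma::fill_fragment(c_frag, 0.0f);               // zero the accumulator
        wmma::load_matrix_sync(a_frag, a, 16);           // load A tile (leading dimension 16)
        wmma::load_matrix_sync(b_frag, b, 16);           // load B tile
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C += A*B, one warp-level op
        wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
    }

    // Launch with a single warp (32 threads): tile_matmul<<<1, 32>>>(dA, dB, dC);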

The chips also have their own GPU RAM (sometimes called “VRAM”), with multiple levels of cache in front of it. If you're assessing the specs of a GPU, consider the following (a sketch for querying most of these from code follows the list):
  • FLOPS throughput (floating-point operations per second)
  • Number of cores
  • RAM capacity
  • Clock speed
  • Memory bandwidth
  • Cooling systems (they run hot!)
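
Most of these specs can be queried from code via the CUDA runtime. Here's a minimal sketch using cudaGetDeviceProperties; the bandwidth number is a rough derived estimate (the 2x factor assumes double-data-rate memory), and peak FLOPS and cooling aren't reported directly:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {  // device 0
            fprintf(stderr, "No CUDA device found\n");
            return 1;
        }
        printf("GPU: %s\n", prop.name);
        printf("Multiprocessors (SMs): %d\n", prop.multiProcessorCount);
        printf("VRAM: %.1f GB\n", prop.totalGlobalMem / 1e9);
        printf("Clock: %.2f GHz\n", prop.clockRate / 1e6);  // clockRate is in kHz
        // Rough bandwidth estimate; 2x assumes double-data-rate memory.
        double gbps = 2.0 * prop.memoryClockRate * 1e3 * (prop.memoryBusWidth / 8) / 1e9;
        printf("Memory bandwidth (approx): %.0f GB/s\n", gbps);
        // Peak FLOPS isn't reported directly; it's derived from
        // cores x clock x operations-per-cycle at a given precision.
        return 0;
    }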

GPU Pricing. If you're looking at renting a data center GPU, NVIDIA is top of the list for AI computations. The choice between a P100, V100, A100, or H100 is examined further in the AI deployment chapter. A V100 is workable for running a version of Meta's Llama2, but you won't fit many model instances per box. As of writing, pricing for a V100 runs below a buck an hour, and there are 730 hours in a month, so you can do the math (pricing varies by vendor anyway). You can get an A100 for a bit more than a buck an hour, and an H100 for roughly double that (for now). On the horizon, NVIDIA has an H200 coming mid-2024 with about 141GB of RAM (versus the H100's 80GB), and also the B100 in late 2024 with even higher performance than the H200.
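For example, at a hypothetical rate of $0.90 an hour, that works out to roughly $0.90 × 730 ≈ $657 a month for a single V100.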

You can also buy a GPU chip outright from your private jet using your diamond-encrusted phone. Okay, so that's a bit of an exaggeration. Pricing changes, but as of writing, you're looking at around ten grand for a V100 by itself; pricing is higher if it's part of a “system” on a motherboard or in a box (and this confuses ChatGPT if you ask it about GPU pricing).

Another option is used GPUs, which are cheaper, but might have spent their prior life in a Bitcoin-mining forced labor camp. GPUs do have a limited lifetime, and overheating can cause partial or total failure.

 
