Aussie AI

GPU Portability

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

This will be a short section: none.

Coding portably is a great idea right up until you hit a GPU and then portability is out the window. Writing your code to be similar for both NVIDIA and AMD GPUs is a fantasy. I don't think the developers of CUDA and ROCm are on the phone to each other very often from their private jets.

The same goes for any type of CPU hardware acceleration method, such as x86 AVX intrinsics or Arm Neon. If you're writing C++ to do vectorized kernels on a CPU or GPU, then you're basically writing a different version for each hardware acceleration method. Admittedly, there have been some attempts to use wrappers to convert AVX intrinsics to Arm Neon, but they're not 100% effective.

Generally, the way that you “tolerate” a new hardware platform is to write a portable sequential C++ version of the code, and that's the fallback. The “exploitation” is to write some very low-level code for whatever hardware acceleration method.
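
To make the tolerate-versus-exploit split concrete, here is a minimal sketch of a vector dot product with a portable sequential fallback and two hardware-specific versions chosen at compile time. The function names `dot_product` and `dot_product_portable` are hypothetical, made up for illustration, and the sketch assumes a compiler that defines the `__AVX__` or `__ARM_NEON` macros when those instruction sets are enabled (as GCC and Clang do).

```cpp
#include <cassert>
#include <cstddef>

#if defined(__AVX__)
#include <immintrin.h>   // x86 AVX intrinsics
#elif defined(__ARM_NEON)
#include <arm_neon.h>    // Arm Neon intrinsics
#endif

// "Tolerate": portable sequential C++ that compiles on any platform.
float dot_product_portable(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// "Exploit": a separate low-level version per acceleration method,
// with the portable version handling leftovers and unknown platforms.
float dot_product(const float* a, const float* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);
#if defined(__AVX__)
    __m256 acc = _mm256_setzero_ps();          // 8 float lanes
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);    // unaligned loads
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; ++k) {
        sum += lanes[k];                       // horizontal add of the lanes
    }
    return sum + dot_product_portable(a + i, b + i, n - i);
#elif defined(__ARM_NEON)
    float32x4_t acc = vdupq_n_f32(0.0f);       // 4 float lanes
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // acc += a[i..i+3] * b[i..i+3]
        acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    }
    float lanes[4];
    vst1q_f32(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    return sum + dot_product_portable(a + i, b + i, n - i);
#else
    return dot_product_portable(a, b, n);      // no acceleration available
#endif
}
```

Note that the AVX and Neon branches share almost nothing beyond the loop shape, which is the point. On GCC or Clang, the AVX branch is only compiled in when you pass a flag such as `-mavx`; without it, an x86 build quietly takes the fallback path. The vectorized versions also add the products in a different order than the sequential loop, so floating-point results can differ slightly on general inputs.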

If writing the same AI engine code to run on all platforms is really on your bucket list, then you have to step back up a level. There are several parallel programming standards that let you express the computation at a higher level, and then rely on a compiler or runtime to generate code for the specific GPU platform. Here's my list:

  • OpenCL
  • OpenMP
  • SYCL
  • OpenACC

 
