Aussie AI
38. Platform Portability
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
“The world is indeed full of peril
and in it
there are many dark places.”
— J.R.R. Tolkien, The Lord of the Rings, 1954.
AI Engine Portability
Ah, yes, I remember portability. Early portability was whether it was a ZX81 or an 8086. Then it was whether it was SunOS, Solaris, AIX, Ultrix, or Irix (I missed a few). And then it was Windows 95 versus Windows NT. And then it was detecting Windows versus Linux. And then it was iOS or Android.
Which brings us up to date. And now portability for AI in C++ is detecting things like:
- CPU features
- OS configuration settings
- Software package versions
- Virtual machine settings
- GPU capabilities
Does AI need portability? Portability is an issue that can be ignored in some AI applications. If you have control over your hardware and software tech stack, you only need one platform to work, and you can optimize for exactly that platform. This wouldn't be true if you're trying to write an engine to run on a user's phone or PC, but it is often the case for business applications running inference in the data center. Whether self-hosted or cloud-hosted on virtual machines, you can control the underlying platform. So, feel free to skip this entire discussion in such situations!
On the other hand, you do need portability if your users have different platforms. And even if you have your own data center, you might want to change the underlying GPU hardware at some stage. There are also various generic benefits from keeping most of the C++ code standardized and portable, such as being able to unit test most code on developers' boxes (i.e., without a top-end GPU). Another simple reason is that a large AI application isn't just about matrix multiplication; there's a huge amount of ancillary code that doesn't go near the GPU. Good code design generally dictates that the non-portable parts should at least be wrapped and isolated.
Portability in C++ programming of AI applications involves correctly running on the underlying tech stack, including the operating system, CPU, and GPU capabilities. Conceptually, there are two levels:
1. Toleration. The first level of portability is “toleration” where the program must at least work correctly on whatever platform it finds itself.
2. Exploitation. The second level is “exploiting” the specific features of a particular tech stack, such as making the most of whatever GPU hardware is available.
This is generally true for any application, but especially true for AI engines. To get it running fast, you'll need a whole boatload of exploitation deep in your C++ kernels.
Basics of Portable Coding
The basic approach to writing portable code is:
1. Write generic code portably, and
2. Write platform-specific code where needed.
Write portable code: Most of your AI C++ application should be written in portable C++. The majority of the C++ programming language is well-standardized, and a lot of code can be written that simply compiles on every platform with the same functionality. You just have to avoid the portability pitfalls.
Platform-specific coding: Most C++ programmers are familiar with using #if or #ifdef preprocessor directives to handle different platforms, and the various flavors of this are discussed further below. The newer C++ equivalent is “if constexpr” statements for compile-time processing. Small or sometimes large sections of C++ code will need to be written differently on each platform.
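For illustration, here's a minimal sketch of both styles; the platform macros (_WIN32, __linux__, __APPLE__) are the usual compiler-defined ones, but the function and variable names are hypothetical:

    #include <cstdio>

    // Classic preprocessor style: pick the platform name at compile-time.
    #if defined(_WIN32)
    const char* g_platform_name = "Windows";
    #elif defined(__linux__)
    const char* g_platform_name = "Linux";
    #elif defined(__APPLE__)
    const char* g_platform_name = "macOS/iOS";
    #else
    const char* g_platform_name = "Unknown";
    #endif

    // Newer style: "if constexpr" discards the untaken branch at compile-time (C++17 onwards).
    void report_platform()
    {
        if constexpr (sizeof(void*) == 8) {
            printf("Config: %s (64-bit build)\n", g_platform_name);
        } else {
            printf("Config: %s (32-bit build)\n", g_platform_name);
        }
    }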
Likely major areas that will be non-portable include:
- Hardware acceleration (GPU interfaces)
- Intrinsic functions (CPU acceleration)
- FP16/BF16 floating-point types
- User interfaces (Windows vs Mac vs X Windows)
- Android vs iOS (not just the GUI)
- Multi-threading (Linux vs Windows threads)
- Text file differences (You've heard of \r, right?)
- File system issues (Directory hierarchies, permissions, etc.)
- “Endian” issues in integer representations.
Consider your code choices carefully. Some other areas where you can create portability pain for yourself include:
- Third-party libraries (i.e. if not widely used like STL or Boost).
- Newer C++ standard language features (e.g. C++23 features won't be widely supported yet).
Backend vs GUI Portability. Most of the discussion in this chapter focuses on the portability of C++ coding on the backend, where the AI engine is running. But the user doesn't give a hoot about that stuff, and only cares about their user interface. Which brings us back to iOS versus Android, or Windows versus Mac.
Yeah, I know, you're a professional C++ programmer sitting there with two screens as big as a mammoth's ears. But your users are on these tiny little things that fit in their purse.
Most of the user interface issues are the same for AI applications as they are for non-AI applications. The methods to detect the type of the end user's device are the same in AI programs as they are for all types of programs, so we won't be delving into them here.
GPU Portability
This will be a short section: none.
Coding portably is a great idea right up until you hit a GPU and then portability is out the window. Writing your code to be similar for both NVIDIA and AMD GPUs is a fantasy. I don't think the developers of CUDA and ROCm are on the phone to each other very often from their private jets.
The same is true of CPU hardware acceleration methods, such as x86 AVX intrinsics or Arm Neon. If you're writing C++ to do vectorized kernels on a CPU or GPU, then you're basically writing a different version for each hardware acceleration method. Admittedly, there have been some attempts to use wrappers to convert AVX intrinsics to Arm Neon, but they're not 100% effective.
Generally, the way that you “tolerate” a new hardware platform is to write a portable sequential C++ version of the code, and that's the fallback. The “exploitation” is to write some very low-level code for whatever hardware acceleration method.
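As a concrete sketch of that two-level approach, here's a dot product kernel with a portable sequential fallback plus an AVX path, assuming x86 AVX as the "exploited" platform (the __AVX__ macro and the intrinsics are standard; the function name is just an example):

    #include <cstddef>
    #if defined(__AVX__)
    #include <immintrin.h>
    #endif

    float vector_dot(const float* a, const float* b, std::size_t n)
    {
        float sum = 0.0f;
    #if defined(__AVX__)
        // Exploitation: process 8 floats per iteration with AVX intrinsics.
        __m256 acc = _mm256_setzero_ps();
        std::size_t i = 0;
        for (; i + 8 <= n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
        }
        float tmp[8];
        _mm256_storeu_ps(tmp, acc);
        for (int k = 0; k < 8; k++) sum += tmp[k];
        for (; i < n; i++) sum += a[i] * b[i];  // leftover elements
    #else
        // Toleration: portable sequential C++ fallback.
        for (std::size_t i = 0; i < n; i++) sum += a[i] * b[i];
    #endif
        return sum;
    }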
If writing the same AI engine code to run on all platforms is really on your bucket list, then you have to step back up a level. There are several AI platforms that standardize execution of model code at a higher meta-level, and then generate code for the specific GPU platform. Here's my list:
- OpenCL
- OpenMP
- SYCL
- OpenACC
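To give a flavor of that higher-level approach, here's a minimal sketch using OpenMP (one of the options listed above): the pragma expresses the parallelism portably, and the compiler flag (-fopenmp for GCC/Clang, /openmp for MSVS) selects the backend; GPU offload uses the separate "target" directives.

    // Portable parallel loop: the same source compiles with or without OpenMP enabled.
    void add_vectors(const float* a, const float* b, float* c, int n)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            c[i] = a[i] + b[i];
        }
    }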
Putting Portability into Supportability
The basic best practice is to write portable code until you can't. Here are some suggestions to further finesse your portability coding practices for improved supportability:
1. Self-test portability issues at startup.
2. Print out platform settings into logs.
A good idea is to self-test that certain portability settings meet the minimum requirements of your application. It's necessary to check for the exact feature you want, not just for a particular CPU or GPU architecture. And you probably should do these feature self-tests even in the production versions that users run, not just in the debugging versions. It's only a handful of lines of code that can save you a lot of headaches later.
Also, you should detect and print out the current portability settings as part of the program's output (or report), or at least to the logs. Ideally, you would actually summarize these settings in the user's output display, which helps the poor phone jockeys trying to answer callers offering very useful problem summaries: “My AI doesn't work.”
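Here's a minimal sketch of that kind of configuration logging, using only standard predefined macros (the function name and log format are hypothetical):

    #include <cstdio>

    void log_platform_config(FILE* logf)
    {
        fprintf(logf, "Config: __cplusplus = %ld\n", (long)__cplusplus);
    #if defined(_MSC_VER)
        fprintf(logf, "Config: MSVC compiler version %d\n", (int)_MSC_VER);
    #elif defined(__GNUC__)
        fprintf(logf, "Config: GCC-compatible compiler %d.%d\n", __GNUC__, __GNUC_MINOR__);
    #endif
    #if defined(_WIN32)
        fprintf(logf, "Config: Windows build\n");
    #elif defined(__linux__)
        fprintf(logf, "Config: Linux build\n");
    #elif defined(__APPLE__)
        fprintf(logf, "Config: Apple build\n");
    #endif
        fprintf(logf, "Config: sizeof(void*) = %d bytes\n", (int)sizeof(void*));
    }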
If it's not a PEBKAC, then having the ability to get these platform settings to put into the incident log is very helpful in resolving production-level support issues. This is especially true if you have users running your software on different user interfaces, and, honestly, if you don't support multiple user interfaces, then what are you doing here?
You should also output backend portability settings for API or other backend software products. The idea works the same even if your “users” are programmers who are running your code on different hardware platforms or virtual machines, except that their issue summaries will be like: “My kernel fission optimizations of batch normalization core dump from a SIGILL whenever I pass it a Mersenne prime.”
Testing C++ Code Portability
How can you assess whether your C++ code is portable? The short answer is: test it!
But you can't test portability on your own box. Instead, you should try to compile and run your code on all of your target platforms, as often as possible. The main points of this plan are:
- Compile on all platforms.
- Resolve compiler errors (e.g. add an extra type cast, or wrap non-portability with #if and macros).
- Check the compiler warnings, not just the errors. Aim for “warning-free compilation” on all platforms.
- Run unit tests and regression test harnesses.
- Run the memory debug tools for that platform.
- Run any static code analysis tools available on that platform.
Virtually Portable. If you don't have access to a big lab full of boxes with random operating systems, then do it virtually. Spin up a new VM on your cloud provider, install the C++ compiler and build tools, upload your C++ source code, compile it, run the tests, shut it down again. Oops, start again, this time save the output results before shutting it down. Using a VM is a powerful way to try lots of platforms and it's not very expensive to do, unless you forget about an instance and accidentally leave it idling for a month.
Source Code Portability Assessment: If you want to try to assess code portability without actually running it on those boxes, try these suggestions:
- Review compiler warnings, which often warn of usage of undefined things, such as pointer casting.
- Run static analysis code checkers.
- Turn on the “strict” or “compliance” modes of your C++ compiler (if you enjoy pain).
- Add more unit tests: ensure that the unit tests run through paths that will be found on other platforms.
And one final suggestion on prioritizing testing of portability: test more on your current platform, instead. There'll be more bugs in your code on every platform. I'd bet V-bucks on it.
Compilation Problems
C++ has been standardized for decades, or it seems like that. So, I feel like it should be easier to get C++ code to compile. And yet, I find myself sometimes spending an hour or two getting past a few darn compiler errors. Most compilers have a treat-warnings-as-errors mode. Come on, I want the reverse.
Some of the main issues that will have a C++ program compile on one C++ compiler (e.g. MSVS) but not on another (e.g. GCC) include:
- const correctness
- Permissive versus non-permissive modes
- Pointer type casting
const correctness refers to the careful use of “const” to mark not just named constants, but also all unchanging read-only data. If it's “const” then it cannot be changed; if it's non-const, then it's writable. People have different levels of feelings about whether this is a good idea. There are the fastidious Vogon-relative rule-followers who want it, and the normal reasonable pragmatic people who don't. Can you see which side I'm on? Anyway, to get non-const-correct code (i.e. mine) to compile on GCC or MSVS, you need to turn off the fussy modes. On MSVS, there's a “permissive” flag in “Conformance Mode” in Project Settings that you have to turn off.
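For example, here's the classic case that conformance mode flags (a minimal illustration; the variable names are hypothetical):

    char* model_file = "model.bin";         // string literal to non-const char*:
                                            // rejected by /permissive- and modern GCC
    const char* model_path = "model.bin";   // the const-correct version compiles everywhere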
Pointer type casting is another issue. C++ for AI has a lot of problems with pointer types, mainly because the C++ standardizers back in the 1990s neglected to create a “short float” 16-bit floating-point type. Theoretically, you're not supposed to cast between different pointer types, like “int*” and “char*”. And theoretically, you're supposed to use “void*” for generic addresses, rather than “char*” or “unsigned char*”. But, you know, this is AI, so them rules is made to be broken, and the C++ standardizer committees finally admitted as much when they created the various special types of casts about 20 years later (i.e., reinterpret_cast).
Anyway, the strategies for getting a non-compiling pointer cast to work include:
- Just casting it to whatever you want.
- Turning on permissive mode.
- Casting it to void* and back again (i.e. “x=*(int*)(void*)(char*)&c”).
- Using “reinterpret_cast” like a Goody Two-Shoes.
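As an illustration of the last option, here's a sketch of a pointer cast that comes up a lot in quantization and bit-twiddling code: viewing the bits of a 32-bit float as an unsigned integer. The reinterpret_cast version works in practice, but strictly speaking memcpy is the only fully-portable way to type-pun, so both are shown (the function names are just examples):

    #include <cstdint>
    #include <cstring>

    std::uint32_t float_bits_cast(const float& f)
    {
        return *reinterpret_cast<const std::uint32_t*>(&f);  // works in practice; formally undefined
    }

    std::uint32_t float_bits_memcpy(float f)
    {
        std::uint32_t u = 0;
        std::memcpy(&u, &f, sizeof(u));  // well-defined on every platform
        return u;
    }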
Runtime Portability Glitches
A bug that occurs on every platform is just that: a bug. A portability glitch is one with different behavior on different platforms. Some examples of the types that can occur:
- The code doesn't compile on a platform.
- The code has different results on different platforms.
- Sluggish processing on one platform.
- Crashes, hangs, or spins on one platform.
Some other types of weird problems that might indicate a portability glitch:
- Code runs fine in normal mode, but fails when the optimizer is enabled, or if the optimization level is increased.
- Code crashes in production, but runs just fine in the debugger (i.e. cannot reproduce it).
- Code intermittently fails (e.g., it could be a race condition or other timing issue.)
A lot of these types of symptoms are screaming “memory error!” And indeed, that's got to be top of the list. You might want to run your memory debugging tools again (e.g. Valgrind), even on a different platform to the one that's crashing.
However, it's not always memory or pointers. There are various other insidious bugs that can cause weird behavior in the 0.001% of cases where it's not a memory glitch:
- Uninitialized variables or object members.
- Numeric overflow or underflow (of integers or float types).
- Data size problems (e.g. 16-bit, 32-bit, or 64-bit).
- Undefined language features. Your code might be relying on something that isn't actually guaranteed in C++.
Code Portability Pitfalls
Most of the low-level arithmetic code for AI algorithms looks quite standardized. Well, not so much. The general areas where C++ code that looks standard is actually non-portable include trappy issues such as:
- Data type byte sizes (e.g. how many bytes is an “int”).
- Arithmetic overflow of integers or float operators.
- Integer operators and negatives (e.g. % and >> operators).
- Floating-point oddities (e.g. negative zero, Inf, and NaN).
- Divide-by-zero doesn't always crash.
- Pointer versus integer sizes (e.g. do void pointers fit inside an int?).
- Endian-ness of integer byte storage (i.e. do you prefer “big endian” or “little endian”?).
- Zero bytes versus zero integers.
- Order of evaluation of expression operands (e.g. with side-effects).
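Endian-ness is easy to self-test at startup; here's a minimal sketch (the helper function name is hypothetical):

    #include <cstdint>
    #include <cstring>

    bool is_little_endian()
    {
        std::uint32_t x = 1u;
        unsigned char bytes[sizeof(x)];
        std::memcpy(bytes, &x, sizeof(x));
        return bytes[0] == 1;   // low byte stored first means little-endian
    }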
And there are various other portability issues arising at a higher-level than the AI arithmetic data processing, such as the inputs and outputs of the program. Problematic areas include:
- Text files (e.g. '\n' on Linux versus '\r\n' on Windows).
- UTF8 versus Latin1 encodings (e.g. for tokenization).
- Unicode special characters (e.g., Asian languages or unicorn emojis).
- EBCDIC versus ASCII (character-level problems in tokens).
- Operating system accesses (e.g. processes and file permissions).
- Signal handling (low-level).
Data Type Sizes
The typical AI engine works with 32-bit floating-point numbers (the float type). Note that for 32-bit integers you cannot assume that int is 32 bits, but must define a specific type (e.g., int32_t from <cstdint>). Furthermore, if you assume that short is 16-bit, int is 32-bit, and long is 64-bit, well, you'd be incorrect on some platforms (64-bit Windows keeps long at 32 bits, for example). The C++ standard only requires relative sizes, such as that long is at least as big as int.
Your startup portability check should verify that the sizes are what you want:
    // Test basic numeric sizes
    yassert(sizeof(int) == 4);
    yassert(sizeof(float) == 4);
    yassert(sizeof(short) == 2);
And you should print them out in a report, or to a log file.
Here's a useful way with a macro that uses the “#” stringize preprocessor operator and also the standard adjacent string concatenation feature of C++.
    #define PRINT_TYPE_SIZE(type) \
        printf("Config: sizeof " #type " = %d bytes (%d bits)\n", \
            (int)sizeof(type), 8*(int)sizeof(type));
You can print out whatever types you need:
    PRINT_TYPE_SIZE(int);
    PRINT_TYPE_SIZE(float);
    PRINT_TYPE_SIZE(short);
Here's the output on my Windows laptop with MSVS:
    Config: sizeof int = 4 bytes (32 bits)
    Config: sizeof float = 4 bytes (32 bits)
    Config: sizeof short = 2 bytes (16 bits)
16-Bit Integer Data: For quantization to 16 bits, you might use a 16-bit integer (“short”). However, you should check it with a static_assert in your C++ code.
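A minimal sketch of that check (assuming you want exactly 16 bits):

    static_assert(sizeof(short) == 2, "Expected a 16-bit short for quantized data");
    // Alternatively, use std::int16_t from <cstdint>, which is exactly 16 bits wherever it exists.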
16-Bit Floating-Point: For 16-bit floats (FP16 or BF16), there are still issues. The main C++ compilers at the time of writing (Oct 2023) do not have any great built-in support for 16-bit floating-point types. There's no “short float” type, for example, in GCC or Microsoft Visual Studio C++. Maybe you can find a platform-specific way to do 16-bit float types in C++, since there's no standard way at the time of writing. There are some new standard type names written into the C++23 standard, but not many compilers are there yet.
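If you do have a C++23-capable compiler, here's a hedged sketch of using those new type names with a fallback; the feature-test macros are only defined when the implementation actually supports the types:

    #if defined(__STDCPP_FLOAT16_T__)
        #include <stdfloat>
        using fp16 = std::float16_t;    // IEEE half-precision (FP16)
    #elif defined(__STDCPP_BFLOAT16_T__)
        #include <stdfloat>
        using fp16 = std::bfloat16_t;   // bfloat16 (BF16)
    #else
        using fp16 = float;             // fallback: plain 32-bit float
    #endif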
Standard Library Types: Other data types to consider are the builtin ones in the standards. I'm looking at you, size_t and time_t, and a few others that belong on Santa's naughty list. People often assume that size_t is the same as “unsigned int” but it's actually usually “unsigned long”.
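One easy option is to reuse the PRINT_TYPE_SIZE macro from earlier in this chapter to log them, since the answers genuinely differ between platforms:

    #include <stddef.h>   // size_t
    #include <time.h>     // time_t

    PRINT_TYPE_SIZE(size_t);
    PRINT_TYPE_SIZE(time_t);
    PRINT_TYPE_SIZE(long);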
Pointers versus Integer Sizes
You didn't hear this from me, but apparently you can store pointers in integers, and vice-versa, in C++ code. Weirdly, you can even get paid for doing this. But it only works if the byte sizes are big enough, and it's best to self-test this portability risk during program startup. What exactly you want to test depends on what you're (not) doing, but here's one example:
    // Test LONGs can be stored in pointers
    yassert(sizeof(char*) >= sizeof(long));
    yassert(sizeof(void*) >= sizeof(long));
    yassert(sizeof(int*) >= sizeof(long));
    // ... and more
Note that a better version in modern C++ would use “static_assert” to test these sizes at compile-time, with zero runtime cost.
    static_assert(sizeof(char*) >= sizeof(long));
    static_assert(sizeof(void*) >= sizeof(long));
    static_assert(sizeof(int*) >= sizeof(long));
In this way, you can perfectly safely mix pointers and integers in a single variable. Just don't tell the SOC compliance officer.
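As an aside, a slightly stricter sketch of the same idea: <cstdint> offers uintptr_t, an integer type that (where provided) is guaranteed to round-trip a pointer value (the function name is hypothetical):

    #include <cstdint>

    void* pointer_roundtrip(void* p)
    {
        std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);  // pointer -> integer
        return reinterpret_cast<void*>(bits);                       // integer -> pointer
    }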