Aussie AI
Portability Checking of AVX Versions
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
AVX support has changed over the years, with different CPUs having different capabilities, not only across AVX, AVX-2 and AVX-512, but also across their sub-releases. The future is also a little unclear, with reports that some of the newer Intel chips have AVX-512 disabled.
If you write some code using AVX-512 intrinsics, compile your C++ into an executable with the AVX-512 flags on, and then run it on a lower-capability CPU without AVX-512, what happens? Do the AVX-512 intrinsics fail, or are they simulated somehow so that they're slower but still work? Answer: kaboom on Microsoft Visual Studio (MSVS). In the MSVS IDE, if you try to call these intrinsics on a CPU that doesn't support them, you get “unhandled exception: illegal instruction.” In other words, the C++ compiler still emits the AVX-512 instruction codes, but they aren't valid on that CPU, so the program raises an exception at runtime.
Hence, the calls to AVX-512 are not emulated at run-time on lower-capability CPUs. And they aren't checked, either. That's up to you!
Dynamic test required: Firstly, you cannot use the preprocessor. You can't test #if or #ifdef for whether you've got AVX-512 in the CPU or not. You can use the preprocessor to distinguish between different platforms where you'll compile a separate binary (e.g. ARM Neon for phones or Apple M1/M2/M3 chipsets). But you cannot choose between AVX/AVX-2/AVX-512 at compile-time, unless you really plan to ship three separate binary executables. Well, you probably could do this if you really, really wanted to.
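To make the compile-time limitation concrete, here is a minimal sketch. The macros `__AVX__`, `__AVX2__` and `__AVX512F__` are predefined by GCC, Clang and MSVC according to the build flags (such as `-mavx2` or `/arch:AVX512`), so they describe the binary, not the machine it later runs on. The function name `compiled_avx_level` is made up for this illustration.

```cpp
#include <string>

// Reports which AVX level the *compiler* was told to target.
// This is fixed when the binary is built and says nothing about
// the CPU that eventually executes it.
std::string compiled_avx_level() {
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX-2";
#elif defined(__AVX__)
    return "AVX";
#else
    return "none";
#endif
}
```

A binary built with the AVX-512 flags returns "AVX-512" from this function on every machine, including ones where AVX-512 instructions would crash, which is why the preprocessor alone is only useful if you ship a separate executable per AVX level.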
The other thing you don't really want to do is low-level testing of capabilities. You don't want to test a flag right in front of every AVX-512 intrinsic call. Otherwise, you'll lose most of the speedup benefits. Instead, you want this test done much higher up, and then have multiple versions of the higher-level kernel operations (e.g. vector add, vector multiply, vector dot product, etc.).
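One possible way to keep the test out of the inner loops is to pick a whole kernel once, via a function pointer. The sketch below is only an outline under some assumptions: the names `SimdLevel`, `DotFn` and `select_dot_kernel` are hypothetical, and the `dot_avx2` and `dot_avx512` bodies are plain C++ stand-ins where a real build would hold the intrinsic versions.

```cpp
#include <cstddef>

// Hypothetical capability levels; a real program would set this once
// at startup from a CPU feature query.
enum class SimdLevel { Scalar, Avx2, Avx512 };

// Every kernel version shares one signature so they are interchangeable.
using DotFn = float (*)(const float*, const float*, std::size_t);

// Plain C++ reference kernel.
float dot_scalar(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// Stand-in: a real version would contain AVX-2 intrinsics.
float dot_avx2(const float* a, const float* b, std::size_t n) {
    return dot_scalar(a, b, n);
}

// Stand-in: a real version would contain AVX-512 intrinsics.
float dot_avx512(const float* a, const float* b, std::size_t n) {
    return dot_scalar(a, b, n);
}

// Chooses a kernel once, high up in the program.
// The kernels themselves never test the capability flag.
DotFn select_dot_kernel(SimdLevel level) {
    switch (level) {
        case SimdLevel::Avx512: return dot_avx512;
        case SimdLevel::Avx2:   return dot_avx2;
        default:                return dot_scalar;
    }
}
```

The caller would store the returned pointer at startup and use it for every dot product afterwards, so the cost of the capability check is paid once rather than per intrinsic call. The same pattern extends to vector add, vector multiply, and the other kernels.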
What this means is that you have to check in your runtime code what the CPU's capabilities are, at a very high level in your program. Hence, it is important to check your platform has the AVX support that you need, such as via the “cpuid” intrinsic at program startup. Then you have a dynamic flag that specifies whether you have AVX-512 or not, and you can then choose between an AVX-2 dot product or an AVX-512 dot product, or whatever else, during execution.
Obviously, it gets a bit convoluted when you have to dynamically choose between versions for AVX, AVX-2 and AVX-512 (not to mention all the AVX sub-capabilities and also AVX-10 coming soon).
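As a rough illustration of the startup check, the following sketch queries CPUID for the three main AVX levels. It assumes the `__cpuidex` and `_xgetbv` intrinsics on MSVC and the `<cpuid.h>` helper `__get_cpuid_count` on GCC/Clang; the names `AvxSupport`, `detect_avx_support`, `cpuid_query`, `read_xcr0` and the `SKETCH_X86` macro are invented for this example. It only checks the AVX-512 Foundation bit, not the many AVX-512 sub-capabilities, and on non-x86 platforms it simply reports no AVX.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
  #include <intrin.h>
  #include <immintrin.h>
  #define SKETCH_X86 1
  static void cpuid_query(unsigned leaf, unsigned subleaf, unsigned out[4]) {
      int regs[4] = {0, 0, 0, 0};
      __cpuidex(regs, static_cast<int>(leaf), static_cast<int>(subleaf));
      for (int i = 0; i < 4; ++i) out[i] = static_cast<unsigned>(regs[i]);
  }
  static std::uint64_t read_xcr0() { return _xgetbv(0); }
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__x86_64__) || defined(__i386__))
  #include <cpuid.h>
  #define SKETCH_X86 1
  static void cpuid_query(unsigned leaf, unsigned subleaf, unsigned out[4]) {
      out[0] = out[1] = out[2] = out[3] = 0;
      __get_cpuid_count(leaf, subleaf, &out[0], &out[1], &out[2], &out[3]);
  }
  static std::uint64_t read_xcr0() {
      unsigned lo = 0, hi = 0;
      __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
      return (static_cast<std::uint64_t>(hi) << 32) | lo;
  }
#else
  #define SKETCH_X86 0  // Not x86: no CPUID, so report no AVX at all.
#endif

// Hypothetical result type: one dynamic flag per AVX level.
struct AvxSupport {
    bool avx = false;
    bool avx2 = false;
    bool avx512f = false;
};

// Intended to be called once at program startup.
AvxSupport detect_avx_support() {
    AvxSupport s;
#if SKETCH_X86
    unsigned r[4];
    cpuid_query(0, 0, r);
    const unsigned max_leaf = r[0];        // highest supported CPUID leaf
    if (max_leaf < 1) return s;

    cpuid_query(1, 0, r);
    const bool osxsave = ((r[2] >> 27) & 1u) != 0;  // OS uses XSAVE
    const bool cpu_avx = ((r[2] >> 28) & 1u) != 0;  // CPU has AVX
    if (!osxsave || !cpu_avx) return s;

    // The OS must also save the wider registers on context switches.
    const std::uint64_t xcr0 = read_xcr0();
    if ((xcr0 & 0x6) != 0x6) return s;     // XMM and YMM state enabled
    s.avx = true;

    if (max_leaf < 7) return s;
    cpuid_query(7, 0, r);
    s.avx2 = ((r[1] >> 5) & 1u) != 0;      // leaf 7, EBX bit 5
    s.avx512f = (((r[1] >> 16) & 1u) != 0) // leaf 7, EBX bit 16
                && ((xcr0 & 0xE6) == 0xE6); // opmask and ZMM state enabled
#endif
    return s;
}
```

The result can be stored in a global or passed down to whatever code selects the kernels, so that the choice between an AVX-2 and an AVX-512 dot product is made from these flags rather than from compile-time settings.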