Aussie AI

11. Compile-Time Optimizations

Book Excerpt from "Generative AI in C++"

by David Spuler, Ph.D.

“I was in the middle before I knew that I had begun.”

— Jane Austen, Pride and Prejudice, 1813.

C++ Compile-time Techniques

Compile-time processing is the optimal way to run a program. All the work is done by the compiler and none by your program. There are literally zero instructions executed on the CPU at runtime, whether it's doing training or inference. It will be blindingly fast for your users.

If only all code could be like that!

The reality is that programmers are still needed and that code still needs to run (sigh!). But to make it faster, there are lots of ways to have more computation done by the compiler, long before it ever goes near a user.

The C++ programming language has numerous features that help perform work at compile-time. These include ways to explicitly control what goes to the compiler, or to give more information to the compiler so that its optimizer can do good work on your behalf. Some of the various C++ language features to consider include:

Conditional compilation — #if/#ifdef statements
inline functions
Templates — these expand at compile-time
Symbolic constants — const or #define
Function-like macros — #define with parameters
Constant hints — constexpr, if constexpr, etc.
Global and static variable initializations
static data members — fixed data in C++ classes
Type traits — compile-time type testing
Restricted pointers — ignore aliasing risks

But when we're doing AI, there's another compile-time data structure to consider: the whole LLM model itself.

AI Models are Static

An AI model is inherently static after it's been trained and fine-tuned, and this characteristic offers many opportunities for “offline” speedups. At the highest level there are the model compression optimizations (e.g. quantization, pruning) that create a smaller model file. In addition, some of the other model meta-parameters also have a significant impact on what the C++ compiler can do.

Internal model dimension — i.e. the “embedding size”
Context window size — maximum input token length
Number of layers — depth of the model

These are all constant for both training and inference. It is strongly recommended that you use these parameters to create a model-specific C++ engine that is specialized for this particular model, rather than a generalized AI engine that can handle multiple model sizes. In simpler terms, make all of these meta-parameters “const” in your code and turn the optimizer up to eleven.

Anywhere in the C++ kernels that these numbers are used gives the optimizer an opportunity to make smarter efficiency choices. These optimizations range from full auto-vectorization of loops into parallel execution if the compiler can see that they are a fixed length, to the simpler arithmetic strength reduction optimizations, such as using bitshifts if a constant meta-parameter is a power-of-two (and they should be).
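As a minimal sketch of this idea, assume a hypothetical model with an embedding size of 1024. The meta-parameter becomes a constexpr constant that the kernel loops use directly. The function names are illustrative, and whether a given compiler actually unrolls or vectorizes the loop depends on the compiler and its optimization flags, so treat the comments as typical behavior rather than a guarantee.

```cpp
#include <cstddef>

// Hypothetical model meta-parameter, fixed for one specific model.
constexpr std::size_t kEmbeddingDim = 1024;

// A power-of-two size enables cheap shifts and masks, so check it at compile-time.
static_assert((kEmbeddingDim & (kEmbeddingDim - 1)) == 0,
              "embedding size should be a power of two");

// The trip count is a compile-time constant, so the optimizer knows the exact
// loop length when deciding whether to unroll or vectorize.
void add_embedding(float* out, const float* a, const float* b)
{
    for (std::size_t i = 0; i < kEmbeddingDim; ++i)
        out[i] = a[i] + b[i];
}

// Modulo by a power-of-two constant is typically strength-reduced to a bitwise
// AND (here, flat_index & 1023) instead of a division instruction.
std::size_t column_of(std::size_t flat_index)
{
    return flat_index % kEmbeddingDim;
}
```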

C++ Optimizers

Every C++ compiler has optimization built into the code generation phase. Typically, there are ways to specify that a higher degree of code optimization should be performed. Methods to control the settings include:

Command-line arguments (e.g. “-O1” or “/O1”)
Configuration settings (e.g. Project Settings in the MSVS IDE)
#pragma preprocessor directives

Take note of the meaning of the optimizer settings. For example, on MSVS the setting “/O1” optimizes for code size, not speed! Also, don't be like me and assume that the defaults are going to be what you want. Looking at the MSVS IDE optimizer settings in my AUSSIE project file, I found:

“Optimization” was “disabled” by default.
“Enable Intrinsic Functions” was “No” by default. Why not?
“Favor Size or Speed” was “neither” by default. Come on, why is there no “both” option?
“Inline Function Expansion” was “default” at least.

When to enable the optimizer? Should you run the optimizer at every build? At what level?

Note that your policy should not be to turn up the optimization to maximum level just before you ship your code to users, because your code can change in a very bad way. Don't assume that turning the optimizer mode up to super-crunch is always an easy win, as optimization can trigger latent bugs in your code (typically undefined behavior that happened to work at a lower optimization level) by reorganizing memory or reordering instructions. It's safer to build and test with the same optimization settings that you ship.

What does the optimizer do? In order to optimize code, it's important to know what sorts of optimizations your compiler is doing automatically. Compilers have been doing optimizations for literally 50 years, and the state-of-the-art is quite amazing, with an extensive body of research theory. Some of the main automated compiler optimizations include:

Constant folding/propagation
Constant expression evaluation
Common subexpression elimination
Redundant assignment removal
Strength reduction
Algebraic optimizations
Register allocation
Loop optimizations (e.g. unrolling)
Auto-vectorization

If you manually apply some of the simple optimizations listed above to your code, it's unlikely to give you a speedup, because the compiler has already done it for you.

However, there's a limit to what compilers can do. They certainly can't make architectural changes, and there are also many mid-level algorithmic changes that cannot be automated.

Function calls inside expressions are a good example of code changes that might need to be manually optimized. When the compiler sees a function call used in arithmetic, it isn't always able to know what that function is going to do, and has to be conservative by avoiding possibly incorrect optimizations.
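A classic (non-AI) illustration is a function call in a loop condition. In this sketch, with illustrative function names, the slow version writes to the string inside the loop, so the compiler generally cannot prove that strlen() returns the same value on each iteration, and has to call it every time. Hoisting the call out of the loop is a change the programmer has to make, because only the programmer knows it's safe here.

```cpp
#include <cctype>
#include <cstring>

// Slow: strlen() is re-evaluated on every iteration. Because the loop body
// writes to s, the compiler generally cannot hoist the call for you.
void to_upper_slow(char* s)
{
    for (std::size_t i = 0; i < std::strlen(s); ++i)
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
}

// Faster: the programmer knows the length never changes, so compute it once.
void to_upper_fast(char* s)
{
    const std::size_t len = std::strlen(s);
    for (std::size_t i = 0; i < len; ++i)
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
}
```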

Floating-Point Optimizer Options

Some C++ compilers have optimizations that you can use to speed up your Floating-Point Unit (FPU). Some of the options for GCC include:

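“-ffast-math” option — This option is a broad enabler of multiple floating-point speedups, such as -fno-math-errno, -ffinite-math-only, and -fno-signed-zeros (which lets the optimizer ignore the difference between positive and negative zero). The trade-off is that results are no longer guaranteed to follow strict IEEE 754 semantics.
“-fno-math-errno” option — This allows the standard library math functions such as sqrt to run faster and also be more amenable to vectorization, simply by allowing them to never set the “errno” variable. For example, on hardware with a square-root instruction (e.g. x86-64), GCC can then compile a call to sqrt into a single instruction with no fallback library call. The use of errno was once a great way to track error codes, but it's a side effect that blocks these optimizations. And let's be frank: you weren't ever checking errno anyway, so turn it off!
“-ffinite-math-only” — This mode allows GCC to optimize floating-point arithmetic on the assumption that arguments and results are never Inf or NaN, so it can skip the related checks, which can make code marginally faster (but can give incorrect results if Inf or NaN values do occur).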

Microsoft Visual Studio C++ also has its own set of FPU options:

“Floating-Point Model” settings in a Project's Property Pages under “C++” for “Code Generation” has options “/fp:precise”, “/fp:strict”, or “/fp:fast”
“Enable Floating-Point Exceptions” can be turned off if you like.

People Helping Parsers

The humble C++ compiler needs your attention. Hat in hand, the compiler is sitting there saying “I am but a poor, helpless lexer, without even a single neural network. Please help me.” Hence, please consider donating your time to help a poor struggling compiler in your neighborhood.

There is a long history of the C++ compiler needing “hints” about optimization from the programmer. C++ inherited a “register” specifier from C that hinted to the compiler that a variable was going to be highly used, and the compiler should optimize it by putting the variable in a CPU register. The “register” keyword was deprecated in C++11 and removed in C++17 (the keyword itself is still reserved), which indicates that compiler register allocation algorithms no longer benefit from human help.

Some of the other longstanding C++ keywords that can be used for efficiency-related purposes include:

inline
const
static

And with the evolving C++ standards, there's a whole new set of directives that are hints to the compiler about how to optimize:

constexpr
constinit
consteval
reinterpret_cast
restricted pointers (non-standard “__restrict”)
[[likely]] and [[unlikely]] path attributes

The constexpr and related directives help the compiler do “constant folding” and “constant propagation” to compute as much as possible at compile-time, thereby avoiding any runtime cost for lots of code. In fact, the idea is extended to its logical asymptote, whereby you can declare an entire function as “constexpr” and then expect the poor compiler to interpret the whole mess at compile-time. Pity the overworked compiler designers.

The “restrict” pointer declarations help the compiler with advanced optimizations like loop unrolling and vectorization by telling the compiler to ignore potential “aliasing” of pointers, allowing much more powerful code transformations on loops. The restricted pointer optimizations are actually of more interest than constexpr for AI development. The “restrict” keyword has been standard in C since C99, but it is still not part of standard C++ (not even C++23), so C++ compilers offer non-standard versions such as “__restrict” (GCC, Clang, and MSVC) and “__restrict__” (GCC and Clang). The possible benefit for C++ AI engines is that restricted pointer specifications might help the compiler do auto-vectorization of loops into parallel hardware-assisted code.
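Here is a minimal sketch of a restricted-pointer vector addition, using the non-standard “__restrict” spelling that GCC, Clang, and MSVC accept. The function name is illustrative. The qualifier is a promise from the programmer: calling it with overlapping arrays would be undefined behavior, and whether the compiler actually vectorizes the loop still depends on its optimization settings.

```cpp
#include <cstddef>

// __restrict is a compiler extension, not standard C++. It promises that out,
// a and b never overlap, so the compiler need not assume that a store to out[i]
// might change a[] or b[], which removes a major obstacle to vectorization.
void vector_add(float* __restrict out, const float* __restrict a,
                const float* __restrict b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```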

How much do these help? It's rather unclear, and the compiler is free to simply ignore these hints. Compilers already did a lot of constant propagation optimizations before the “constexpr” directives came along, so presumably compiler designers have upped their game even further now.

Inline Functions

Placing the keyword “inline” before any function declarations makes that function instantly disappear in a puff of smoke. Well, sort of. It gives your C++ compiler the hint to optimize the code by putting the function's body there instead of the function call. This is faster, but means there are many copies of the function's statements, so it increases code size.

Which functions should you inline? General wisdom is to do so for these types of C++ functions:

Short functions (esp. single-statement functions)
Getters and setters in a class
Frequently called functions at the bottom of the call hierarchy.

The inline specifier is just a hint. Your compiler is free to completely ignore you. In fact, this choice will probably disappear in a few years, as compilers become better than humans at choosing which functions to inline.

If you want to force the compiler to inline, use preprocessor macros. However, there's a whole minefield of problems in function-like macros. For example, you need to add parentheses around the whole expression and also around each parameter's appearance in the replacement text. Hence, inline functions are much safer than macros.
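The parenthesization problem is easy to demonstrate. This sketch, with illustrative names, compares an unparenthesized macro, a properly parenthesized macro, and an inline function:

```cpp
// Unsafe macro: no parentheses around the parameter or the whole expression.
#define SQUARE_BAD(x) x * x

// Safer macro: parenthesize every use of the parameter and the whole result.
#define SQUARE(x) ((x) * (x))

// Inline function: type-checked, and evaluates its argument exactly once.
inline int square(int x)
{
    return x * x;
}
```

With an argument of 1 + 2, the unparenthesized macro expands to 1 + 2 * 1 + 2, which is 5, not 9. Even the parenthesized macro still evaluates its argument twice, so SQUARE(i++) would modify i twice (undefined behavior), whereas square(i++) is safe.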

The value of inline functions is not only from avoiding function call overhead. The merging of the statements into the caller's code also allows many other optimizations to be applied there as follow-up transformations. Constants can be propagated further through the inlined statements, which is similar to constexpr, but the range of optimizations is much larger with inline.

GCC has some additional C++ language features related to inlining. There is the “always_inline” function attribute which says to always inline this function, and the “flatten” attribute which says to inline every call to other functions inside this function. There is also the “gnu_inline” attribute that prevents creation of a non-inlined function body.
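A sketch of two of these GCC attributes, which Clang also accepts (MSVC has its own “__forceinline” keyword instead). The relu functions are illustrative examples, not from any particular library:

```cpp
// GCC/Clang extension (not standard C++): always inline this function,
// even when optimization is not enabled.
__attribute__((always_inline)) inline float relu(float x)
{
    return x > 0.0f ? x : 0.0f;
}

// GCC/Clang extension: inline every call made inside this function's body,
// where possible, so the loop ends up with no function calls.
__attribute__((flatten)) void relu_vector(float* v, int n)
{
    for (int i = 0; i < n; ++i)
        v[i] = relu(v[i]);
}
```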

inline function limitations

The inline specifier is wonderful when it works. A very important point to note about inline functions is that the inline specifier, by itself, is not enough to guarantee that inline code will be generated. The other requirement is that the compiler must know the function body code, where the function is called.

Hence, an inline keyword in a function prototype declaration is not enough. The executable statements inside the function’s definition (i.e., the function body) must be available to the C++ compiler. Otherwise, how is the compiler to know what inline code to expand a function call into? I guess in theory the C++ compiler could maintain a huge database of all the functions in your source code, or scan through all the CPP files to find it, and that would be amazing. Link-time optimization (e.g. “-flto” in GCC or “/GL” in MSVC) does something along these lines by allowing inlining across source files, but it's an optional build setting. Without it, the compiler will only inline functions where it has seen the function body within the current C++ source file or an included header file. This requirement imposes two restrictions on the use of inline functions:

1. Member functions declared as inline should include the function body inside the same header file as the class declaration. This can be achieved by placing the function body of a member function inside the class declaration. For a more readable style when there are many inline member functions, the class declaration can declare the function prototypes, and then provide the inline function definitions immediately after it, in the same header file. This restriction ensures that whenever the class declaration is included as a header file, the member function body is available for inlining.

2. Non-member inline functions must be defined before they are used within a source file, preferably by placing the inline functions in a header file. Placing inline functions at the top of a source file allows the inlining of any function calls later in the same source file, but calls to the functions from a different source file cannot be inlined by the compiler unless the inline function definition is placed in a header file.
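As a sketch of both rules, here is what a header file for a hypothetical Vec3 class might contain: one member function defined inside the class, one defined as inline just after the class, and a non-member inline function defined before any use.

```cpp
// --- contents of a hypothetical header file, e.g. "vec3.h" ---
class Vec3 {
public:
    Vec3(float x, float y, float z) : x_(x), y_(y), z_(z) {}

    // Defined inside the class declaration: implicitly inline
    float x() const { return x_; }

    // Only declared here...
    float dot(const Vec3& o) const;

private:
    float x_, y_, z_;
};

// ...and defined as inline immediately after the class, in the same header
inline float Vec3::dot(const Vec3& o) const
{
    return x_ * o.x_ + y_ * o.y_ + z_ * o.z_;
}

// Non-member inline function, defined in the header before any use
inline float length_squared(const Vec3& v)
{
    return v.dot(v);
}
```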

Non-inlined functions

Some functions declared as inline will not be expanded into inline code by the compiler, simply because they are too complicated for the compiler to handle. In this case, the inline specifier is ignored and the function is treated like any other function. The sophistication of the inline code generation depends on the compiler implementor.

Even if a compiler could theoretically inline a function, the compiler is sometimes still forced to generate a “real” function. There are various possible reasons for this:

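1. The name of an inline function is used as a pointer-to-function constant.

2. The inline function is called from another source file where its body is not visible.

3. The function is a virtual member function called through a pointer or reference, so the actual function is only known at runtime.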

When an inline function is called from a source file where the function body has not been made available, the compiler generates a real function call (simply because it cannot inline the function). Hence, the real function must exist and be linked like any other function. Fortunately, the placement of inline functions in header files as discussed above will avoid this for any function the compiler decides to inline.

Inline Variables

Since C++17 you can define a variable as “inline”. What does this do?

Basically, it's not really much of a speedup, but makes it easier to manage global constants, global variables, or static data members in C++ classes. You can declare these variables as “inline” in a header file, with an initializer:

    inline int g_x = 3;

Then you can with wild abandon include that header file all over the place without any problems whatsoever. The C++ linker is required to:

Merge all of them into one variable at link-time.
Guarantee that it's initialized as specified.
Have the same address for that variable everywhere.
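As a sketch, a header might contain inline variables like these (C++17 or later). The names kMaxTokens and ModelConfig, and their values, are hypothetical. Note that static constexpr data members are implicitly inline since C++17, so they also need no separate definition in a source file.

```cpp
#include <string>

// --- header-style definitions (C++17): safe to include in many source files ---

// Inline global variable: one shared object with one address across all files
inline int g_x = 3;

// Inline constant: "inline" gives it a single definition program-wide
inline constexpr int kMaxTokens = 2048;   // hypothetical value

struct ModelConfig {
    // Inline static data member: no out-of-class definition needed
    inline static std::string name = "demo-model";   // hypothetical name

    // static constexpr data members are implicitly inline since C++17
    static constexpr int kLayers = 12;                // hypothetical value
};
```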

I find this addition to C++ somewhat humorous because it fixes up a huge mess that's existed since old K&R C code, and I've battled against it many times trying to get my program linked. I'm not going to irritate myself by repeating all the quirks, but it was always messy whether you had a global variable that was extern or non-extern, initialized or non-initialized, in a header file or a non-header file. So, if you ask me, the way that “extern” variable declarations “worked” was always broken, and now it's fixed in C++17. Hooray! (A bit late for me.)

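Overall, allowing “inline” for variables is helpful to efficiency because constants, static data members, and global variables can be fully defined, including their initializers, in a header file, so the compiler can see their values at compile-time in every source file that includes it. And it's always nice to get your program to link.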

Constant Specifiers

The “const” keyword means that something is constant, and cannot be modified. It is helpful for efficiency, but its role is also to help detect programming errors, where code accidentally attempts to modify a constant variable or object. There are multiple places where “const” can be used.

Symbolic constants
const variables
const objects
const function parameters (i.e., “const&” idiom)
const member functions (read-only)

But don't get me started on “const correctness.” I've seen too many dawns fighting with compilers about const. Anyway, let's move on, and assume we love const.

Basic const symbols. Symbolic constants can be declared as a representation of a numeric value or other type data (instead of using #define symbols):

    const float pi = 3.14;

Set-once variables with const. Variables can be made constant via “const”, which is effectively the same as a symbolic constant, except that the initializer need not be a compile-time constant. It is a “set-only-once” variable. The C++ compiler ensures that const variables cannot be modified, once they are initialized.

    const int scale_factor = get_config("scale");
    const int primes[] = { 2, 3, 5, 7, 11, 13, 17 };

Function parameters and const. The const specifier can ensure that function parameters are not modified, especially for arrays passed via pointers, or objects passed by reference. const on a scalar parameter type such as int is not as useful, only ensuring that the code inside the function doesn't modify the parameter (which isn't really a problem anyway). However, the idiom of “const&” to specify a const reference as a function parameter allows constant pass-by-reference of object parameters, which is extremely important for C++ efficiency.
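A minimal sketch of the “const&” idiom, with illustrative function names, contrasting a const reference parameter with pass-by-value:

```cpp
#include <vector>

// Pass by const reference: no copy of the (possibly huge) vector, and the
// compiler rejects any attempt to modify it inside the function.
float sum_weights(const std::vector<float>& weights)
{
    float total = 0.0f;
    for (float w : weights)
        total += w;
    // weights[0] = 0.0f;   // would not compile: weights is a const reference
    return total;
}

// Pass by value: copies the entire vector on every call.
float sum_weights_by_value(std::vector<float> weights)
{
    float total = 0.0f;
    for (float w : weights)
        total += w;
    return total;
}
```

Both return the same answer, but the by-value version copies the whole vector on every call, whereas the const reference version copies nothing and still guarantees that the caller's data is not modified.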

Instantiate-only objects with const. Class objects can be declared as const variables. When the variable is a const object, it can be instantiated via a constructor, but cannot be modified thereafter.

    const Complex cfactor(3.14, 1.0);

Member functions declared const. Class member functions can be declared by adding the keyword “const” immediately after the function parameter list:

    int count() const;   // declared inside the MyVector class

The C++ compiler blocks a const member function from modifying data members, although it can still change “static” data members and any data members declared “mutable”. For const object variables, the C++ compiler ensures that any calls to non-const member functions are disallowed.
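A minimal sketch of const member functions and const objects, using a simplified, hypothetical MyVector class that only stores a count. The mutable counter shows the exception for data members declared “mutable”:

```cpp
class MyVector {
public:
    explicit MyVector(int n) : count_(n) {}

    // const member function: may read count_, but cannot modify it
    int count() const
    {
        ++reads_;          // allowed only because reads_ is mutable
        return count_;
    }

    int reads() const { return reads_; }

    // non-const member function: cannot be called on a const object
    void resize(int n) { count_ = n; }

private:
    int count_;
    mutable int reads_ = 0;   // hypothetical usage counter
};
```

For a const object such as “const MyVector v(5);”, calling v.count() is allowed, but a call like v.resize(3) would be rejected at compile-time, because resize() is not a const member function.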

Non-member functions. Note that a non-member function cannot be const. The actions of a friend function or other non-class function are controlled by using const on the parameters, rather than the whole function itself.

Beyond const. Newer C++ features have generalized and improved some of the uses of const. The “constexpr” specifier is much more powerful in terms of allowing compile-time optimizations, as are its derivatives “constinit” and “consteval.” The newer use of “inline” on a variable (yes, a variable, not a function, supported since C++17) can be helpful for safely sharing constants across multiple files.

Constant Expressions Specifier

The constexpr keyword is an optimization hint for the compiler that's more powerful than “const.” Whereas const only guarantees that something won't change, constexpr is a guarantee by the human that something can be evaluated at compile-time.

The compiler should use the constexpr hint to try to propagate constant values throughout the evaluation of expressions and function calls, producing an overall speedup. For a constexpr function, if it's called with non-constant arguments, or in a place where the compiler isn't required to evaluate it at compile-time, there's no penalty and the code just runs like it never had a constexpr specifier. However, a constexpr variable is stricter: if the human has told the machine a bald-faced lie by giving it an initializer that isn't a compile-time constant, the compiler rejects the code.

There's not a whole lot of difference between const and constexpr if you use it only for named constants:

    const float PI = 3.14f;
    constexpr float PI = 3.14f;  // Same same

`constexpr` functions

The real power is when you use constexpr for functions.

    const float SQRTPI = sqrtf(3.14f);   // Works?
    constexpr float SQRTPI = sqrtf(3.14f); // Works?

Oh, dear! I just tested this code snippet, and the const version works, whereas the constexpr version fails to compile, which is the opposite of what I was expecting. According to an informed source that was trained on Internet scrapings, sqrtf is not going to be declared as a “constexpr” function until C++26. Alas, by then all C++ programmers will have been replaced by robots, so feel free to skip this section.

The apparently futuristic idea is that sqrtf should have a “constexpr” keyword in its declaration, because the function return value can be computed at compile-time if you pass it a constant argument. In other words, the compiler can evaluate “sqrtf(3.14f)” at compile-time. Hence, the whole function should be declared “constexpr” in the standard library header file. The const version is also probably not evaluating the sqrtf function at compile-time, but just calling it dynamically whenever the const variable is first initialized (this non-compile-time initialization is allowed for const variables, provided you don't later attempt to change its value).

Anyway, you can already declare your own function with the “constexpr” specifier.

    constexpr int twice(int x)
    {
        return x + x;
    }
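The payoff of constexpr is that the same function can be used where C++ requires a constant, such as in a static_assert or an array size, while still being callable at runtime. A minimal sketch, where the buffer, kBufSize, and twice_at_runtime names are illustrative:

```cpp
constexpr int twice(int x)
{
    return x + x;
}

// Contexts that require a constant force compile-time evaluation
static_assert(twice(21) == 42, "evaluated by the compiler");
constexpr int kBufSize = twice(8);
float buffer[twice(8)];   // array size must be a constant expression

// The same function also works at runtime with a non-constant argument
int twice_at_runtime(int n)
{
    return twice(n);
}
```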

`constexpr` functions vs `inline` functions

A lot of the same value in terms of optimization can be had by making a function just inline rather than constexpr. Note that you can use both, but officially constexpr for functions implies inline on the function as well.

Is constexpr any better than just inline? If you pass a constant argument to a small inline function, then the expansion of the function body will trigger lots of constant propagation optimizations, effectively evaluating most of it at compile-time, which is almost the same as constexpr.

constexpr is more formal: when a constexpr function is called with constant arguments in a context that requires a constant, the compiler is honor-bound to do “compile-time function evaluation” to get the constant return value. Also, a constexpr function is more officially usable as a compile-time constant, so that you can use an expression with a constexpr function's return value in various places where C++ needs a constant (e.g. an array size declaration, some template situations, etc.).

An inline function is also supposed to be expanded efficiently in code that runs at run-time with non-constant arguments, and constexpr functions are implicitly inline functions. The code generation requirements of this kind of dynamic inlining are often more advanced than constant expression evaluation.

Also, the restrictions on what a constexpr function body may contain make it easier for the compiler to evaluate than the unrestricted body of an inline function. However, as a practical matter, the compile-time evaluation of expressions and the code generation for inlined expressions have a lot of overlap, so I expect C++ compilers will mostly try to do both on every type of function.

The inline keyword also serves a weird secondary purpose, by allowing the same function definition to appear in many source files, which the linker then merges into one copy. This means we can include header files with the full definition of that inline function anywhere we like, without getting a linker error about multiple definitions. But this isn't a performance optimization, and the linker feature of inline is almost the opposite of what we want in making a function inline, because we don't want a real function to be called at all.

`if constexpr` statements

There is an alternative usage of constexpr in terms of “if” statement conditions (since C++17):

    if constexpr (cond)

This new syntax requires the condition to be a compile-time constant expression. Hence, the compiler evaluates the condition at compile-time, and can then determine at compile-time which branch should be executed. So, there is a double speedup from:

(a) the condition computation is removed at run-time, and

(b) code size reduction from unexecuted “dead code” being removed.

In fact, inside a template, this determines at compile-time which code block will be instantiated, so there are cases where you can avoid a compile-time error in templates by wrapping code inside an “if constexpr” check. This can be useful in compile-time situations such as template expansion, where you can prevent some expressions from being compiled, and code bloat can also be reduced.
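A minimal sketch of the template case, using the std::is_pointer_v type trait (C++17). The function name get_value is illustrative. When T is int, the expression *t would not compile, but it sits in the discarded branch, so it is never instantiated:

```cpp
#include <type_traits>

// Returns the pointed-to value for pointer types, or the value itself otherwise.
template <typename T>
auto get_value(T t)
{
    if constexpr (std::is_pointer_v<T>)
        return *t;   // only instantiated when T is a pointer type
    else
        return t;    // the *t branch above is discarded for non-pointers
}
```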

`constinit` variables

The constinit specifier (since C++20) is like a hybrid between constexpr and static variables. It can only be used on a variable with static or thread storage duration (e.g. a global variable, a static data member, or a static local variable), and it requires that the variable is initialized at compile-time, rather than by code that runs at program startup.

A variable declared as constinit must have an initializer that can be calculated at compile-time. However, unlike “const” or “constexpr” variables, a constinit variable is not constant, and can be modified later at runtime.

Huh? That makes no sense. Sure, it does in the world of C++ standards. A constexpr variable is “constant at compile-time and forever after,” whereas a constinit variable only promises to be “initialized at compile-time,” which is a weaker guarantee.

The best example is a call to a function that has one path where it's constant, and another path where it's not. The definition of “somefunc” has two paths:

    constexpr int somefunc(bool fixed)
    {
        if (fixed) return 27;
        else return some_random_number();
    }

The “somefunc” function can still be declared “constexpr” because it has a constant path, but it isn't a constant on all paths, so not every call to it can be evaluated at compile-time.

However, if we're using “somefunc” at program startup initialization, we can try:

    constinit int s_myconst = somefunc(true);

Here, the initialization of “s_myconst” goes through the fixed path to get the compile-time constant value of 27, and declaring the variable as constinit tells the compiler to enforce this: if the initializer ever needed the non-constant path (e.g. somefunc(false)), the code would fail to compile, instead of silently falling back to runtime initialization. And unlike a constexpr variable, “s_myconst” can still be modified later.

Anyway, now that you've been forced to learn all that, just forget it. You'll rarely if ever be needing constinit.

`consteval` functions

Use consteval for functions that are always constant. A consteval function (since C++20) is strictly declared so that every invocation of the function must be evaluated at compile-time, producing a compile-time constant.

Functions declared consteval are a stricter subset of constexpr functions (and consteval also implies inline on a function). Although a constexpr function is constant if its arguments are constant, it can also return a dynamic return value for non-constant arguments, whereas calling a consteval function with non-constant arguments is a compile-time error.

When would you use consteval versus constexpr functions? I mean, when you ask your boss to make you a cup of coffee, do you like to ask politely or do you issue commands? Supposedly constexpr is optional for the C++ compiler, whereas consteval mandates compile-time evaluation.

Personally, I can't see much difference in general usage, since the compiler will probably optimize a constexpr function at compile-time if it's capable enough. Hence, for regular functions I don't see much benefit to consteval over constexpr. There are some complicated places in C++ where it helps to guarantee a compile-time constant, such as compile-time reflection on types and other tricks in compile-time template usage.

Templates

C++ templates can be used for compile-time optimizations, rather than merely as a programming convenience for algorithm generality and interface improvement. By specializing templated code for a particular type or constant parameter, the effect is that the resulting code is more specific, giving the compiler an opportunity for better optimizations.

For example, in AI we need vector and matrix classes. Rather than having our code dynamically check whether our precision is 32-bit float, or 8-bit quantized integers, or some other low-level type, we can use templated versions of the vector and matrix classes. This generates different functions for each type of data. At the cost of some extra code space, we've given the compiler the chance to do a much better job of optimizing the code for the specific low-level data types.
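As a minimal sketch, here is a function template rather than a full vector class, with the hypothetical name dot. Each instantiation, such as float data with a float accumulator, or int8_t data with an int32_t accumulator, becomes a separate function that the compiler can optimize for that specific type. The accumulator is a separate template parameter because summing 8-bit products in an 8-bit variable would overflow.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Generic dot product; Acc is the accumulator type, T the element type.
// Assumes a and b have the same length.
template <typename Acc, typename T>
Acc dot(const std::vector<T>& a, const std::vector<T>& b)
{
    Acc sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += static_cast<Acc>(a[i]) * static_cast<Acc>(b[i]);
    return sum;
}
```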

Going beyond just using template code to write the same algorithm for different types, there are various ways to optimize code that is templated to do more at compile-time:

Template class and function specializations
Constant template parameters
Compile-time conditional tests on types (e.g. sizeof, type traits, etc.)
if constexpr syntax
Variadic templates
Template Metaprogramming (TMP) techniques
SFINAE techniques

Constants can be used to instantiate template code in a way that helps the compiler to optimize by evaluating constant expressions. Template parameters don't need to be types, but can also be constant variables or numbers, such as the size of an array. Using a template in this way is as efficient as hard-coding the array size, which helps the compiler to know exactly what it can optimize, such as if the array size is used in any computations.
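A minimal sketch of a constant template parameter, using std::array so that the size N is part of the type. The function name dot_fixed is illustrative:

```cpp
#include <array>
#include <cstddef>

// N is a compile-time constant, so the loop trip count is known to the
// optimizer, just as if the array size had been hard-coded.
template <std::size_t N>
float dot_fixed(const std::array<float, N>& a, const std::array<float, N>& b)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < N; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

At the call site, N is deduced from the argument type, so calling dot_fixed(a, b) with two std::array<float, 4> arguments instantiates a version whose loop always runs exactly 4 times.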

If you think you can do better than the compiler's optimizer, remember that you can also override the generic template code. For example, you can write an explicit specialization of a template class for a particular type. Similarly, you can provide an explicit specialization of a templated function, so that your hand-written version is used for that type instead of the generic one.
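A sketch of both kinds of explicit specialization. The names to_float and BitWidth, and the 1/64 quantization scale, are hypothetical choices for illustration:

```cpp
#include <cstdint>

// Generic function template: a plain conversion for any arithmetic type.
template <typename T>
float to_float(T x)
{
    return static_cast<float>(x);
}

// Explicit specialization for int8_t, assuming a hypothetical fixed
// quantization scale of 1/64. (If placed in a header, it would need "inline".)
template <>
float to_float<std::int8_t>(std::int8_t x)
{
    return x * (1.0f / 64.0f);
}

// Class template (assuming 8-bit bytes) with an explicit specialization for bool
template <typename T>
struct BitWidth { static constexpr int value = 8 * static_cast<int>(sizeof(T)); };

template <>
struct BitWidth<bool> { static constexpr int value = 1; };
```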

An alternative to specializing a version of a template class or function is to use compile-time tests inside the generic template code. For example, you can use conditional tests involving compile-time operations:

sizeof
typeid
std::is_same_v
if constexpr conditional test syntax
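Here is a minimal sketch of compile-time tests inside one generic template, combining std::is_same_v, sizeof, and “if constexpr” (C++17). The precision_name function is illustrative. Only the selected branch is instantiated for each type, so there is no runtime test:

```cpp
#include <cstdint>
#include <string_view>
#include <type_traits>

// Picks a label per type at compile-time, inside a single generic template.
template <typename T>
constexpr std::string_view precision_name()
{
    if constexpr (std::is_same_v<T, float>)
        return "fp32";
    else if constexpr (std::is_integral_v<T> && sizeof(T) == 1)
        return "8-bit integer";
    else
        return "other";
}
```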

Next level templating

C++ templates are a very powerful programming mechanism. In fact, you can define entire projects as templates inside header files. To get the most out of template optimizations at compile-time, consider these methods:

Type traits
Variadic templates
SFINAE
Template Meta-Programming (TMP)

Type traits are a generic feature of C++ (since C++11) that you can use to interrogate the type of a variable. They are declared in the <type_traits> header file and there are numerous ways that you can test the type of a variable. The std::is_same_v trait mentioned above is one example. As another example, there are std::is_signed and std::is_unsigned to test whether a type is signed or unsigned. There's also std::is_pointer and std::is_array and various others. Combining type traits with “if constexpr” gives a powerful way to ensure templated code gets evaluated at compile-time, and to specialize blocks of code for particular types.
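As a sketch of type traits in practice, this hypothetical quantize function uses std::is_integral_v in a static_assert to reject unsuitable types at compile-time, and std::numeric_limits (another compile-time type query, from <limits>) to clamp to the target type's range. It's restricted to narrow types such as int8_t, uint8_t, and int16_t, whose ranges are exactly representable as float:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical quantizer: scales a float and rounds it into integer type Q.
template <typename Q>
Q quantize(float x, float scale)
{
    static_assert(std::is_integral_v<Q> && sizeof(Q) <= 2,
                  "quantize() needs an 8-bit or 16-bit integer type");
    const float q = std::round(x / scale);
    const float lo = static_cast<float>(std::numeric_limits<Q>::min());
    const float hi = static_cast<float>(std::numeric_limits<Q>::max());
    return static_cast<Q>(std::clamp(q, lo, hi));   // saturate to Q's range
}
```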

Variadic templates are another way to level up your code and have been supported since C++11. These are variable-argument templates via the use of the ellipsis “...” operator in a template declaration. This allows templates to accept a variable number of parameters for instantiation.
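A minimal sketch of variadic templates, using a C++17 fold expression to sum any number of arguments. The names sum_all and count_args are illustrative:

```cpp
#include <cstddef>

// The parameter pack is expanded at compile-time into a chain of additions,
// so there is no runtime loop over the arguments.
template <typename... Args>
constexpr auto sum_all(Args... args)
{
    return (args + ... + 0);   // binary fold: works even with no arguments
}

// sizeof... gives the number of arguments in the pack, also at compile-time
template <typename... Args>
constexpr std::size_t count_args(Args...)
{
    return sizeof...(Args);
}
```

The binary fold with an initial 0 means that sum_all() with no arguments returns 0, whereas a unary fold over + with an empty pack would not compile.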

SFINAE. Another optimization for advanced templating is to rely on SFINAE semantics. This refers to “Substitution Failure Is Not An Error” and means that a failed template substitution does not itself trigger a compilation error. More specifically, if the compiler tries and fails to substitute template arguments into a function template's declaration during overload resolution, that template is simply dropped from the set of candidates, and the call is resolved to another overloaded function instead (which may be another template or a non-template function). Relying on this capability in C++ not only avoids compilation errors that would block some advanced template usages, but also lets the compiler select specialized code paths at compile-time. However, although there are some good use cases in making templates faster, SFINAE is an obscure programming technique that isn't widely used in everyday C++ programming.
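A minimal sketch of the classic SFINAE idiom using std::enable_if_t. The function name kind is illustrative. For each call, substitution fails in one of the two templates, so that overload is quietly discarded and the other one is chosen at compile-time:

```cpp
#include <string_view>
#include <type_traits>

// Enabled only when T is a floating-point type; for any other T, substituting
// into std::enable_if_t fails and this overload is silently discarded (SFINAE).
template <typename T, std::enable_if_t<std::is_floating_point_v<T>, int> = 0>
std::string_view kind(T)
{
    return "floating-point";
}

// Enabled only when T is an integral type
template <typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
std::string_view kind(T)
{
    return "integer";
}
```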

Template Meta-Programming. Further optimization of templated code at compile-time is possible via the technique called “Template Meta-Programming” (TMP). Note that this refers to an unusual usage of templates in C++, where the idea goes beyond just using templates of code for different types (i.e. normal templating of classes). TMP is an advanced coding method that uses (misuses, perhaps) instantiation semantics of templates as a way of generating compile-time code, even for some conditional branches. However, this is an obscure method that is rarely needed, because most of the effects can be achieved via preprocessor macros, function inlining, and using “constexpr” in modern C++.
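To show the contrast, here is the textbook TMP example, a factorial computed by recursive template instantiation, next to a constexpr function that does the same job with ordinary-looking code. Both are evaluated at compile-time in the static_assert:

```cpp
// Classic TMP: the recursion happens through template instantiation
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Explicit specialization for 0 stops the recursion
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// Modern equivalent: an ordinary-looking constexpr function
constexpr unsigned long long factorial(unsigned n)
{
    return n == 0 ? 1 : n * factorial(n - 1);
}

static_assert(Factorial<10>::value == factorial(10), "both computed at compile-time");
```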

