Aussie AI
More Slug Repellent
- 
                                            Book Excerpt from "Generative AI in C++"
 
- 
                                            by David Spuler, Ph.D.
 
More Slug Repellent
There's plenty of other optimizations in the chapters on compile-time optimizations, code transformations, loop optimizations, and vectorization. Well, actually most of the book! Nevertheless, here's a list of some more C++ code optimization techniques for you to consider. Some of the bigger ideas:
- Use “move constructors” instead of copy constructors where appropriate (since C++11).
 - Use 
staticdata members where appropriate, so they are initialized once only. - Use 
std::sortrather thanqsort. - Don't put 
try..catchinside an inner loop that's a bottleneck. - Use 
std::bitsetfor bit sets or bit vectors. - Use the “iterators” design pattern rather than returning a full scan of a data structure all at once (saves memory and allows early exit).
 - Consider basic C++ arrays instead of 
std::vectorif it has a fixed size (known at compile-time) or its maximum size is small enough. - Consider C++20 coroutines where appropriate for the architecture.
 - Structure of arrays (SoA) data layout is more vectorizable than Array of Structures (AoS).
 
And some of the smaller optimizations:
- Commonly used object or struct fields should be first. On some platforms, smaller offsets from the start of an object are accessed faster. Also, the very first field has offset zero, which is optimized away, so put the most used field first.
 - Avoid long 
else-ifsequences. You are effectively doing linear search on the problem space in a long block ofif-else-ifstatements. The best alternative is to use aswitchstatement, if the conditions are constants. For non-constant conditions or string comparisons, consider tabularizing the options and/or using heuristics to bifurcate the search space (e.g. start with aswitchon the first letter of a string). - Use compact numeric ranges for 
switch. If the case numbers are close together, the compiler will probably use a lookup-table in assembler. If the cases are sparse, it can be forced to do anif-else-ifequivalent in machine code. - Correct choice of loop. If the condition is true at the first iteration, use 
do-whileloops. - Instead of range checking a signed integer 
with “
i>=0 && i < MAX” use a typecast with “(unsigned)i<MAX” because negatives become large unsigned positives, and a cast frominttounsigned intisn't a real instruction at run-time. - Enable the FTZ (“flush-to-zero”) and/or DAZ (“denormals-are-zero”) floating-point modes on your CPU, even though they violate the IEEE 754 standard. You probably don't care about tiny floating-point numbers in your weight or probability calculations.
 - Enable GCC's floating-point arithmetic speedup options: 
-ffast-math,-fno-math-errno,-fno-trapping-math, and-ffinite-math-only. bsearchis slow. Choose a better method.- Use 
static_assertrather thanassert(e.g. to check data type sizes). - Copy arrays by wrapping them in a dummy 
structand using C++structbitwise assignment. It might be faster thanmemcpy. - Use 
memcpyrather thanmemmoveif you're sure the arguments won't overlap. - Move local non-
staticobjects outside of a critical loop. Reuse the same object rather than re-running constructors and destructors every loop iteration. Add a “reset” member function if needed. - Use scaling factors that are a power-of-two, so that multiplication or division can be a bitshift.
 - Specialize a function with a 
voidand non-voidversion if you find yourself ignoring the return value sometimes. This avoids all of the calculations to determine the return value inside thevoidfunction, because the function itself cannot tell whether or not the caller will use its return value. - Prefer pre-increment (
++i) to post-increment (i++) for non-scalar values. And it's better to use pre-increment even for “int” types, even though it's the same, just to get into the habit. - Use the GCC 
__builtin_unreachable()statement and the “noreturn” function attribute to help the GCC optimizer identify dead code paths, allowing unreachable code removal (not that we care that much) and also better optimization of path-specific optimizations on other live paths (e.g. compile-time constant propagation). - Test the first character of two strings directly with character tests before calling 
strcmp. - Replace calls to “
round”, “floor” or “ceil” functions with a type cast toint(as an approximation). - Consider using the simpler 
putchar/putc/fputcorputs/fputsfunctions rather thanprintforfprintf. - Write your own versions of 
absandfabs/fabsf(but benchmark it). - Avoid the floating-point 
powfunction for computing integer powers. - Instead of 
strlen("literal")declare it as an initializedchar[]array variable and usesizeof(arr)-1. - Merge a large number of function parameters into an object. Don't pass 10 Boolean flags as differently named function parameters. Create an object or structure and make them fields instead.
 - Avoid calling 
strlenin a “for” loop conditional. Computestrlenbefore the loop, or test for the null byte. - Merge multiple Boolean function parameters into a bit set.
packed into an 
intorlong. The gain from passing fewer values as function arguments will be offset by the cost of packing and unpacking bits, but still should be better. - Use 
inttype mostly, notcharorshort. Maybe preferinttosize_t, too. - Specialize functions being called with a constant for an argument using a template function with an integer field. This will increase code size, but the constant will be propagated more at compile-time, and you also don't have the cost of passing it as an argument.
 - Add “
noexcept” specifiers to functions wherever it applies, because this allows the compiler to know not to worry about adding any extra exception handling code. - If you're “searching” an array or set of constant integers, known at compile-time, consider “proceduralization” by putting
the numbers as cases in a 
switch. (Trust the compiler engineers.) - Consider writing your own faster 
atoi/itoafunctions, as the standard libraries need to handle lots of rare cases, making them slower. (I'm not sure I agree and you might want to benchmark.) - Don't overuse “
alignas” to specify address alignments if you don't need them, as the enforcement of alignment requirements can impose runtime cost. sprintfis a slow and unsafe function.snprintfis safer but still slow. Find another way.- Post-increment can be faster in pointer arithmetic, so prefer using the normal idiom “
*ptr++” rather than “*++ptr” to scan a vector. 
References
- Agner Fog, 2023, Optimizing software in C++: An optimization guide for Windows, Linux, and Mac platforms, PDF: https://www.agner.org/optimize/optimizing_cpp.pdf
 - Kurt Guntheroth, 2016, Optimized C++: Proven Techniques for Heightened Performance, O'Reilly Media, https://www.amazon.com/dp/1491922060
 - Dov Bulka and David Mayhew, 1999, Efficient C++: Performance Programming Techniques, https://www.amazon.com//dp/0201379503
 - Fedor G. Pikus, 2021, The Art of Writing Efficient Programs: An advanced programmer's guide to efficient hardware utilization and compiler optimizations using C++ examples, Packt Publishing, https://www.amazon.com/dp/1800208111
 - ISO/IEC, Feb 15, 2006, Technical Report on C++ Performance, ISO/IEC TR 18015:2006(E), https://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf (Design of the C++ language from an efficiency perspective, including discussion of virtual functions and other language features.)
 - Nicolai M. Josuttis, 2012, The C++ Standard Library: A Tutorial and Reference, Second Edition, Supplementary Chapter, https://www.amazon.com/Standard-Library-Tutorial-Reference-2nd/dp/0321623215, PDF (extra chapter): http://www.cppstdlib.com/cppstdlib_supplementary.pdf (C++ optimizations such as bit sets and user-defined memory allocators.)
 - Bjarne Stroustrup, 2013, The Essence of C++ with examples in C++84, C++98, C++11, and C++14, PDF Slides: http://www.staroceans.org/e-book/essenceOfC++.pdf
 - Wikibooks, 2023, Optimizing C++/Writing efficient code/Performance improving features, Wikibooks, https://en.wikibooks.org/wiki/Optimizing_C%2B%2B/Writing_efficient_code/Performance_improving_features
 - Dave Abrahams et. al., 2003, Technical Report on C++ Performance, http://web.archive.org/web/20040608203404/http://www.research.att.com/~bs/performanceTR.pdf
 - Jakob Engblom, 2001, Getting the Least Out of Your C Compiler, https://www.engbloms.se/publications/engblom-esc-sf-2001.pdf
 - Jon Louis Bentley, 1982, Writing Efficient Programs, Prentice Hall.
 - Thomas Plum and Jim Brodie, 1985, Efficient C, Plum Hall Inc.
 - Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, 1986, Compilers—Principles, Techniques and Tools, Addison-Wesley.
 - Donald E. Knuth, 1973, The Art of Computer Programming (Vol. 3): Sorting and Searching, Addison-Wesley.
 - James O. Coplien, 1992, Advanced C++ Programming Styles and Idioms, Addison-Wesley.
 - Jonathan S. Shapiro, 1991, A C++ Toolkit, Prentice Hall.
 - Bjarne Stroustrup, 1991, The C++ Programming Language (2nd edition), Addison-Wesley.
 
| 
 • Next: • Up: Table of Contents  | 
 
 | 
The new AI programming book by Aussie AI co-founders:
 Get your copy from Amazon: Generative AI in C++  |