Aussie AI
More Slug Repellent
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
More Slug Repellent
There's plenty of other optimizations in the chapters on compile-time optimizations, code transformations, loop optimizations, and vectorization. Well, actually most of the book! Nevertheless, here's a list of some more C++ code optimization techniques for you to consider. Some of the bigger ideas:
- Use “move constructors” instead of copy constructors where appropriate (since C++11).
- Use
static
data members where appropriate, so they are initialized once only. - Use
std::sort
rather thanqsort
. - Don't put
try
..catch
inside an inner loop that's a bottleneck. - Use
std::bitset
for bit sets or bit vectors. - Use the “iterators” design pattern rather than returning a full scan of a data structure all at once (saves memory and allows early exit).
- Consider basic C++ arrays instead of
std::vector
if it has a fixed size (known at compile-time) or its maximum size is small enough. - Consider C++20 coroutines where appropriate for the architecture.
- Structure of arrays (SoA) data layout is more vectorizable than Array of Structures (AoS).
And some of the smaller optimizations:
- Commonly used object or struct fields should be first. On some platforms, smaller offsets from the start of an object are accessed faster. Also, the very first field has offset zero, which is optimized away, so put the most used field first.
- Avoid long
else
-if
sequences. You are effectively doing linear search on the problem space in a long block ofif
-else
-if
statements. The best alternative is to use aswitch
statement, if the conditions are constants. For non-constant conditions or string comparisons, consider tabularizing the options and/or using heuristics to bifurcate the search space (e.g. start with aswitch
on the first letter of a string). - Use compact numeric ranges for
switch
. If the case numbers are close together, the compiler will probably use a lookup-table in assembler. If the cases are sparse, it can be forced to do anif
-else
-if
equivalent in machine code. - Correct choice of loop. If the condition is true at the first iteration, use
do
-while
loops. - Instead of range checking a signed integer
with “
i>=0 && i < MAX
” use a typecast with “(unsigned)i<MAX
” because negatives become large unsigned positives, and a cast fromint
tounsigned int
isn't a real instruction at run-time. - Enable the FTZ (“flush-to-zero”) and/or DAZ (“denormals-are-zero”) floating-point modes on your CPU, even though they violate the IEEE 754 standard. You probably don't care about tiny floating-point numbers in your weight or probability calculations.
- Enable GCC's floating-point arithmetic speedup options:
-ffast-math
,-fno-math-errno
,-fno-trapping-math
, and-ffinite-math-only
. bsearch
is slow. Choose a better method.- Use
static_assert
rather thanassert
(e.g. to check data type sizes). - Copy arrays by wrapping them in a dummy
struct
and using C++struct
bitwise assignment. It might be faster thanmemcpy
. - Use
memcpy
rather thanmemmove
if you're sure the arguments won't overlap. - Move local non-
static
objects outside of a critical loop. Reuse the same object rather than re-running constructors and destructors every loop iteration. Add a “reset
” member function if needed. - Use scaling factors that are a power-of-two, so that multiplication or division can be a bitshift.
- Specialize a function with a
void
and non-void
version if you find yourself ignoring the return value sometimes. This avoids all of the calculations to determine the return value inside thevoid
function, because the function itself cannot tell whether or not the caller will use its return value. - Prefer pre-increment (
++i
) to post-increment (i++
) for non-scalar values. And it's better to use pre-increment even for “int
” types, even though it's the same, just to get into the habit. - Use the GCC
__builtin_unreachable()
statement and the “noreturn
” function attribute to help the GCC optimizer identify dead code paths, allowing unreachable code removal (not that we care that much) and also better optimization of path-specific optimizations on other live paths (e.g. compile-time constant propagation). - Test the first character of two strings directly with character tests before calling
strcmp
. - Replace calls to “
round
”, “floor
” or “ceil
” functions with a type cast toint
(as an approximation). - Consider using the simpler
putchar
/putc
/fputc
orputs
/fputs
functions rather thanprintf
orfprintf
. - Write your own versions of
abs
andfabs
/fabsf
(but benchmark it). - Avoid the floating-point
pow
function for computing integer powers. - Instead of
strlen("literal")
declare it as an initializedchar[]
array variable and usesizeof(arr)-1
. - Merge a large number of function parameters into an object. Don't pass 10 Boolean flags as differently named function parameters. Create an object or structure and make them fields instead.
- Avoid calling
strlen
in a “for
” loop conditional. Computestrlen
before the loop, or test for the null byte. - Merge multiple Boolean function parameters into a bit set.
packed into an
int
orlong
. The gain from passing fewer values as function arguments will be offset by the cost of packing and unpacking bits, but still should be better. - Use
int
type mostly, notchar
orshort
. Maybe preferint
tosize_t
, too. - Specialize functions being called with a constant for an argument using a template function with an integer field. This will increase code size, but the constant will be propagated more at compile-time, and you also don't have the cost of passing it as an argument.
- Add “
noexcept
” specifiers to functions wherever it applies, because this allows the compiler to know not to worry about adding any extra exception handling code. - If you're “searching” an array or set of constant integers, known at compile-time, consider “proceduralization” by putting
the numbers as cases in a
switch
. (Trust the compiler engineers.) - Consider writing your own faster
atoi
/itoa
functions, as the standard libraries need to handle lots of rare cases, making them slower. (I'm not sure I agree and you might want to benchmark.) - Don't overuse “
alignas
” to specify address alignments if you don't need them, as the enforcement of alignment requirements can impose runtime cost. sprintf
is a slow and unsafe function.snprintf
is safer but still slow. Find another way.- Post-increment can be faster in pointer arithmetic, so prefer using the normal idiom “
*ptr++
” rather than “*++ptr
” to scan a vector.
References
- Agner Fog, 2023, Optimizing software in C++: An optimization guide for Windows, Linux, and Mac platforms, PDF: https://www.agner.org/optimize/optimizing_cpp.pdf
- Kurt Guntheroth, 2016, Optimized C++: Proven Techniques for Heightened Performance, O'Reilly Media, https://www.amazon.com/dp/1491922060
- Dov Bulka and David Mayhew, 1999, Efficient C++: Performance Programming Techniques, https://www.amazon.com//dp/0201379503
- Fedor G. Pikus, 2021, The Art of Writing Efficient Programs: An advanced programmer's guide to efficient hardware utilization and compiler optimizations using C++ examples, Packt Publishing, https://www.amazon.com/dp/1800208111
- ISO/IEC, Feb 15, 2006, Technical Report on C++ Performance, ISO/IEC TR 18015:2006(E), https://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf (Design of the C++ language from an efficiency perspective, including discussion of virtual functions and other language features.)
- Nicolai M. Josuttis, 2012, The C++ Standard Library: A Tutorial and Reference, Second Edition, Supplementary Chapter, https://www.amazon.com/Standard-Library-Tutorial-Reference-2nd/dp/0321623215, PDF (extra chapter): http://www.cppstdlib.com/cppstdlib_supplementary.pdf (C++ optimizations such as bit sets and user-defined memory allocators.)
- Bjarne Stroustrup, 2013, The Essence of C++ with examples in C++84, C++98, C++11, and C++14, PDF Slides: http://www.staroceans.org/e-book/essenceOfC++.pdf
- Wikibooks, 2023, Optimizing C++/Writing efficient code/Performance improving features, Wikibooks, https://en.wikibooks.org/wiki/Optimizing_C%2B%2B/Writing_efficient_code/Performance_improving_features
- Dave Abrahams et. al., 2003, Technical Report on C++ Performance, http://web.archive.org/web/20040608203404/http://www.research.att.com/~bs/performanceTR.pdf
- Jakob Engblom, 2001, Getting the Least Out of Your C Compiler, https://www.engbloms.se/publications/engblom-esc-sf-2001.pdf
- Jon Louis Bentley, 1982, Writing Efficient Programs, Prentice Hall.
- Thomas Plum and Jim Brodie, 1985, Efficient C, Plum Hall Inc.
- Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, 1986, Compilers—Principles, Techniques and Tools, Addison-Wesley.
- Donald E. Knuth, 1973, The Art of Computer Programming (Vol. 3): Sorting and Searching, Addison-Wesley.
- James O. Coplien, 1992, Advanced C++ Programming Styles and Idioms, Addison-Wesley.
- Jonathan S. Shapiro, 1991, A C++ Toolkit, Prentice Hall.
- Bjarne Stroustrup, 1991, The C++ Programming Language (2nd edition), Addison-Wesley.
• Next: • Up: Table of Contents |
The new AI programming book by Aussie AI co-founders:
Get your copy from Amazon: Generative AI in C++ |