Aussie AI

More Slug Repellent

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

More Slug Repellent

There's plenty of other optimizations in the chapters on compile-time optimizations, code transformations, loop optimizations, and vectorization. Well, actually most of the book! Nevertheless, here's a list of some more C++ code optimization techniques for you to consider. Some of the bigger ideas:

  • Use “move constructors” instead of copy constructors where appropriate (since C++11).
  • Use static data members where appropriate, so they are initialized once only.
  • Use std::sort rather than qsort.
  • Don't put try..catch inside an inner loop that's a bottleneck.
  • Use std::bitset for bit sets or bit vectors.
  • Use the “iterators” design pattern rather than returning a full scan of a data structure all at once (saves memory and allows early exit).
  • Consider basic C++ arrays instead of std::vector if it has a fixed size (known at compile-time) or its maximum size is small enough.
  • Consider C++20 coroutines where appropriate for the architecture.
  • Structure of arrays (SoA) data layout is more vectorizable than Array of Structures (AoS).

And some of the smaller optimizations:

  • Commonly used object or struct fields should be first. On some platforms, smaller offsets from the start of an object are accessed faster. Also, the very first field has offset zero, which is optimized away, so put the most used field first.
  • Avoid long else-if sequences. You are effectively doing linear search on the problem space in a long block of if-else-if statements. The best alternative is to use a switch statement, if the conditions are constants. For non-constant conditions or string comparisons, consider tabularizing the options and/or using heuristics to bifurcate the search space (e.g. start with a switch on the first letter of a string).
  • Use compact numeric ranges for switch. If the case numbers are close together, the compiler will probably use a lookup-table in assembler. If the cases are sparse, it can be forced to do an if-else-if equivalent in machine code.
  • Correct choice of loop. If the condition is true at the first iteration, use do-while loops.
  • Instead of range checking a signed integer with “i>=0 && i < MAX” use a typecast with “(unsigned)i<MAX” because negatives become large unsigned positives, and a cast from int to unsigned int isn't a real instruction at run-time.
  • Enable the FTZ (“flush-to-zero”) and/or DAZ (“denormals-are-zero”) floating-point modes on your CPU, even though they violate the IEEE 754 standard. You probably don't care about tiny floating-point numbers in your weight or probability calculations.
  • Enable GCC's floating-point arithmetic speedup options: -ffast-math, -fno-math-errno, -fno-trapping-math, and -ffinite-math-only.
  • bsearch is slow. Choose a better method.
  • Use static_assert rather than assert (e.g. to check data type sizes).
  • Copy arrays by wrapping them in a dummy struct and using C++ struct bitwise assignment. It might be faster than memcpy.
  • Use memcpy rather than memmove if you're sure the arguments won't overlap.
  • Move local non-static objects outside of a critical loop. Reuse the same object rather than re-running constructors and destructors every loop iteration. Add a “reset” member function if needed.
  • Use scaling factors that are a power-of-two, so that multiplication or division can be a bitshift.
  • Specialize a function with a void and non-void version if you find yourself ignoring the return value sometimes. This avoids all of the calculations to determine the return value inside the void function, because the function itself cannot tell whether or not the caller will use its return value.
  • Prefer pre-increment (++i) to post-increment (i++) for non-scalar values. And it's better to use pre-increment even for “int” types, even though it's the same, just to get into the habit.
  • Use the GCC __builtin_unreachable() statement and the “noreturn” function attribute to help the GCC optimizer identify dead code paths, allowing unreachable code removal (not that we care that much) and also better optimization of path-specific optimizations on other live paths (e.g. compile-time constant propagation).
  • Test the first character of two strings directly with character tests before calling strcmp.
  • Replace calls to “round”, “floor” or “ceil” functions with a type cast to int (as an approximation).
  • Consider using the simpler putchar/putc/fputc or puts/fputs functions rather than printf or fprintf.
  • Write your own versions of abs and fabs/fabsf (but benchmark it).
  • Avoid the floating-point pow function for computing integer powers.
  • Instead of strlen("literal") declare it as an initialized char[] array variable and use sizeof(arr)-1.
  • Merge a large number of function parameters into an object. Don't pass 10 Boolean flags as differently named function parameters. Create an object or structure and make them fields instead.
  • Avoid calling strlen in a “for” loop conditional. Compute strlen before the loop, or test for the null byte.
  • Merge multiple Boolean function parameters into a bit set. packed into an int or long. The gain from passing fewer values as function arguments will be offset by the cost of packing and unpacking bits, but still should be better.
  • Use int type mostly, not char or short. Maybe prefer int to size_t, too.
  • Specialize functions being called with a constant for an argument using a template function with an integer field. This will increase code size, but the constant will be propagated more at compile-time, and you also don't have the cost of passing it as an argument.
  • Add “noexcept” specifiers to functions wherever it applies, because this allows the compiler to know not to worry about adding any extra exception handling code.
  • If you're “searching” an array or set of constant integers, known at compile-time, consider “proceduralization” by putting the numbers as cases in a switch. (Trust the compiler engineers.)
  • Consider writing your own faster atoi/itoa functions, as the standard libraries need to handle lots of rare cases, making them slower. (I'm not sure I agree and you might want to benchmark.)
  • Don't overuse “alignas” to specify address alignments if you don't need them, as the enforcement of alignment requirements can impose runtime cost.
  • sprintf is a slow and unsafe function. snprintf is safer but still slow. Find another way.
  • Post-increment can be faster in pointer arithmetic, so prefer using the normal idiom “*ptr++” rather than “*++ptr” to scan a vector.

References

  1. Agner Fog, 2023, Optimizing software in C++: An optimization guide for Windows, Linux, and Mac platforms, PDF: https://www.agner.org/optimize/optimizing_cpp.pdf
  2. Kurt Guntheroth, 2016, Optimized C++: Proven Techniques for Heightened Performance, O'Reilly Media, https://www.amazon.com/dp/1491922060
  3. Dov Bulka and David Mayhew, 1999, Efficient C++: Performance Programming Techniques, https://www.amazon.com//dp/0201379503
  4. Fedor G. Pikus, 2021, The Art of Writing Efficient Programs: An advanced programmer's guide to efficient hardware utilization and compiler optimizations using C++ examples, Packt Publishing, https://www.amazon.com/dp/1800208111
  5. ISO/IEC, Feb 15, 2006, Technical Report on C++ Performance, ISO/IEC TR 18015:2006(E), https://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf (Design of the C++ language from an efficiency perspective, including discussion of virtual functions and other language features.)
  6. Nicolai M. Josuttis, 2012, The C++ Standard Library: A Tutorial and Reference, Second Edition, Supplementary Chapter, https://www.amazon.com/Standard-Library-Tutorial-Reference-2nd/dp/0321623215, PDF (extra chapter): http://www.cppstdlib.com/cppstdlib_supplementary.pdf (C++ optimizations such as bit sets and user-defined memory allocators.)
  7. Bjarne Stroustrup, 2013, The Essence of C++ with examples in C++84, C++98, C++11, and C++14, PDF Slides: http://www.staroceans.org/e-book/essenceOfC++.pdf
  8. Wikibooks, 2023, Optimizing C++/Writing efficient code/Performance improving features, Wikibooks, https://en.wikibooks.org/wiki/Optimizing_C%2B%2B/Writing_efficient_code/Performance_improving_features
  9. Dave Abrahams et. al., 2003, Technical Report on C++ Performance, http://web.archive.org/web/20040608203404/http://www.research.att.com/~bs/performanceTR.pdf
  10. Jakob Engblom, 2001, Getting the Least Out of Your C Compiler, https://www.engbloms.se/publications/engblom-esc-sf-2001.pdf
  11. Jon Louis Bentley, 1982, Writing Efficient Programs, Prentice Hall.
  12. Thomas Plum and Jim Brodie, 1985, Efficient C, Plum Hall Inc.
  13. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, 1986, Compilers—Principles, Techniques and Tools, Addison-Wesley.
  14. Donald E. Knuth, 1973, The Art of Computer Programming (Vol. 3): Sorting and Searching, Addison-Wesley.
  15. James O. Coplien, 1992, Advanced C++ Programming Styles and Idioms, Addison-Wesley.
  16. Jonathan S. Shapiro, 1991, A C++ Toolkit, Prentice Hall.
  17. Bjarne Stroustrup, 1991, The C++ Programming Language (2nd edition), Addison-Wesley.

 

Next:

Up: Table of Contents

Buy: Generative AI in C++: Coding Transformers and LLMs

Generative AI in C++ The new AI programming book by Aussie AI co-founders:
  • AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++