Aussie AI

Slug Hunting Advice

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

This appendix is about speeding up your C++ programs through general improvements to sequential or parallel coding. For AI-specific techniques that speed up your Transformer's inference (e.g. quantization, pruning), see the various chapters in Part V (Chapters 28-36).

Before we begin with anything that's actually useful, I have to introduce the obligatory wrist-slapping politically-correct deslugging advice for programmers. Hence, here are some general nuggets of advice when attempting to speed up your program:

  • Profile twice, code once. Performance profiling tools exist for a reason.
  • Don't micro-optimize. Unless you're into that kind of thing. But really, try to sit on your hands.
  • Do macro-optimize. Think about your data structures and algorithms.
  • Optimizing introduces new bugs. 100% guaranteed! Don't optimize the night before your release. Re-run your test suite.
  • Don't optimize exception handling. Tweaking rarely-executed code is a poor use of your genius.
  • Use open source third-party libraries that have already been optimized by others.

Or just ignore that advice and go crazy. It's just too much fun optimizing when the alternative is dreary debugging. Pro tip: it's even more fun writing a book on optimizing!

Where to hunt slugs? Some of the common large-scale issues with coding inefficiency in typical C++ programs include:

  • Function call hierarchies
  • Nested loops
  • Overuse of memory allocation
  • Constructor and destructor inefficiencies
  • Inefficient algorithms (e.g. linear search of arrays)
  • Unnecessary overhead or wrappers
  • Recursion. After you've coded up your university assignments (Tower of Hanoi, anyone?), please forget recursion exists.
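To make the recursion item concrete, here is a minimal sketch using Fibonacci numbers as a stand-in example (the function names are hypothetical and not from the book). The naive recursive version re-computes the same subproblems over and over, whereas the loop does the same job in linear time with no call overhead.

```cpp
#include <cstdint>

// Naive recursion: exponential time, because each call spawns two more calls.
std::uint64_t fib_recursive(unsigned n) {
    return n < 2 ? n : fib_recursive(n - 1) + fib_recursive(n - 2);
}

// Iteration: linear time, constant space, no function call overhead.
// The result fits in 64 bits for n <= 93.
std::uint64_t fib_iterative(unsigned n) {
    std::uint64_t a = 0;  // fib(i)
    std::uint64_t b = 1;  // fib(i + 1)
    for (unsigned i = 0; i < n; ++i) {
        const std::uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

Both functions return the same values; only the cost differs. Not every recursion is this wasteful, so measure before rewriting.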

C++ Speedup Techniques: Some of the general ways to speed up C++ programs at the design structure or algorithmic level include:

  • Faster data structures (e.g. hash tables).
  • Faster algorithms (e.g. fix linear search to something faster like, you know, hashing again).
  • Parallelize via multi-threading, multi-process, multi-core, multi-GPU, multi-something.
  • Vectorization (parallelize your important loops)
  • Precompute expensive functions into a lookup table at compile-time (e.g. activation functions).
  • Cache any complex calculations to trade extra space for time savings (e.g. KV caching).
  • Change floating-point to integer operations (quantization, anyone?)
  • Replace recursion with iteration. Subtract ten bonus points if you need to do this.
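As an illustration of the lookup-table item, the following is a rough sketch of precomputing an activation function at compile-time, assuming a C++17 compiler. It uses the "hard sigmoid" (a clamped linear function) only because it needs nothing beyond basic arithmetic in a constexpr context; the table size, input range, and all names here are assumptions chosen for the example, not values from the book.

```cpp
#include <array>
#include <cstddef>

// Hard sigmoid: clamp(0.2 * x + 0.5, 0, 1). Plain arithmetic, so it is constexpr-friendly.
constexpr float hard_sigmoid(float x) {
    const float y = 0.2f * x + 0.5f;
    return y < 0.0f ? 0.0f : (y > 1.0f ? 1.0f : y);
}

// Illustrative choices: 256 entries covering the input range [-4, 4].
constexpr std::size_t kTableSize = 256;
constexpr float kMinX = -4.0f;
constexpr float kMaxX = 4.0f;

// Fills the table during compilation (requires C++17 constexpr std::array access).
constexpr std::array<float, kTableSize> make_table() {
    std::array<float, kTableSize> table{};
    for (std::size_t i = 0; i < kTableSize; ++i) {
        const float x = kMinX + (kMaxX - kMinX) * static_cast<float>(i)
                                    / static_cast<float>(kTableSize - 1);
        table[i] = hard_sigmoid(x);
    }
    return table;
}

constexpr std::array<float, kTableSize> kTable = make_table();

// Runtime lookup: clamp out-of-range inputs, then round to the nearest entry.
inline float hard_sigmoid_lookup(float x) {
    if (x <= kMinX) return kTable.front();
    if (x >= kMaxX) return kTable.back();
    const float pos = (x - kMinX) / (kMaxX - kMinX) * static_cast<float>(kTableSize - 1);
    return kTable[static_cast<std::size_t>(pos + 0.5f)];
}
```

The lookup trades a little accuracy (nearest-entry rounding) and some memory for speed. For a function this cheap the table is unlikely to win, so treat it as a template for genuinely expensive functions and benchmark the result.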

Some of the high-level C++ coding optimizations include:

  • Flatten function call hierarchies (stop wrapping everything so much, and inline the small functions at the bottom).
  • Optimize loops, especially nested loops (e.g. move loop-invariant code out, loop unrolling, vectorization, etc.)
  • Templates are effectively a compile-time optimization that improves speed at the cost of code space.
  • Reduce memory allocation (use less memory overall or replace memory allocation with temporary stack buffers).
  • Operator strength reduction (e.g. replace “*” with “+”, a pipe dream of all AI engineers).
  • Declare variables as close as possible to where they are used. This avoids instantiating objects that aren't needed on some paths.
  • Use pointer arithmetic, especially for loops over arrays.
  • Bitwise operations are fast, but the basic C++ integer operations are also fast nowadays. Benchmark, don't assume.
  • Use short-circuiting of the && and || operators, and also the ternary ?: operator, to avoid expensive function calls.
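Two of the loop items above (moving loop-invariant code out, and pointer arithmetic over arrays) can be sketched together as follows. The function names and the temperature-scaling scenario are hypothetical, picked only for illustration. Modern optimizers often perform both transformations on their own, so this is worth doing by hand only if a benchmark says so.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Before: the scale factor is recomputed on every iteration.
void scale_naive(std::vector<float>& v, float temperature) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = v[i] * (1.0f / std::sqrt(temperature));
    }
}

// After: the loop-invariant scale factor is hoisted out of the loop,
// and the loop walks the array with a pointer instead of an index.
void scale_hoisted(std::vector<float>& v, float temperature) {
    const float scale = 1.0f / std::sqrt(temperature);  // computed once
    float* p = v.data();
    float* const end = p + v.size();
    for (; p != end; ++p) {
        *p *= scale;
    }
}
```

Both versions apply the same multiplication to each element, so they should produce the same output; the second simply does less work per iteration.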

And finally, some things you might forget (and some that are forgettable):

  • Benchmark any important changes (e.g. operator strength reductions).
  • Turn up your C++ optimizer. There are higher settings you could try.
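  • Add compile-time optimization hints (e.g. constexpr, and restrict, which is a C keyword that most C++ compilers offer as a non-standard extension such as __restrict).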
  • Overclock your PC (like a gamer).
  • Sell your car to buy a better GPU.
  • Put every function in a header file and make them all inline.
  • Reorder your case labels. Surely it helps.
  • Change i++ to ++i in everyone else's code.
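Since "benchmark" keeps coming up, here is a minimal timing helper built on std::chrono, offered as a sketch rather than a replacement for a real profiler or benchmarking library. The name time_ms and its parameters are hypothetical.

```cpp
#include <chrono>

// Runs fn() `repeats` times and returns the average wall-clock time per call,
// in milliseconds. steady_clock is used because it never jumps backwards.
template <typename F>
double time_ms(F&& fn, int repeats) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i) {
        fn();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / repeats;
}
```

A typical use is to call time_ms on the old and new versions of a function with the same inputs and compare the two numbers. Be careful that the optimizer does not delete the code under test as dead code: use the result somewhere (e.g. accumulate it into a variable that is printed afterwards), and run enough repeats that the clock's resolution is not the dominant effect.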

 
