Aussie AI

Maintainability

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

My first Software Engineer job was maintenance of low-level systems management on a lumbering Ultrix box in C code, with hardly any comments. You'd think I hate code maintenance, right? No, I had the opposite reaction: it was the best job ever!

If you think you don't like code maintenance, consider this: Code maintenance is what you do every day. I mean, except for those rare days where you're starting a new project from scratch, you're either maintaining your own code or someone else's, or both. There are two main modes: you're either debugging issues or extending the product with new features, but in both cases it is at some level a maintenance activity.

So, how do you improve future maintainability of code? And how do you fix up old code that's landed on your desk, flapping around like a seagull, because your company acquired a small startup?

Let's consider your existing code. How would you make your code better so that a future new hire can be quickly productive? The answer is probably not that different to the general approach to improving reliability of your code. Things like unit tests, regression testing, exception handling, and so on will make it easier for a new hire. You can't stop that college intern from re-naming all the source code files or re-indenting the whole codebase, but at least you can help them to not break stuff.

One way to think about future maintainability is to take a step back and think of it as a “new hire induction” problem. After you've shown your new colleague the ping pong table in the lunch room and the restrooms, they need to know:

  • Where is the code, and how do I check it out of the repo?
  • How do I build it? Run it? Test it?
  • Where's the bug database, requirements documents, or enhancements list?
  • What are the big code libraries? Which directories?

After that, then you can get into the nitty-gritty of how the C++ is laid out. Where are the utility libraries that handle low-level things like files, memory allocation, strings, hash tables, and whatnot? Which code modules do the higher-level AI engine features like activation functions, MatMul, tokenization, and so on? Where do I add a new unit test? A new command-line argument or configuration property?

Maintenance safety nets: How do you make your actual C++ code resilient to the onslaught of a new hire programmer? Assume that future changes to the code will often introduce bugs, and try to plan ahead to catch them using various coding tricks. Actually, the big things in preventing future bugs are the large code reliability techniques (e.g. unit tests, assertions, comment your code, blah blah blah). There are a lot of little things you can do, which are really quite marginal compared to the big things, but are much more fun, so here's my list:

  • All variables should be initialized, even if they'll be immediately overwritten (i.e. “int x=3;” never just “int x;”). The temptation to not initialize is mainly from variables that are only declared so as to be passed into some other function to be set as a reference parameter. And yes, in this case, it's an intentional micro-inefficiency to protect against a future macro-crashability.
  • Unreachable code should be marked with at least a comment or preferably an attribute or assertion (e.g. use the “yassert_not_reached” assertion idea).
  • Prefer statement blocks with curly braces to single statements in any if, else, or loop body. Also for case and default. Use braces even if it all fits on one line. Otherwise, some newbie will add a second statement, guaranteed.
  • Once-only initialization code that isn't in a constructor should also be protected (e.g. the “yassert_once” idea).
  • All switch statements need a default (even if it just triggers an assertion).
  • Don't use case fallthrough, except it's allowed for Duff's Device and any other really cool code abuses. Tag it with [[fallthrough]] if you must use it.
  • Avoid preprocessor macros. Prefer inline functions rather than function-like macro tricks, and do named constants using const or enum names rather than #define. I've only used macros in this book for educational purposes, and you shouldn't even be looking at my dubious coding style.
  • Declare a dummy enum at the end of an enum list (e.g. “MyEnum_EOL_Dummy”), and use this EOL name in any range-checking of values of enum variables. Otherwise, it breaks when someone adds a new enum at the end. EOL means “end-of-list” if you were wondering.
  • Add some range-checking of your enum variables, because you forgot about that. Otherwise array indices and enum variables tend to get mixed up when you have a lot of int variables.
  • Assert the exact numeric values of a few random enum symbols, and put cuss words in the optional message, telling newbie programmers that they shouldn't add a new enum at the top of the list.
  • sizeof(varname) is better than sizeof(int) when someone changes it to long type. Similarly, use sizeof(arr[0]) and sizeof(*ptr). No, the * operator isn't live in sizeof: the operand is never evaluated, so nothing gets dereferenced at runtime.
  • All classes should have the “big four” (constructor, destructor, copy constructor, and assignment operator), even if they're silly, like when the destructor is just {}.
  • If your class should not ever be bitwise-copied, then declare a dummy copy constructor and assignment operator (i.e. as “private” and without a function body, or marked “= delete” in C++11 and later), so the compiler prevents a newbie from accidentally doing something that would be an object bitwise copy.
  • If your AI code needs a mathematical constant, like the reciprocal of the square root of pi, just work it out on your calculator and type the number in directly. Job security.
  • A switch over an enum should usually have the default clause as an error or assertion. This detects the code maintenance situation where a newly added enum code isn't being handled.
  • Avoid long if-else-if sequences. They get confusing. They also break completely if someone adds a new “if” section in the middle, but forgets it should be “else if” instead.
  • Instigate a rule that whoever breaks the build has to bring kolaches tomorrow.
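
A few of the enum and switch items above are easier to see in code than in prose. Here's a minimal sketch, assuming a made-up ActivationType enum (the enum, its values, and both helper functions are hypothetical names for illustration, not code from any real engine). It shows the EOL dummy, a range check written against that dummy, a couple of pinned numeric values, and a switch with braces and a default.

```cpp
#include <cassert>

// Hypothetical enum for illustration only.
enum ActivationType {
    Activation_ReLU = 0,
    Activation_GELU,
    Activation_Sigmoid,
    Activation_EOL_Dummy   // end-of-list marker: add new values ABOVE this line
};

// Pin a few numeric values so that inserting a new enum at the top or in
// the middle fails at compile time. (Keep the messages polite in public.)
static_assert(Activation_ReLU == 0, "Do not add new enums at the top of the list");
static_assert(Activation_Sigmoid == 2, "Add new enums just before the EOL dummy");

// Range check written against the EOL marker, so it stays correct
// when someone adds a new value before the marker.
inline bool activation_is_valid(int value) {
    return value >= 0 && value < Activation_EOL_Dummy;
}

// Switch over the enum: braces on every case, and a default that flags
// unhandled values instead of silently doing nothing.
inline const char* activation_name(ActivationType type) {
    switch (type) {
    case Activation_ReLU: {
        return "relu";
    }
    case Activation_GELU: {
        return "gelu";
    }
    case Activation_Sigmoid: {
        return "sigmoid";
    }
    default: {
        // Reached for the EOL dummy or a newly added value nobody handled.
        // A real build would probably fire an assertion here as well.
        return nullptr;
    }
    }
}
```

The default branch returns nullptr here only so the sketch can be exercised without aborting; swapping in an assertion is the more in-your-face option.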

But don't sweat it. New hires will break your code, and then just comment out the unit test that fails.
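
Going back to the list for a moment, the initialization, sizeof, and no-copy items can be sketched together. This is only an illustration under assumptions: WeightBuffer, clear_scores, get_dimensions, and total_cells are hypothetical names invented for the example, and the no-copy class uses the C++11 “= delete” spelling rather than the private, body-less declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical class that owns a raw buffer, so copying must be blocked.
class WeightBuffer {
public:
    explicit WeightBuffer(std::size_t count)
        : data_(new float[count]()), count_(count) {}   // () zero-fills the array
    ~WeightBuffer() { delete[] data_; }

    // Deleted copy operations: any attempted copy is a compile error.
    WeightBuffer(const WeightBuffer&) = delete;
    WeightBuffer& operator=(const WeightBuffer&) = delete;

    std::size_t size() const { return count_; }
    float* data() { return data_; }

private:
    float* data_ = nullptr;
    std::size_t count_ = 0;
};

// Compile-time proof that a newbie cannot copy the class.
static_assert(!std::is_copy_constructible<WeightBuffer>::value,
              "WeightBuffer must not be copyable");

// sizeof on the variable, not on a type name: still correct if the
// element type changes from long to something else later.
inline std::size_t clear_scores(long (&scores)[8]) {
    std::memset(scores, 0, sizeof(scores));          // size of the whole array
    return sizeof(scores) / sizeof(scores[0]);       // number of elements
}

// A callee that sets its reference parameters.
inline void get_dimensions(int& rows, int& cols) {
    rows = 4;
    cols = 16;
}

inline int total_cells() {
    int rows = 0;   // initialized even though get_dimensions() overwrites it
    int cols = 0;
    get_dimensions(rows, cols);
    return rows * cols;
}
```

If someone later edits get_dimensions so that it skips setting cols on some path, the caller gets a predictable zero rather than stack garbage.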

Maintaining OPC (other people's code). What about brand-new code? It's from that startup that got acquired, and it's really just a hacked-up prototype that should never have shipped. Now it's landed on your desk with a big red bow wrapped around it and a nice note from your boss telling you how much it'll be appreciated if you could have a little look at this. At least it's a challenge, and maybe you could even learn a little Italian, because that's the language the comments are written in.

So, refactoring has to be top of the list. You need to move code around so that it is modular, easier to unit test, and so on. Split out smaller functions and group all the low-level factory type routines. Writing some internal documentation about new code doesn't hurt either! And “canale” means “channel” in Italian, so now you're bilingual.
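
As a rough sketch of the “split out smaller functions” idea, the example below pulls a small, side-effect-free helper out of a larger routine so it can be unit tested on its own. Both function names are hypothetical, and the whitespace splitter is just a stand-in for whatever logic the inherited prototype has buried inside a long function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Helper split out of a larger routine: no I/O and no globals,
// so a unit test can call it directly.
inline std::vector<std::string> split_on_spaces(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (char ch : text) {
        if (ch == ' ') {
            if (!current.empty()) {      // skip runs of repeated spaces
                words.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(ch);
        }
    }
    if (!current.empty()) {              // flush the final word
        words.push_back(current);
    }
    return words;
}

// The higher-level routine shrinks to a composition of helpers.
inline std::size_t count_words(const std::string& text) {
    return split_on_spaces(text).size();
}
```

Once the helper exists, the edge cases (empty input, leading and trailing spaces, doubled spaces) can each get a one-line unit test before you touch anything else.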

 
