Aussie AI

39. Quality

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

“Quality is just the focal point around which
a lot of intellectual furniture is getting rearranged.”

— Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance, 1974.

AI Quality

A quality AI would predict my wishes and wash my dishes. While we wait for that to happen, the desirable qualities of an AI engine include:

  • Accuracy
  • Sensitivity
  • Empathy
  • Predictability
  • Alignment

Much as I like code, a lot of the “smartness” of an LLM starts with the training data. Garbage in, garbage out! Finding enough quality data that is vetted for use in model fine-tuning or a RAG database is one of the hurdles that delays business deployment of AI applications. Another data quality problem is that new models are starting to be trained on the outputs of other models, and this “synthetic data” is leading to degradation in the downstream models.

At the other end of the quality spectrum, we've seen the headlines about the various types of malfeasance that a low-quality AI engine could perform, such as:

  • Bias
  • Toxicity
  • Inappropriateness
  • Hallucinations (i.e., fake answers)
  • Wrong answers (e.g., from inaccurate training data)
  • Dangerous answers (e.g., mushroom collecting techniques)
  • Going “rogue”

And some of the technical limitations and problems that have been seen in various AI applications include:

  • Lack of common sense
  • Difficulty with mathematical reasoning
  • Explainability/attribution difficulty
  • Overconfidence
  • Model drift (declining accuracy over time)
  • Catastrophic forgetting (esp. in long texts)
  • Lack of a “world view”
  • Training cut-off dates
  • Difficulty with time-related queries (e.g., “What is significant about today?”)
  • Problems handling tabular input data (e.g., spreadsheets)
  • Banal writing that lacks emotion and “heart” (it's a robot!)

If you ask me, almost the exact same list would apply to any human toddler, although at least ChatGPT doesn't pour sand in your ear or explain enthusiastically that “Dad likes wine” during show-and-tell. Personally, I think it's still a long road to Artificial General Intelligence (AGI).

Unfortunately, every single bullet point in the above paragraphs is a whole research area in itself. Everyone's trying to find methods to improve the smartness and reduce the dumbness. There's another whole book in that list, so I'm going to stick to the code.

The remainder of this chapter primarily concerns the quality issues you face as the engineer of a C++ Transformer, such as ensuring it never crashes and responds fast enough. Many of these issues are generic to any kind of C++ application, but there are also some AI-specific aspects to software quality.

What is Software Quality?

Quality is an overarching goal in software design. The terms “software quality” and “code quality” are not the same thing. Software quality is more about product quality from the user or company perspective, with a more outward-looking feel and issues such as functionality and usability. Code quality is what software developers work on every day. Quality coding practices are a prerequisite for software quality, so there's much overlap.

How do we improve both types of “quality”?

First, let's acknowledge the subjectivity. Some groups of people are more focused on “software quality” than “code quality” as a goal. Salespeople want the product to have the hot features. Marketing wants a nice UI and a “positioning” in the market (what is so great about the letter P?). Support wants nobody to call.

For those working on the internals of software, everybody has a different view of code quality. Quality engineers want everything to be perfect before it ships. Project managers want to hit the date by time-boxing features out. Developers want, well, who knows, because every developer has a different but deeply-held belief about this topic.

Second, let's examine the metrics for quality code. It's runtime things like: has cool features, doesn't crash or spin, and is performant. And it's static things like: readability, modularity, and so on. And there are future-looking metrics such as: maintainability, extensibility, etc. There are various techniques to enhance these types of metrics, which we examine in the following chapters.

Third, let's take a top-down look. What does “software quality” or “code quality” mean on the executive floor? Probably it means any software that has “AI” features, so the CEO can say that buzzword in the earnings call about a hundred times. I heard on TikTok that McKinsey research proved that stocks appreciate by sqrt(pi/8) percentage points for every mention.

Finally, let's take a bottom-up look, which is really most of this chapter and the following chapters. We are talking about C++ coding, after all. There are a lot of practical techniques that can be used to improve the delivery of quality software through improvements to C++ code quality and other areas.

Advanced Software Quality

If you want to write the best C++ software for enterprise purposes in terms of “quality,” you need to consider a lot of “abilities”:

  • Testability
  • Debuggability
  • Scalability
  • Usability
  • Installability
  • Supportability
  • Availability
  • Reliability
  • Maintainability
  • Portability
  • Extensibility
  • Interoperability
  • Reusability

Take a breath. Keep going. Some more:

  • Deployability
  • Manageability
  • Readability
  • Upgradability
  • Marketability
  • Monetizability
  • Quality-ability (whatever that means!)
  • Security protection (hackability)
  • Internationalization (translatability)
  • Fault tolerance and resilience (keep-going-ability)
  • Modularity (separatability)
  • Stability

Oh, and I almost forgot one coding quality issue:

  • Adding new features that customers want.

Before we get too wrapped up in all those inward-looking “abilities,” let us remind ourselves that the customer only cares about a few of them: installability, usability, stability. For a B2C product, think about the “grandma test”: could your grandma use this software? (After she's called you and made you set up her WiFi, I mean.) For B2B customers, the main thing users actually care about is “ability-ability”: whether your software has the capability to help them do whatever bizarre things businesses want to do with your code.

Sellability

Oops, I've forgotten about sales yet again, which isn't surprising because all of us in R&D aren't allowed to talk to the reps. I guess they have cooties or they'll stop selling the currently shipped version or they'll blame us for not winning a deal with the currently shipped version. We have drills to practice hiding under our chairs if we see a rep.

Anyway, to get back on topic, marketability and sellability are actually the highest level of quality. If nobody buys it, who cares how beautiful the architecture is? Consider broadening the definition of “quality” beyond the C++ code to the “software quality” of the entire product from the perspective of the company.

Sellability is quality!

Most of the “code quality” practices in software engineering are internal inward-focused work, rather than looking “outwards” at the customer. If your company goal is actually financial success of your C++ software product in the B2B market, here's my suggestion of an alternative set of C++ “sellability” processes to consider:

    1. Ask your sales reps what new features will close their current deal.

    2. Code that in C++.

    3. Run your 24-hour or 48-hour automated test suite.

    4. Give the executables to your sales reps on a zippy.

Note that I only said to “consider” this method. Nobody in R&D is actually going to do it, I'm sure. I only wrote that so all the sales reps would buy a programming book.

Software Engineering Methodologies

Below is a list of various software engineering paradigms and architectural practices. Let me hereby emphatically state that one of these methods is clearly and by far the absolute best one, far superior to all the rest, and I will defend it to the hilt over a brew any day of the week.

Oh, but I'm not going to tell you which one. Feel free to argue amongst yourselves. Here's the list:

  • Agile development
  • Pair programming
  • AI copilot programming
  • Waterfall method
  • DevOps for everyone
  • Test-driven development
  • Feature-driven development
  • Agile scrum
  • Lean coding
  • GMB
  • Don't Repeat Yourself (DRY)
  • Structured Design Methodology
  • Designated Object Architecture (DOA)
  • UML
  • Rapid Application Development (RAD)
  • eXtreme Programming
  • Object Oriented Design (OOD)
  • SQA
  • Rogue coder model
  • Pick Your Favorite Acronym (PYFA)
  • Intentional coding
  • Joint Application Development Process
  • Move fast & break stuff
  • Behavior-Driven Development
  • SOLID
  • Domain-Driven Design
  • Product Market Fit (PMF)
  • ISO something
  • Fingers and toes crossed
  • Spiral Model
  • TQM or six-sigma or Jack Welch stuff
  • Code myself a new minivan
  • YAGNI
  • Rational Unified Coding
  • Product-Led Growth (PLG)

What a fun list! I'm going to make a poster to put on the wall above my “jump to conclusions” mat.

Software Engineering Process Group

The idea of a Software Engineering Process Group (SEPG) is a team of people in your company who aim to help software engineers write better code. It's people helping people, so what could be better than that?

I mean, AI engines helping people is cheaper, but you didn't hear that from me.

What this SEPG team does is buy everyone in the company a copy of this book, including the valet parking attendants and catering staff, who are integral to your AI strategy, if you ask me, because they're real users who ask ChatGPT stuff on their phone all day long. After that, it's feet up on the desk and read the newspaper for the rest of the day on the SEPG floor, because it's all sorted.

I really like the idea of the SEPG, but I've also seen it ineffective when product groups simply ignored their advice. I don't know what to say about that. I guess if I were running an SEPG, I'd say try to focus on pragmatic and incremental ways to improve software processes. Some of the ways that an SEPG can add tremendous value across an entire software development organization include:

  • Educating engineers on best practices.
  • Reviewing coding tools that might be useful.
  • Vetting common libraries of low-level functionality (reusability!).
  • Documenting and sharing successful methods and ideas.
  • Coding up horizontal libraries like debug wrappers.

Oh, yeah, and a coding standards document, because who doesn't love a great one of those.

Coding Standards

I cannot pretend that I am a big fan of having coding style standards. But most large companies tend to have them, and there is certainly a benefit to doing so. You can find Google's on the Internet, and I read it to my toddler to put him to sleep (easier than putting him into a child seat and doing a hundred blockies at 3am while wearing pyjamas; who doesn't love parenting?).

The advantage of a coding policy is a standardization of various activities and processes company-wide, which is something they really like in head office. The disadvantages include things like: (a) a focus on “busy work” coding rather than adding new user features, and (b) practical difficulties merging two different development procedures if you acquire another big company. Newly acquired startups will expend a fair amount of effort to conform to your standards, but they probably need to do similar activities to fix technical debt, anyway.

My preference would be for a company to have a specific organizational group focused on software engineering excellence, with an emphasis on practicality, rather than dictating the “one true way” of programming. Coding standards are only one of the many issues for such a cross-company team to address. This is the idea of having an SEPG in your organization, which is kind of like a SEP field, if you know what I mean. So, it is a matter of tone and focus in terms of how high or how low to go in devising the coding standard for your project or organization.

Some high-level issues that could be addressed:

  • Which programming language. (C++, of course!)
  • Code libraries allowed
  • Tech stack: database, app layer, UI, etc.
  • Tools: source code control, bug database, etc.
  • Naming: e.g., good APIs follow a naming convention that the developer can guess.

A coding style for C++ could specify a variety of factors about which of the advanced language features to use (or avoid):

  • Templates
  • Operator overloading
  • Class inheritance hierarchies
  • Namespace management

I'm really not going to suggest your coding standard document should address indentation, variable names, comments, and so on, but some of these types of documents actually do.

There is also value in specifying standard suggested coding libraries and interfaces:

  • Basic data types
  • Basic coding libraries
  • Basic data structures (e.g. hash tables, lookup tables, etc.)
  • Unit testing library/APIs
  • Regression testing tools and harnesses
  • Assertions and self-testing
  • Debug tracing code
  • Exception handling
  • Testing and debugging tools

I could go on, but I won't.
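Well, maybe one small sketch. As an example of the kind of low-level horizontal library an SEPG might standardize, here is a minimal debug tracing macro; the AI_TRACE name and message format are my own invention for illustration, not from any standard library:

```cpp
#include <cstdio>

// Minimal debug tracing macro: active in debug builds, compiles away to
// nothing when NDEBUG is defined. Uses the common ##__VA_ARGS__ extension
// (GCC/Clang/MSVC) for simplicity.
#ifndef NDEBUG
#define AI_TRACE(fmt, ...) \
    std::fprintf(stderr, "[TRACE %s:%d] " fmt "\n", __FILE__, __LINE__, ##__VA_ARGS__)
#else
#define AI_TRACE(fmt, ...) ((void)0)
#endif

// Example function instrumented with tracing.
int scaled_sum(int a, int b, int scale) {
    AI_TRACE("scaled_sum(%d, %d, %d)", a, b, scale);
    return (a + b) * scale;
}
```

A standard macro like this, shared across teams, means every component's trace output looks the same and can be grepped the same way.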

Project Estimation

Estimating a project's time and resource requirements is an important part of software project management. Although estimating the efficiency of a proposed project is important in ascertaining its feasibility, it is difficult to find anything concrete to say about arriving at these estimates. Producing advance estimates is more of an art than a science, and a typical process goes like this:

    1. Pick a random date.

    2. Deny programmers sleep until this date.

    3. Slip the date.

    4. Time-box out all useful features.

    5. Ship it!

Experience is probably the best source of methods for producing an accurate estimate. Hence, it is wise to seek out others who have implemented a similar project, or to perform a literature search for relevant papers and books. Unfortunately, neither of these methods is guaranteed to succeed and the implementor may be forced to go it alone. The only other realistic means of estimation relies on a good understanding of the various data structures and algorithms that will be used by the program. Making realistic assumptions about the input can provide some means of examining the performance of a data structure. How a data structure performs under worst case assumptions may also be of great importance.

An alternative to these methods of plucking estimates out of the air is to code up a prototype version of the program, which implements only the most important parts of the project (especially those which will have the biggest impact). The efficiency of the prototype can then be measured using the various techniques. Even if the prototype is too inefficient, at least the problem has been identified early in the development cycle, when the investment in the project is relatively low.
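As a sketch of measuring such a prototype, the standard <chrono> facilities are enough to get rough wall-clock timings; the routine being timed here is just an illustrative stand-in:

```cpp
#include <chrono>
#include <numeric>
#include <vector>

// Stand-in for some prototype routine whose cost we want to estimate.
double prototype_sum(const std::vector<float>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

// Rough wall-clock timing of one call, in microseconds.
long long time_call_usec(const std::vector<float>& v) {
    auto start = std::chrono::steady_clock::now();
    volatile double sink = prototype_sum(v);  // volatile discourages the optimizer from eliding the call
    (void)sink;
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

For real measurements you would run many iterations and take the best or median time, but even this crude approach can flag a hopelessly slow design early.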

Code Quality

Everyone has their own opinions on the best way to write software, so I'll choose to simply offer some possible options for you to discuss. Here is my list of some of the more pragmatic and useful ways to ensure code reliability as a professional software developer:

  • Lots of unit tests.
  • Lots of assertions.
  • Lots of bigger regression tests.
  • Automated acceptance testing in CI/CD.
  • Nightly builds that automatically re-run all the bigger tests that are too slow for CI/CD.
  • Warning-free compilation (as a coding policy goal).
  • Running Valgrind or other memory checkers in the nightly builds (Linux).
  • Run big multi-platform tests in the nightly builds.
  • Check return codes (as a coding policy).
  • Validate incoming function parameters (as a coding policy).
  • Use an error logger.
  • Use a debug tracing library.
  • Add some debug wrapper functions.

And here's an extra bonus one: have an occasional “testing day.” Programmers are good at random testing of other people's code (OPC), but they tend not to do it much.
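To make a couple of those policies concrete (validating incoming parameters, checking rather than crashing), here is a minimal sketch; the CHECK_PARAM macro and function names are illustrative, not from any particular library:

```cpp
#include <cstdio>
#include <vector>

// Validate a condition: log a message on failure, but keep running.
#define CHECK_PARAM(cond) \
    ((cond) ? true \
            : (std::fprintf(stderr, "PARAM FAILED: %s (%s:%d)\n", \
                            #cond, __FILE__, __LINE__), false))

// Returns -1 on invalid input instead of crashing or dividing by zero.
int safe_average(const std::vector<int>& v) {
    if (!CHECK_PARAM(!v.empty())) return -1;
    long long sum = 0;
    for (int x : v) sum += x;
    return (int)(sum / (long long)v.size());
}
```

The point of the idiom is that a bad parameter produces a useful log line in the field, rather than a core dump on a customer's machine.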

Extensibility

Extensibility means allowing your customers to extend or customize your AI software. Although your first thought is going to be to run off and build an API or an SDK, there are a few things to consider first. The simpler ways to “extend” are:

  • Just add more features.
  • Add configuration settings.
  • Add command-line options.
  • Add minor personalization features.

Adding customer features. The basic problem that customers have is that they want to find a way to do something. If they're looking to extend your software, well, that means that some feature is lacking. If one customer finds this issue, other customers are probably silently suffering. So, rather than building an API, just listen to your customer, and add some more features to your code that will solve the issue, and other reasonably similar issues.

Configuration settings. Think about your AI's configuration settings from the point-of-view of extensibility. If you prefer, call them “declarative extensions.” It's much easier for a customer to change a config option than to write a program using your SDK. Consider elevating and documenting some of the different ways that an AI engine can be configured, to give your customers more capabilities. Yes, this does significantly increase the error handling code and QA testing cycle, so this is a careful consideration: which of your internal config options do you hide or publicize?
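A minimal sketch of such “declarative extensions” is just a key/value table with defaults; the option names below are hypothetical, not from any real engine:

```cpp
#include <map>
#include <string>

// Key/value configuration with defaults: a customer edits a text file of
// "key=value" lines rather than programming against an SDK.
class Config {
    std::map<std::string, std::string> opts_;
public:
    void set(const std::string& key, const std::string& val) { opts_[key] = val; }
    std::string get(const std::string& key, const std::string& dflt) const {
        auto it = opts_.find(key);
        return (it == opts_.end()) ? dflt : it->second;
    }
};
```

For example, a report generator might call cfg.get("report.title", "Report"), so a customer who sets report.title in a config file has “extended” the product without writing a line of code.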

Personalization options. When you're deep in the guts of an AI engine, you're thinking about really brain-intensive stuff like vectorizing your tokenizer. Your customer, however, just wants to put their company's name at the top of their AI-generated report. Hence, focus on adding some of the “smaller” functionality that seems trivial to engineers, but is what customers want. Maybe, like the wheel, the report could even have different colors?

And one final point about extensibility: your customers aren't programmers. They don't even know what the acronyms API and SDK stand for. Your customers need an API like a fish needs a bicycle.

Supportability

Supportability refers to making it easier to support your customers in the field. This means making it easier for your customers to solve their problems, and also making it easier for your phone support staff whenever customers call in.

Hey! I have an idea: how about you build an AI chatbot that knows how to debug your software? Umm, sorry, rush of blood to the head.

Some of the areas where the software's design can help both customers and support staff include:

  • Easy method to print the program's basic configuration, version, and platform details (e.g., either an interactive method or they're logged to a file).
  • Printing important platform stats (e.g. what CPU/GPU acceleration was found by the program, what is sizeof int, and so on).
  • Self-check common issues. Don't just check for input file not found. You can also check if it was empty, blanks only, zero words, punctuation only, wrong character encoding, and so on.
  • Verbose and meaningful error messages. Assume every error message will be seen by customers.
  • Specific error messages. Lazy coders group two failures: “ERROR: File not found or contained only blanks.” Which is it?
  • Unique codes in error messages.
  • Documenting your error messages in public help pages or by making your online support database world-public (gasp!).
  • Retain copies of all shipped executables, with and without debug information, as part of your build and release process, so you can postmortem debug later.
  • Have a procedure whereby customers can upload core files to support.
  • Not crashing in the first place. Fix this by writing perfect code, please.
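As a sketch of the first two items, a small diagnostic routine can gather build and platform details into a string that is easy to print interactively or write to a log; the product name and version here are placeholders:

```cpp
#include <string>

// Collects basic build/platform details for support diagnostics.
// Product name and version number are made-up placeholders.
std::string build_info() {
    std::string s;
    s += "MyAIApp version 0.0.1 (placeholder)\n";
    s += "Compiled: " __DATE__ " " __TIME__ "\n";
    s += "sizeof(int)=" + std::to_string(sizeof(int));
    s += ", sizeof(void*)=" + std::to_string(sizeof(void*));
    s += ", sizeof(long)=" + std::to_string(sizeof(long)) + "\n";
    return s;
}
```

When a customer calls in, “please read me the About box” is a much faster first step than guessing which build and platform they're running.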

Why use unique message codes? Adding unique numeric or symbolic codes to your error messages, and even to assertions, can improve supportability in two ways: self-help and phone support call-ins. A unique code lets customers find the error easily on the internet (e.g., via Google or Bing), either in your website's online help pages or on third-party websites (e.g., Stack Overflow and the like) where other customers have had the same problem.

Note that the codes don't really need to be completely unique, so don't worry if two messages have the same code, unless you're doing internationalization! And certainly, don't agonize over enforcing a huge corporate policy for all teams to use different numbers or prefixes. However, it does help for your unique code to have a prefix indicating which software application it's coming from, because the AI tech stack has quite a lot of components in production, so maybe you need a policy after all (sigh).
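A sketch of the idea: format every error with a short application prefix and a number, so the exact string can be searched for verbatim. The “AIE” prefix and code numbers here are invented for the example:

```cpp
#include <cstdio>
#include <string>

// Formats an error with an application prefix and a numeric code, so a
// customer can search for the exact string "AIE-1017" online.
std::string format_error(int code, const std::string& msg) {
    char prefix[32];
    std::snprintf(prefix, sizeof(prefix), "AIE-%04d: ", code);
    return std::string(prefix) + msg;
}
```

For example, format_error(1017, "Input file contained only blanks") yields a message whose code identifies both the component (AIE) and the specific failure (1017).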

Note that supportability is at the tail end of the user experience. It's less important than first impressions: the user interface, installation and the on-boarding experience.

Scalability

Almost this entire treatise is about scalability of your AI engine. Getting that behemoth to run fast is the biggest challenge.

But the actual engine is not the only scalability concern. There's also the server on which you receive and process requests, sending them on to the AI engine, and collating returned results. This is a piece of software, and it could be an off-the-shelf server, or you could write your own in C++ if you like.

User interfaces are another overlooked point in regard to scalability. Not only must the backend be fast, but the user interface layer must handle all of the requirements in a way that people can cope with. The key point is this:

Humans don't scale.

What that means is that making your human user do anything is a hard problem. People cannot read reams of text fast, they cannot click on a thousand warning messages, and they do dumb things in the interface, like re-clicking the “Load” button a hundred times if it's taking too long. The fact that a human is part of the process flow means that you have to make sure that all of your steps are human-friendly. This is an often-underestimated aspect of scalability.

Reusability

In our commercial world, the cost of our own time is frequently the greatest cost. Using our own time efficiently can be more important than writing fast programs. Although improving programming productivity is not the main topic of this book, let us briefly consider a few methods here.

The basic method of reducing time spent programming is to build on the work of others. The use of libraries, including the wide variety of commercially available source code libraries, and the C++ standard library, is a good way to build on the work of others. Since AI is a new area, a literature search of books and research papers can be useful, although it is time-consuming. Hopefully, this book is helpful to you in solving the problems at hand elegantly, efficiently and correctly.

Building on your own work is the other main method of productivity improvement. How often have you coded up a hash table? Have you ever written a sorting routine off the top of your head and then spent hours debugging it? You should perform tasks only once. This doesn’t necessarily mean writing reusable code in its most general sense, but just having the source code available for the most common problems. Modifying code that has already been debugged is far more time-efficient than writing it from scratch. Organizations should seek to create building blocks of code that programmers can use, but you can also do so in your own personal career.

 

Next: Chapter 40. Reliability
