Aussie AI Blog

DIY Memory Safety for C++

Nov 1st, 2024

by David Spuler, Ph.D.

Why DIY Memory Safety?

Well, because you can fix some bugs yourself! Instead of waiting for compiler vendors to add a "-safe" option, or for the standards organizations to define a "Safe C++" language, you can do it yourself!

These are the main memory safety issues in C++:

Array bounds writes (buffer overflow writes)
Array bounds reads (buffer overflow reads)
Uninitialized memory usage (e.g., malloc, new, stack buffers).
Use-after-deallocation (i.e., reads or writes after free or delete).
Double-deallocation (i.e., double-free, double-delete).

There are also other special cases of memory issues:

File pointer misuses (e.g., double-fclose).
Text buffer overruns (e.g., string copy overwrites).
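As a first taste of the DIY approach, here's a minimal sketch of catching the double-fclose case with wrapper functions and macro intercepts. The names debug_fopen, debug_fclose, and open_files are hypothetical, and the sketch assumes a single-threaded program. It keeps a set of currently open FILE* handles, so a second fclose on the same handle is reported instead of being passed to the C library.

```cpp
#include <cassert>
#include <cstdio>
#include <set>

// Registry of FILE* handles opened through debug_fopen (hypothetical helper).
// Not thread-safe: this sketch assumes a single-threaded program.
std::set<std::FILE*>& open_files() {
    static std::set<std::FILE*> files;
    return files;
}

std::FILE* debug_fopen(const char* path, const char* mode) {
    std::FILE* f = std::fopen(path, mode);
    if (f != nullptr) open_files().insert(f);
    return f;
}

// Closes f only if it is a currently open, tracked handle. A second fclose of
// the same handle is reported and returns EOF instead of reaching the C
// library, where closing an already-closed stream is undefined behavior.
int debug_fclose(std::FILE* f) {
    if (f == nullptr || open_files().erase(f) == 0) {
        std::fprintf(stderr, "debug_fclose: handle %p is not open\n",
                     static_cast<void*>(f));
        return EOF;
    }
    return std::fclose(f);
}

// Macro intercepts, defined after the standard headers so that the
// library's own declarations are not rewritten.
#define fopen(path, mode) debug_fopen((path), (mode))
#define fclose(f) debug_fclose(f)
```

One caveat: if a later fopen happens to return the same FILE* address, a stale handle can no longer be told apart from the new one.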

Strategies for DIY Memory Safety

There are two overarching strategies, which are opposites of each other:

Make some failures harmless (e.g., get rid of uninitialized memory usage errors by always initializing memory to zero).
Detect more failures by deliberately making memory problems happen (e.g., filling uninitialized memory with poison values that trigger errors).

You can pick one of these and do it for both developer testing and production runs by customers. Or you can vary the idea:

Detect more bugs in developer mode.
Make the bugs harmless in production mode.
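Here's one way the developer-versus-production split could look. This is a sketch that assumes a hypothetical DIY_DEBUG macro set by the build, and a hypothetical diy_alloc function used in place of malloc. The same call site fills fresh memory with a poison byte in developer builds and with zeros in production builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical build switch: compile developer/test builds with -DDIY_DEBUG.
#ifdef DIY_DEBUG
// Developer mode: poison fresh memory (every byte 0xCD) so that reads of
// uninitialized memory produce conspicuous garbage and fail fast in testing.
constexpr unsigned char kFillByte = 0xCD;
#else
// Production mode: zero-fill so the same bug reads a harmless 0.
constexpr unsigned char kFillByte = 0x00;
#endif

// Allocation function used instead of plain malloc (hypothetical name).
void* diy_alloc(std::size_t n) {
    void* p = std::malloc(n);
    if (p != nullptr) {
        std::memset(p, kFillByte, n);
    }
    return p;
}
```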

Why would we do this? Why not just run AddressSanitizer or valgrind? There are a few reasons:

The sanitizers run too slowly, so we cannot use them all the time, or in production.
If we implement fast DIY methods, we can use them continually during testing.
If they're really fast, we might even leave the self-checks in for production runs.

The DIY techniques to detect more bugs inside your own code include:

Canary regions ("redzones") around memory blocks.
Poisoning memory inside the blocks with error-triggering values.
Magic values for statuses stored in buffers.
Full address tracking (i.e., your own hash table of memory block addresses).
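To make the first three techniques concrete, here's a simplified allocator wrapper. It assumes a hypothetical block layout: a header holding a magic status value and the size sits in front of the user bytes, and a canary redzone sits behind them. New bytes are poisoned with 0xCD and freed bytes with 0xDD; these byte values and the names in the diy namespace are illustrative choices, not a standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical layout: [Header: magic + size][user bytes][kCanarySize redzone]
namespace diy {

constexpr std::uint32_t kMagicLive  = 0xA110CA7Eu;  // status: allocated
constexpr std::uint32_t kMagicFreed = 0xDEADBEEFu;  // status: freed
constexpr unsigned char kCanaryByte = 0xFD;         // redzone contents
constexpr unsigned char kPoisonNew  = 0xCD;         // fresh, uninitialized bytes
constexpr unsigned char kPoisonFree = 0xDD;         // freed bytes
constexpr std::size_t   kCanarySize = 8;

// alignas keeps the user pointer (just past the header) suitably aligned.
struct alignas(std::max_align_t) Header {
    std::uint32_t magic;
    std::size_t size;
};

enum class Status { Ok, BadMagic, AlreadyFreed, CanaryCorrupt };

void* alloc(std::size_t n) {
    void* raw = std::malloc(sizeof(Header) + n + kCanarySize);
    if (raw == nullptr) return nullptr;
    new (raw) Header{kMagicLive, n};
    unsigned char* user = static_cast<unsigned char*>(raw) + sizeof(Header);
    std::memset(user, kPoisonNew, n);                 // poison the payload
    std::memset(user + n, kCanaryByte, kCanarySize);  // trailing redzone
    return user;
}

Header* header_of(void* p) {
    return reinterpret_cast<Header*>(static_cast<unsigned char*>(p) - sizeof(Header));
}

// Validates the magic status value and the redzone of a block from alloc().
Status check(void* p) {
    const Header* h = header_of(p);
    if (h->magic == kMagicFreed) return Status::AlreadyFreed;
    if (h->magic != kMagicLive) return Status::BadMagic;
    const unsigned char* tail = static_cast<unsigned char*>(p) + h->size;
    for (std::size_t i = 0; i < kCanarySize; ++i) {
        if (tail[i] != kCanaryByte) return Status::CanaryCorrupt;
    }
    return Status::Ok;
}

// Checks the block, then poisons it and marks it freed before releasing it.
// A corrupted block is reported and deliberately leaked rather than freed.
Status release(void* p) {
    Status s = check(p);
    if (s != Status::Ok) return s;
    Header* h = header_of(p);
    std::memset(p, kPoisonFree, h->size);
    h->magic = kMagicFreed;
    std::free(h);
    return Status::Ok;
}

}  // namespace diy
```

Note that spotting a second release relies on reading memory that has already been freed, which is itself undefined behavior, so the AlreadyFreed status is only a best-effort debugging heuristic. Likewise, the canary only catches overflow writes, and only when check() or release() runs.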

Together, these give multiple levels of error detection, ranging from super-fast to almost-as-slow-as-valgrind.
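At the slow end of that range, full address tracking can be sketched with a standard hash table that maps each live block's address to its size. The track namespace and its functions are hypothetical, and the sketch is not thread-safe. Because every free is checked against the table, a double free or a wild pointer is rejected without the memory ever being touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical full address tracking: a hash table of every live block.
// Not thread-safe (assumes a single-threaded program).
namespace track {

std::unordered_map<void*, std::size_t>& live_table() {
    static std::unordered_map<void*, std::size_t> table;  // address -> size
    return table;
}

void* tracked_malloc(std::size_t n) {
    void* p = std::malloc(n);
    if (p != nullptr) live_table()[p] = n;
    return p;
}

// Frees p only if it is the start of a live block. Double frees, wild
// pointers, and pointers into the middle of a block are rejected (returning
// false) without ever being passed to std::free.
bool tracked_free(void* p) {
    if (p == nullptr) return true;  // free(nullptr) is a harmless no-op
    auto it = live_table().find(p);
    if (it == live_table().end()) return false;
    live_table().erase(it);
    std::free(p);
    return true;
}

// Anything still in the table at program exit is a leak.
std::size_t live_blocks() { return live_table().size(); }

}  // namespace track
```

The price is a hash table update or lookup on every allocation and free. It also can't catch a stale pointer whose address has since been handed out again by malloc.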

Making Uninitialized Accesses Harmless

There's another option: just fix it! Instead of trying to find the bugs, make them disappear by rendering them harmless. This works particularly well for the whole class of memory bugs based on uninitialized memory reads.

Why are these even bugs? They seem more like language design failures, with too great a focus on speed. The basic problem with standard C++ and memory initialization is this patchwork of choices:

Global variables are initialized to zero (hooray!).
Static local variables are initialized to zero (hooray!).
Stack variables are not initialized to zero (boo!).
Heap-allocated memory blocks are only sometimes initialized to zero (boo!).
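A small illustration of the first three rules; the function and variable names are just for illustration. Globals and static locals start at zero without any help, whereas the stack variable needs its explicit "= 0".

```cpp
#include <cassert>

int g_total;                 // global: zero-initialized before main() runs

int next_id() {
    static int counter;      // static local: also zero-initialized
    return ++counter;
}

int sum_to(int n) {
    int sum = 0;             // stack variable: no automatic zeroing, so the
                             // "= 0" is required; "int sum;" would start
                             // with an indeterminate value
    for (int i = 1; i <= n; ++i) sum += i;
    return sum;
}
```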

For heap memory allocation, we again have a patchwork:

malloc memory is never initialized.
calloc initializes to zero always.
new of object types relies on constructors to initialize.
new of arrays of objects relies on (many) constructors to initialize.
new of primitive data types does not initialize at all (single variables or arrays).
realloc does not initialize extra memory.
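The zero-initializing forms are easy to miss, so here's a sketch contrasting them with the uninitialized forms from the list above (shown in comments). The Point type is a made-up example of a class whose constructor does the initialization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct Point {
    int x = 0;  // default member initializers: the implicit default
    int y = 0;  // constructor sets both fields to zero
};

// Each function returns zeroed memory; the comments show the
// uninitialized alternatives.
int* zeroed_c_array(std::size_t n) {
    // std::malloc(n * sizeof(int)) would not be zeroed
    return static_cast<int*>(std::calloc(n, sizeof(int)));
}

int* zeroed_new_array(std::size_t n) {
    // "new int[n]" would not be zeroed; the empty "()" value-initializes
    return new int[n]();
}

int* zeroed_new_scalar() {
    // "new int" would not be zeroed; "{}" value-initializes to 0
    return new int{};
}

Point* constructed_objects(std::size_t n) {
    return new Point[n];  // every element is initialized by its constructor
}
```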

What we really want is to change all malloc and new calls to calloc. Then a whole class of memory safety issues just disappears! Honestly, rather than detecting uninitialized memory uses, shouldn't we just make them a non-issue? Why would we even bother with the other strategy of filling uninitialized memory with poisoned values, when we could just fix it everywhere?
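Short of editing every call site, one sketch of the "change malloc to calloc" idea is a function-like macro in a common header, assuming that header is included after the standard headers. Since realloc has no zeroing counterpart, the hypothetical zeroing_realloc helper needs the caller to pass the old block size, which the C library doesn't expose portably. (The new half of the idea needs a different mechanism, namely replacing the global operator new.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// realloc has no calloc-style sibling, so a zeroing version needs the old
// size from the caller (assumption: the codebase tracks block sizes).
void* zeroing_realloc(void* p, std::size_t old_size, std::size_t new_size) {
    void* q = std::realloc(p, new_size);
    if (q != nullptr && new_size > old_size) {
        // Zero only the newly added tail; existing contents are preserved.
        std::memset(static_cast<unsigned char*>(q) + old_size, 0,
                    new_size - old_size);
    }
    return q;
}

// Project-wide intercept, placed after the standard headers: every later
// malloc(n) call becomes calloc(1, n), so all malloc'd memory starts zeroed.
#define malloc(n) calloc(1, (n))
```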

Intercepting C++ Primitives

Here are the basic strategies for integrating DIY safety fixes into your codebase:

Coding style rules that require calling safe functions.
Wrapper functions to automatically fix or detect issues.
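For the coding-style approach, a project might require a bounded string copy instead of strcpy. This safe_strcpy is a hypothetical project function, not a standard one: it never writes more than the destination size, always null-terminates, and reports truncation through its return value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies at most dest_size - 1 characters and always null-terminates.
// Returns false on bad arguments or if the source had to be truncated.
bool safe_strcpy(char* dest, std::size_t dest_size, const char* src) {
    if (dest == nullptr || src == nullptr || dest_size == 0) return false;
    std::size_t len = std::strlen(src);
    std::size_t n = (len < dest_size) ? len : dest_size - 1;
    std::memcpy(dest, src, n);
    dest[n] = '\0';
    return n == len;
}

// Overload for real arrays: the size is deduced, so it can't be wrong.
template <std::size_t N>
bool safe_strcpy(char (&dest)[N], const char* src) {
    return safe_strcpy(dest, N, src);
}
```

The template overload won't bind to a plain char pointer, so callers holding only a pointer are forced to pass the size explicitly.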

Debug wrapper functions can be hooked into the code in these ways:

Macro intercepts of malloc, calloc, and free.
Link-time intercepts of new and delete operators.
Macro intercepts for strlen and strcpy, etc.
Macro intercepts for fopen and fclose.
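Here's a sketch of the macro intercepts for malloc, calloc, free, and strcpy, assuming the macros live in a project header included after the standard headers. The debug_ wrapper names and the g_live_allocs counter are hypothetical. Each wrapper receives the caller's __FILE__ and __LINE__, so a report points at the line that made the call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

static long g_live_allocs = 0;  // crude leak counter: allocations minus frees

void* debug_malloc(std::size_t n, const char* file, int line) {
    void* p = std::malloc(n);
    if (p == nullptr) {
        std::fprintf(stderr, "%s:%d: malloc(%zu) failed\n", file, line, n);
    } else {
        ++g_live_allocs;
    }
    return p;
}

void* debug_calloc(std::size_t count, std::size_t size, const char* file, int line) {
    void* p = std::calloc(count, size);
    if (p == nullptr) {
        std::fprintf(stderr, "%s:%d: calloc failed\n", file, line);
    } else {
        ++g_live_allocs;
    }
    return p;
}

void debug_free(void* p, const char* file, int line) {
    // More frees than allocations hints at a double free or a foreign pointer.
    if (p != nullptr && --g_live_allocs < 0) {
        std::fprintf(stderr, "%s:%d: more frees than allocations\n", file, line);
    }
    std::free(p);
}

char* debug_strcpy(char* dest, const char* src, const char* file, int line) {
    if (dest == nullptr || src == nullptr) {
        std::fprintf(stderr, "%s:%d: strcpy with a null pointer\n", file, line);
        std::abort();
    }
    return std::strcpy(dest, src);
}

// The intercepts, defined after the standard headers.
#define malloc(n)    debug_malloc((n), __FILE__, __LINE__)
#define calloc(c, s) debug_calloc((c), (s), __FILE__, __LINE__)
#define free(p)      debug_free((p), __FILE__, __LINE__)
#define strcpy(d, s) debug_strcpy((d), (s), __FILE__, __LINE__)
```

One practical wrinkle: any later code written as std::malloc(n) would expand to std::debug_malloc(...) and fail to compile, so macro intercepts work best in code that calls the unqualified C names.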

We have to be aware of a few issues:

Macro intercepts won't catch allocations made through less-common primitives that we don't intercept (e.g., strdup).
Macro intercepts won't see anything in third-party libraries (including Standard C++/STL).
Link-time new and delete intercepts will see Standard C++ calls (which can be good or bad).
Link-time new and delete intercepts must define at least four versions: operator new and operator delete for single objects, plus operator new[] and operator delete[] for arrays.
There's no simple way to intercept stack-based memory operations for local variables (i.e., from function calls or returns).
We can macro-intercept stack-based alloca calls, but it's hard to know when the function returns.
We can macro-intercept fopen-style file operations, but it's much harder for C++ fstream types.
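For the link-time route, here's a sketch of the four replacement functions, applying the "make it harmless" idea by zero-filling every block that operator new returns. It assumes no over-aligned types, since the C++17 std::align_val_t overloads are not replaced, and the g_new_calls counter is a hypothetical addition for testing. The standard specifies that the library's default sized-delete and nothrow variants forward to these functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Defining these replaceable functions in any one translation unit replaces
// the library's versions for the whole program, including allocations made
// inside the standard library.
std::size_t g_new_calls = 0;  // counts calls to operator new / new[]

void* operator new(std::size_t n) {
    ++g_new_calls;
    // Zero-fill every block; request at least 1 byte because operator new
    // must return a non-null pointer even for size 0.
    void* p = std::calloc(1, n == 0 ? 1 : n);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}

void* operator new[](std::size_t n) {
    return ::operator new(n);
}

void operator delete(void* p) noexcept {
    std::free(p);
}

void operator delete[](void* p) noexcept {
    ::operator delete(p);
}
```

Because the replacement is program-wide, it also sees every allocation that std::vector, std::string, and other containers make through the default allocator. In practice this means new int yields zeroed bytes, although the language itself still treats reading such an object as reading an indeterminate value.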

Overall, the DIY memory safety approach is a patchwork of techniques in itself. It would be so much easier if the compiler vendors would just add a "-safe" flag that does all this!

Safe C++ Book

The new Safe C++ coding book by David Spuler:

Memory Safety
Rust versus C++
The Safe C++ Standard
Pragmatic Memory Safety

Get your copy from Amazon: Safe C++: Fixing Memory Safety Issues
