Aussie AI Blog

Uninitialized Memory Safety in C++

November 1st, 2024

by David Spuler, Ph.D.

What are Uninitialized Memory Errors?

There are fundamental problems with memory initialization in C++. This is the standard C++ situation:

Global variables are initialized to zero.
Basic stack local variables are not initialized (buggy!).
Local static variables are initialized to zero.
Heap-allocated variables are sometimes initialized (buggy!).
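As a minimal sketch of the list above (all names here are made up for illustration), the global and the local static are guaranteed to start at zero, while the plain local has to be assigned before it is read. The uninitialized local is never read in its indeterminate state, because that would be undefined behavior.

```cpp
#include <cstddef>

int g_counter;                 // Global: zero-initialized before main() runs

int next_id()
{
    static int s_calls;        // Local static: also zero-initialized
    return ++s_calls;          // So the first call returns 1
}

int sum_first(const int* data, std::size_t n)
{
    int total;                 // Plain local: indeterminate value (the bug risk)
    total = 0;                 // Must be assigned before it is read
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}
```

Writing `int total = 0;` on one line is the safer habit; the two-step version is only there to show where the gap is.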

There are two main strategies for dealing with uninitialized memory:

Detect the problems (e.g., run sanitizers, or use poisoned memory DIY methods), or
Just fix them!

This article is about ways to fix uninitialized memory usage in C++ by DIY initialization-to-zero tricks. Really, there should be a compiler "-safe" option that does this for you, but I'm not aware of a vendor that offers it yet.

Initializing C++ Heap Memory

The situation with memory initialization on the heap in C++ includes:

malloc memory is never initialized.
calloc initializes to zero always (hooray!).
new of object types relies on constructors to initialize.
new of arrays of objects relies on (many) constructors to initialize.
new of primitive data types does not initialize at all (single variables or arrays), unless you add an explicit initializer such as () or {}.
realloc does not initialize extra memory.
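The cases above can be sketched as follows. This is only an illustration with hypothetical helper names (all_zero, heap_demo). It relies on one language rule not spelled out in the list: new with empty parentheses value-initializes, so primitive types come back as zero. The block allocated without an initializer is written before it is read.

```cpp
#include <cstddef>
#include <cstdlib>

// Returns true if every byte in [p, p+n) is zero.
bool all_zero(const void* p, std::size_t n)
{
    const unsigned char* b = static_cast<const unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i)
        if (b[i] != 0) return false;
    return true;
}

bool heap_demo()
{
    // calloc: always zero-filled.
    int* c = static_cast<int*>(std::calloc(8, sizeof(int)));
    if (c == nullptr) return false;
    bool ok = all_zero(c, 8 * sizeof(int));
    std::free(c);

    // new with empty parentheses: value-initialized, so zero.
    int* one = new int();
    int* many = new int[8]();
    ok = ok && (*one == 0) && all_zero(many, 8 * sizeof(int));
    delete one;
    delete[] many;

    // new without an initializer: indeterminate contents, so write first.
    int* raw = new int[8];
    raw[0] = 42;
    ok = ok && (raw[0] == 42);
    delete[] raw;
    return ok;
}
```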

A first approach would be to fix these via a coding standard:

Never use malloc; only use calloc.
Never use new for basic data types (e.g. int).

Here's one simple try to automate this:

    #define malloc(n)   calloc(1,(n))

Note that we cannot macro-intercept the new operator because it's not function-like. Further, we can't really institute a coding policy of replacing new with malloc, or delete with free, for any object types, because we need the constructors and destructors to run. We could do that for non-object types, such as basic data type arrays, but it becomes a problematic patchwork in itself.

These are all worthwhile ideas, and will fix some issues. But they don't address these uninitialized memory usage errors:

Forgetting to initialize a data member in a constructor.
Stack variables are not addressed.
Less common methods like realloc still have the problem.
It's easy to get confused and mix up the matching free and delete.

Here's another idea for fixing the uninitialized data member problems:

    memset(this, 0, sizeof(*this));  // At start of a constructor

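But this is an annoying manual coding intervention, and it is also unsafe in general. In a class with virtual functions it overwrites the hidden virtual-table pointer, in a derived class it wipes base-class members that have already been constructed, and it clobbers any data members with non-trivial constructors (such as std::string), because those are constructed before the constructor body runs. It is only reasonable for simple classes whose members are all basic types.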
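A different technique from the memset trick, shown here as a sketch: C++11 default member initializers give every data member a zero default, even when a constructor forgets to mention it. The Particle struct is a hypothetical example, not something from a real codebase.

```cpp
struct Particle {
    int    id = 0;          // Default member initializer (C++11)
    double mass = 0.0;
    char   name[16] = {};   // Whole array is zero-filled

    Particle() = default;
    explicit Particle(int i) : id(i) {}   // mass and name still get their defaults
};
```

This works with virtual functions and inheritance, because the compiler does the initialization member by member rather than by overwriting raw bytes.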

Intercepting Memory Allocation

A more comprehensive approach is to intercept all of the memory allocation primitives. This takes two kinds of intercepts:

Macro intercepts of malloc, calloc, and free.
Link-time intercepts of new and delete.

There are also some neat platform-specific tricks. The Microsoft debug CRT has a callback mechanism called "hooks" (registered with _CrtSetAllocHook) that gets called whenever an allocation occurs. You simply register your own callback function.

What do we do in these intercepts? The basic idea is:

Change malloc to calloc.
Change new and new[] to use calloc.
Change delete and delete[] to use free (avoids mismatches).

Note that there are no problems with constructors and destructors with these intercepts, because they are low-level memory primitives. The new intercepts run before the constructors, and the delete intercepts run after the destructors.

The bugs that we can fix with memory allocation interception include:

Uninitialized heap memory.
Mismatched new/delete with malloc/free.

Macro Intercepts

Here's the basic idea for the macro intercept in a header file:

    #define malloc aussie_malloc

And here's the basic idea for the wrapper function that initializes:

    #include <stdlib.h>

    void* aussie_malloc(size_t sz)   // size_t, not int, to match malloc
    {
        #undef calloc  // avoid wrapper
        void* v = calloc(sz, 1);   // Call real calloc (zero-filled)
        return v;
    }

Link-Time Intercepts

Here's the basic code to create the global link-time intercepts:

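    #include <cstdlib>
    #include <new>

    void* operator new(std::size_t n)
    {
        #undef calloc // avoid macro intercept
        if (n == 0) n = 1;  // operator new must return a unique non-null pointer
        void* v = calloc(n, 1);  // Call calloc (can't use ::new here)
        if (v == nullptr) throw std::bad_alloc();  // must throw, not return null
        return v;
    }

    void operator delete(void* v) noexcept
    {
        #undef free // avoid macro intercept
        free(v);  // call the real free (note: cannot use delete here!)
    }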

And we also need the array versions:

    void* operator new[](std::size_t n)
    {
        #undef calloc // avoid macro intercept
        if (n == 0) n = 1;  // must return a unique non-null pointer
        void* v = calloc(n, 1);  // Call calloc (can't use ::new here)
        if (v == nullptr) throw std::bad_alloc();  // needs <new>
        return v;
    }

    void operator delete[](void* v) noexcept
    {
        #undef free // avoid macro intercept
        free(v);  // call the real free (note: cannot use delete here!)
    }

Advanced Intercepts

To make these ideas as robust as possible, it's necessary to do this work:

Ensure the macro intercept header file is included at the top of every C++ file.
Macro intercept headers may be needed at the top of some header files, too (e.g., for inline functions).
Add four C++ link intercept functions: basic and array overrides for new/new[] and delete/delete[] operators.
Intercept less common function primitives: realloc, aligned_alloc, etc.
Examine third-party allocation functions in non-header linked libraries (C++ allocation will be handled automatically by the link-time intercepts, but C-style allocations won't be seen by the macro intercepts.)
Class-specific allocators may bypass this method, or not, depending on how they are implemented.
The Standard C++ library/STL uses a lot of C++ memory allocation, which isn't necessarily a problem, but be aware of it.
Global or static C++ objects of your own or STL global variables will run your link-time intercept functions before the main function starts (again, not usually a problem).
Add an option to compile-out these initializations, such as for use when running sanitizers to detect uninitialized memory errors.

On the other hand, ignore that last point. Why bother ever detecting them now? They're fixed! Just initialize the memory to zero forever after.
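The realloc item in the list above is the awkward one, because realloc does not report the old block size and standard C++ has no portable way to query it. One possible sketch is a wrapper that takes the old size as an extra argument and zero-fills only the newly added tail. The name aussie_realloc_zero and its three-argument signature are assumptions for illustration, so it cannot be dropped in by a simple macro intercept.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: like realloc, but zero-fills any newly added bytes.
// The caller supplies the old size, since there is no portable way to ask for it.
void* aussie_realloc_zero(void* p, std::size_t old_sz, std::size_t new_sz)
{
    if (p == nullptr) old_sz = 0;      // realloc(NULL, n) behaves like malloc(n)
    void* v = std::realloc(p, new_sz);
    if (v != nullptr && new_sz > old_sz) {
        // Only the extension is cleared; the first old_sz bytes keep their contents.
        std::memset(static_cast<char*>(v) + old_sz, 0, new_sz - old_sz);
    }
    return v;                          // nullptr on failure; the old block is still valid
}
```

Call sites have to track the current size themselves, which most growable-buffer code already does.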

One of the main downsides of the above methods is that these interception methods only work for the heap, and don't help with the stack. We can't use these two approaches of function-like macro interception or link-time interception with local stack variables.

Stack Buffer Initialization

Stack variables are still a problem, even if we're intercepting all heap allocation primitives. The simple example of an uninitialized stack variable looks like this:

    void my_stack_crash_function()
    {
        char buf[100];
        printf("%s\n", buf);
    }

Fixing stack buffer usage is more difficult than heap memory. We cannot easily intercept when the stack frame is increased on function entry, nor when it is released on function returns. Compiler vendors could do this, but it's hidden from the programmer. There's no way to use macros, and I'm not aware of any callback mechanisms. The closest thing is a compiler setting: recent versions of GCC and Clang accept -ftrivial-auto-var-init=zero, which zero-fills local variables that have no initializer, but it's non-standard and not available on every compiler.

Some of the possible approaches include:

Require all local variables to be initialized.
Coding standard requirement to use memset or other methods after every stack array variable.
Use smart buffer objects instead of local array buffers (i.e., a one-variable wrapper).
Use two-variable methods with smart buffer wrapping objects.
Macro-intercept the alloca dynamic stack block allocation method (but it's rarely used, so this isn't that valuable).

There's no easy, portable method to do this comprehensively for stack memory from within the source code, so what remains is a mix of coding policies and wrapper classes.

This is the usual way of requiring an initialization:

    char buf[100] = "";

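This is a worthwhile policy, and it fixes the bug in my above code example. In fact, when a char array is initialized from a shorter string literal, the language rules zero-initialize all of the remaining elements, so the whole buffer is zero. The downside is that this is not obvious to the reader, and it only applies at the point of declaration.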

Here's the manual way, which spells out the zeroing explicitly and also works for re-clearing a buffer later:

    char buf[100] = "";
    memset(buf, 0, 100);

And here's the better way with sizeof operator:

    char buf[100] = "";
    memset(buf, 0, sizeof buf);

And we can use a macro to reduce the chances of copy-paste errors:

    #define INIT_MY_BUFFER(buf) memset((buf), 0, sizeof(buf))
    // ....
    char buf[100] = "";
    INIT_MY_BUFFER(buf);

But beware the trap of using sizeof for a parameter rather than a local variable. An array function parameter is a pointer, rather than a real array type, so it'll be the size of a pointer rather than the size of a buffer (i.e., too small). Don't do this:

    void init_my_buffer(char buf[100]) 
    {
        memset(buf, 0, sizeof buf);   // Bug!!
    }
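One way to sidestep the sizeof trap, sketched here with hypothetical helper names (zero_buffer, buffer_count), is a function template that takes the array by reference. The size is then deduced from the real array type, and passing a pointer fails to compile instead of silently clearing too few bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: N is deduced from the actual array type.
template <typename T, std::size_t N>
void zero_buffer(T (&buf)[N])
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memset is only safe for trivially copyable element types");
    std::memset(buf, 0, sizeof(T) * N);
}

// Returns the element count of a real array; a pointer argument will not compile.
template <typename T, std::size_t N>
constexpr std::size_t buffer_count(T (&)[N])
{
    return N;
}
```

Inside a function like init_my_buffer(char buf[100]) above, calling zero_buffer(buf) is a compile error, which is exactly the warning you want.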

Smart Buffer Classes

Another way to handle stack buffers is to use smart buffer classes. There are two approaches: either replace the buffer with a class object, or use a second variable that is a wrapper or "watcher" of the buffer.

The way to replace the buffer with a class looks like this:

    char buf[100];  // Original
    SmartStackBuffer<100> buf;  // Template-based stack memory

Or you can do this, but it's inefficient: the size is not known at compile time, so the object has to allocate its storage on the heap instead of using stack memory:

    SmartStackBuffer buf(100);  // Really it's on the heap

The performance downside is that we've added class overhead to a very primitive type. On the other hand, we can make them all short functions that are declared as inline, so the performance hit is minimal.
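The article does not define SmartStackBuffer, so the following is only one possible sketch of the template version, with an assumed interface (data, size, operator[]). The storage is a plain member array, so the object lives wherever it is declared, and every member function is short enough to inline.

```cpp
#include <cstddef>

// Hypothetical sketch of a stack-resident buffer that always starts zeroed.
template <std::size_t N>
class SmartStackBuffer {
public:
    SmartStackBuffer() = default;                 // data_ uses its zeroing initializer

    char* data() { return data_; }                // For passing to C-style APIs
    const char* data() const { return data_; }
    constexpr std::size_t size() const { return N; }

    // Unchecked element access, same cost as indexing a raw array.
    char& operator[](std::size_t i) { return data_[i]; }
    const char& operator[](std::size_t i) const { return data_[i]; }

private:
    char data_[N] = {};                           // Zero-filled on construction
};
```

Code that previously passed buf to a function now passes buf.data(), and sizeof buf becomes buf.size().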

I'm not especially fond of the idea of using smart buffer classes just for fixing uninitialized stack memory. After all, the memset ideas above are almost as good, and faster than adding class apparatus around a buffer. However, smart buffer classes are worthwhile because they can also do these things:

Detect buffer overrun writes (after they occur, in the destructor).
Detect some buffer overrun reads/writes as they occur (with extra member functions).
Poison the stack memory on function return (in the destructor), to detect use-after-return.
Track stack memory block addresses in more detail.
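As a rough sketch of the first and third items, a buffer class can place a guard region after the usable bytes, check it in the destructor, and then poison the contents. The class name GuardedStackBuffer, the guard length, and the byte patterns are all assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical sketch: zeroed buffer followed by guard bytes with a known pattern.
template <std::size_t N>
class GuardedStackBuffer {
public:
    GuardedStackBuffer()
    {
        std::memset(storage_, 0, N);                        // Usable area: zero
        std::memset(storage_ + N, kGuardByte, kGuardLen);   // Guard area: pattern
    }
    ~GuardedStackBuffer()
    {
        if (!intact())
            std::fprintf(stderr, "GuardedStackBuffer: overrun detected\n");
        std::memset(storage_, kPoisonByte, N);   // Poison to expose use-after-return
    }

    char* data() { return storage_; }
    std::size_t size() const { return N; }

    // True if no write has spilled past the end into the guard bytes.
    bool intact() const
    {
        for (std::size_t i = 0; i < kGuardLen; ++i)
            if (static_cast<unsigned char>(storage_[N + i]) != kGuardByte)
                return false;
        return true;
    }

private:
    static constexpr std::size_t kGuardLen = 8;
    static constexpr unsigned char kGuardByte = 0xA5;
    static constexpr unsigned char kPoisonByte = 0xDD;
    char storage_[N + kGuardLen];
};
```

Two caveats: this only catches overruns that land within the guard bytes, and an optimizer may treat the poisoning memset in the destructor as a dead store and remove it, so check the generated code on your compiler.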

In conclusion, the above has presented a variety of methods of making the uninitialized memory read error into a harmless non-issue. But it's a variety of techniques, and a lot of extra work, so it would be better if the compiler vendors did this for us!

Safe C++ Book

The new Safe C++ coding book by David Spuler:

Memory Safety
Rust versus C++
The Safe C++ Standard
Pragmatic Memory Safety

Get your copy from Amazon: Safe C++: Fixing Memory Safety Issues
