Aussie AI Blog
DIY Memory Safety for C++
Nov 1st, 2024
by David Spuler, Ph.D.
Why DIY Memory Safety?
Well, because you can fix some bugs yourself!
Instead of waiting for compiler vendors to add a "-safe" option, or the standards organizations to define a "Safe C++" language, you do it yourself!
These are the main memory safety issues in C++:
- Array bounds writes (buffer overflow writes)
- Array bounds reads (buffer overflow reads)
- Uninitialized memory usage (e.g., `malloc`, `new`, stack buffers).
- Use-after-deallocation (i.e., reads or writes after `free` or `delete`).
- Double-deallocation (i.e., double-`free`, double-`delete`).
There are also other special cases of memory issues:
- File pointer misuses (e.g., double-`fclose`).
- Text buffer overruns (e.g., string copy overwrites).
Strategies for DIY Memory Safety
There are two overarching strategies, which are the opposite of each other:
- Make some failures harmless (e.g., get rid of uninitialized memory usage errors by always initializing memory to zero).
- Detect more failures by deliberately provoking them (e.g., fill new memory with poison values so that misuse fails loudly).
You can pick one of these and do it for both developer testing and production runs by customers. Or you can vary the idea:
- Detect more bugs in developer mode.
- Make the bugs harmless in production mode.
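The two modes can share one allocation wrapper that only differs in its fill byte. Here is a minimal sketch, assuming a hypothetical build macro `DIY_DEBUG_MEMORY` and hypothetical names `kFillByte`, `diy_alloc`, and `diy_alloc_selfcheck` (none of these come from a real library):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical build switch: define DIY_DEBUG_MEMORY for developer builds only.
#ifdef DIY_DEBUG_MEMORY
// Developer mode: a loud, non-zero pattern so uninitialized reads misbehave early.
constexpr unsigned char kFillByte = 0xCD;
#else
// Production mode: zero-fill so uninitialized reads see a predictable value.
constexpr unsigned char kFillByte = 0x00;
#endif

// Allocate n bytes and fill them with the mode-dependent byte.
// Returns nullptr on failure, exactly like malloc.
inline void* diy_alloc(std::size_t n) {
    void* p = std::malloc(n);
    if (p != nullptr) {
        std::memset(p, kFillByte, n);
    }
    return p;
}

// Self-check for startup or unit tests: a fresh block holds only the fill byte.
inline void diy_alloc_selfcheck() {
    constexpr std::size_t n = 32;
    unsigned char* p = static_cast<unsigned char*>(diy_alloc(n));
    assert(p != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        assert(p[i] == kFillByte);
    }
    std::free(p);
}
```

The call sites stay identical in both builds; only the compile flag decides whether a missed initialization is exposed or neutralized.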
Why would we do this?
Why not just run `AddressSanitizer` or `valgrind`?
There are a few reasons:
- The sanitizers run too slowly, so we cannot use them all the time, or in production.
- If we implement fast DIY methods, we can use them continually during testing.
- If they're really fast, we might even leave the self-checks in for production runs.
The DIY techniques to detect more bugs inside your own code include:
- Canary regions ("redzones") around memory blocks.
- Poisoning memory inside the blocks with error-triggering values.
- Magic values for statuses stored in buffers.
- Full address tracking (i.e., your own hash table of memory block addresses).
Hence, there are multiple levels of error detection, ranging from super-fast to almost-as-slow-as-`valgrind`.
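As an illustration of the cheaper end of that range, here is a rough sketch of canary regions combined with a magic value in a block header. Every name and constant (`kRedzone`, `kCanary`, `kMagicLive`, `Header`, `canary_alloc`, `canary_check`, `canary_free`) is hypothetical, and the layout is just one plausible choice:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// All names and constants here are hypothetical, chosen for this sketch.
constexpr std::size_t kRedzone = alignof(std::max_align_t);  // canary bytes on each side
constexpr unsigned char kCanary = 0xA5;                      // redzone fill pattern
constexpr std::uint32_t kMagicLive = 0xA110C8EDu;            // header tag for live blocks

// Header stored in front of the block; aligned so the user pointer stays aligned.
struct alignas(std::max_align_t) Header {
    std::size_t size;     // user-visible size in bytes
    std::uint32_t magic;  // kMagicLive while the block is allocated
};

enum class CheckResult { Ok, BadMagic, Underflow, Overflow };

// Layout: [Header][front redzone][user bytes][back redzone]
inline void* canary_alloc(std::size_t n) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(Header) + kRedzone + n + kRedzone));
    if (raw == nullptr) return nullptr;
    Header* h = reinterpret_cast<Header*>(raw);
    h->size = n;
    h->magic = kMagicLive;
    unsigned char* user = raw + sizeof(Header) + kRedzone;
    std::memset(user - kRedzone, kCanary, kRedzone);  // front redzone
    std::memset(user, 0, n);                          // user bytes start zeroed
    std::memset(user + n, kCanary, kRedzone);         // back redzone
    return user;
}

// Inspect the header and both redzones of a block returned by canary_alloc.
inline CheckResult canary_check(const void* p) {
    const unsigned char* user = static_cast<const unsigned char*>(p);
    const Header* h =
        reinterpret_cast<const Header*>(user - kRedzone - sizeof(Header));
    if (h->magic != kMagicLive) return CheckResult::BadMagic;
    const unsigned char* front = user - kRedzone;
    const unsigned char* back = user + h->size;
    for (std::size_t i = 0; i < kRedzone; ++i) {
        if (front[i] != kCanary) return CheckResult::Underflow;
        if (back[i] != kCanary) return CheckResult::Overflow;
    }
    return CheckResult::Ok;
}

// Check, then release. A block with a bad header is reported and not freed.
inline CheckResult canary_free(void* p) {
    assert(p != nullptr);
    CheckResult r = canary_check(p);
    if (r == CheckResult::BadMagic) return r;
    unsigned char* user = static_cast<unsigned char*>(p);
    Header* h = reinterpret_cast<Header*>(user - kRedzone - sizeof(Header));
    h->magic = 0;  // clear the tag before release
    std::free(h);
    return r;
}
```

The check only costs a scan of two small redzones, so it can run at every free, or at any suspicious point in between. It only catches writes that land inside a redzone, which is why slower levels such as full address tracking still have a place.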
Making Uninitialized Accesses Harmless
There's another option: just fix it! Instead of trying to find the bugs, just make them disappear by becoming harmless. This is particularly true of the whole class of memory bugs based on uninitialized memory reads.
Why are these even bugs? They seem more like language design failures, with too great a focus on speed. The basic problem with standard C++ and memory initialization is this patchwork of choices:
- Global variables are initialized to zero (hooray!).
- Local `static` variables are initialized to zero (hooray!).
- Stack variables are not initialized to zero (boo!).
- Heap-allocated memory blocks are sometimes initialized to zero (boo!).
For heap memory allocation, we again have a patchwork:
- `malloc` memory is never initialized.
- `calloc` initializes to zero always.
- `new` of object types relies on constructors to initialize.
- `new` of arrays of objects relies on (many) constructors to initialize.
- `new` of primitive data types does not initialize at all (single variables or arrays).
- `realloc` does not initialize extra memory.
Really, we want to change all `malloc` and `new` calls to `calloc` (or an equivalent that zero-fills).
Then a whole class of memory safety issues just disappears!
Honestly, rather than detecting uninitialized memory uses, shouldn't we just make them a non-issue?
Why would we even bother trying the other strategy of filling uninitialized memory with poisoned values,
when we could just fix it everywhere?
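A minimal sketch of what "just fix it" might look like for the heap cases listed above. The helper names `zero_malloc`, `zero_realloc`, and `zero_int_array` are hypothetical; only `calloc`, `realloc`, `memset`, and the `new T[n]()` syntax are standard:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical helper names; only calloc, realloc, and memset are standard.

// Drop-in for malloc(n): calloc returns zero-filled memory.
inline void* zero_malloc(std::size_t n) {
    return std::calloc(1, n);
}

// realloc does not clear the extra bytes when a block grows, and it cannot
// report the old size, so the caller passes it in.
inline void* zero_realloc(void* p, std::size_t old_n, std::size_t new_n) {
    assert(p != nullptr || old_n == 0);  // a null block has no old bytes
    void* q = std::realloc(p, new_n);
    if (q != nullptr && new_n > old_n) {
        std::memset(static_cast<unsigned char*>(q) + old_n, 0, new_n - old_n);
    }
    return q;
}

// For new of primitive types, the trailing () requests value-initialization,
// which zeroes every element. Plain "new int[n]" leaves them indeterminate.
inline std::unique_ptr<int[]> zero_int_array(std::size_t n) {
    return std::unique_ptr<int[]>(new int[n]());
}
```

The `realloc` case shows the main cost of this approach: the wrapper needs the old size, so callers have to track it (or the allocator has to store it in a header).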
Intercepting C++ Primitives
Here are the basic strategies for how to integrate safety into your code with DIY fixes to your codebase:
- Coding style rules that require calling safe functions.
- Wrapper functions to automatically fix or detect issues.
Debug wrapper functions can be hooked in using these techniques:
- Macro intercepts of `malloc`, `calloc`, and `free`.
- Link-time intercepts of `new` and `delete` operators.
- Macro intercepts for `strlen` and `strcpy`, etc.
- Macro intercepts for `fopen` and `fclose`.
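A macro intercept for `malloc` and `free` could be sketched roughly as follows. The wrapper names (`diy_malloc`, `diy_free`), the `DiyStats` counters, and the choice to zero-fill via `calloc` are all assumptions made for illustration; `calloc`, `strcpy`, or `fclose` intercepts would follow the same pattern:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical bookkeeping for this sketch (not thread-safe).
struct DiyStats {
    long allocs = 0;      // successful allocations
    long frees = 0;       // non-null frees
    long null_frees = 0;  // free(NULL) calls, legal but often a sign of confusion
};

inline DiyStats& diy_stats() {
    static DiyStats stats;
    return stats;
}

// Wrapper behind the malloc macro: zero-fills and reports the call site on failure.
inline void* diy_malloc(std::size_t n, const char* file, int line) {
    assert(file != nullptr);
    void* p = std::calloc(1, n);
    if (p == nullptr) {
        std::fprintf(stderr, "%s:%d: allocation of %zu bytes failed\n", file, line, n);
        return nullptr;
    }
    ++diy_stats().allocs;
    return p;
}

// Wrapper behind the free macro.
inline void diy_free(void* p, const char* file, int line) {
    assert(file != nullptr);
    if (p == nullptr) {
        ++diy_stats().null_frees;
        std::fprintf(stderr, "%s:%d: note: free(NULL)\n", file, line);
        return;
    }
    ++diy_stats().frees;
    std::free(p);
}

// The intercepts. These must come after every standard header and after the
// wrappers above, because the macros would also rewrite any later
// std::malloc(...) text into something that does not compile.
#define malloc(n) diy_malloc((n), __FILE__, __LINE__)
#define free(p) diy_free((p), __FILE__, __LINE__)
```

After this header is included, an ordinary `malloc(32)` in your own code silently becomes a zero-filling, counted call that knows its file and line. Code that was compiled without the header is unaffected.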
We have to be aware of a few issues:
- Macro intercepts won't get any allocations from any less-used primitives we don't intercept.
- Macro intercepts won't see anything in third-party libraries (including Standard C++/STL).
- Link-time `new` and `delete` intercepts will see Standard C++ calls (which can be good or bad).
- Link-time `new` and `delete` intercepts must define at least four versions: two for single objects and two for arrays.
- There's no simple way to intercept stack-based memory operations for local variables (i.e., from function calls or returns).
- We can macro-intercept stack-based `alloca` calls, but it's hard to know when the function returns.
- We can macro-intercept `fopen` type file operations, but it's hard for C++ `fstream` types.
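For the link-time route, a minimal sketch of replacing the four basic global operators might look like this. The helper names `diy_zeroed_alloc` and `diy_new_selfcheck` are hypothetical. The sized, aligned, and nothrow overloads that newer C++ standards add are left at their library defaults here, which is a simplification:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical shared helper: zero-filled allocation that throws on failure.
// A fuller version would also call the installed new-handler before throwing.
static void* diy_zeroed_alloc(std::size_t n) {
    if (n == 0) n = 1;  // operator new must return a non-null pointer even for size 0
    void* p = std::calloc(1, n);
    if (p == nullptr) throw std::bad_alloc();
    return p;
}

// The four basic replaceable forms: single-object and array, new and delete.
void* operator new(std::size_t n) { return diy_zeroed_alloc(n); }
void* operator new[](std::size_t n) { return diy_zeroed_alloc(n); }
void operator delete(void* p) noexcept { std::free(p); }
void operator delete[](void* p) noexcept { std::free(p); }

// Self-check: raw storage from the replaced operator new starts out zeroed.
inline void diy_new_selfcheck() {
    unsigned char* bytes = static_cast<unsigned char*>(::operator new(8));
    for (std::size_t i = 0; i < 8; ++i) {
        assert(bytes[i] == 0);
    }
    ::operator delete(bytes);
}
```

Because these definitions replace the global operators for the whole program, allocations made inside the standard library go through them too. One caveat: the language still formally treats `new int` without an initializer as indeterminate, and compilers are allowed to optimize new-expressions, so zero-filling here is a safety net rather than a guarantee.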
Overall, the DIY memory safety approach is a patchwork of techniques in itself.
It would be so much easier if the compiler vendors would just add a "-safe" flag that does all this!
Related Memory Safety Blog Articles
See also these articles:
- DIY Preventive C++ Memory Safety
- Canary Values & Redzones for Memory-Safe C++
- Use-After-Free Memory Errors in C++
- Array Bounds Violations and Memory Safe C++
- Poisoning Memory Blocks for Safer C++
- Uninitialized Memory Safety in C++
- CUDA C++ Floating Point Exceptions
- Memory Safe C++ Library Functions
- Smart Stack Buffers for Memory Safe C++
- Safe C++ Text Buffers with snprintf
Safe C++ Book
The new Safe C++ coding book by David Spuler:
Get your copy from Amazon: Safe C++: Fixing Memory Safety Issues