Aussie AI Blog
Poisoning Memory Blocks for C++ Safety
-
November 1st, 2024
-
by David Spuler, Ph.D.
What is Poisoned Memory?
Poisoning memory is a technique where memory blocks are intentionally set to non-zero bytes, in the hope of provoking a visible failure if the memory block is used when it should not be. The general breakdown of DIY memory safety C++ techniques includes:
- Canary regions ("redzones") around memory blocks.
- Poisoned bytes inside the memory block.
- Magic values stored at the start of a block.
Hence, poisoned memory aims to detect some of these memory failures:
- Uninitialized allocated memory use (e.g., malloc, new).
- Uninitialized stack memory buffer usage.
- Use-after-free heap memory.
- Use-after-delete heap memory.
- Use-after-return for stack memory blocks.
Hence, here are some of the places where we want to poison memory blocks:
- malloc block (uninitialized heap memory).
- new or new[] heap block (uninitialized heap).
- free (de-allocated heap block).
- delete or delete[] (de-allocated heap block).
Those above examples are for the heap, but we also care about stack memory, and ideally we also want to poison:
- Local buffer variables on entry to a function (uninitialized stack memory).
- Returning from a function with a local buffer variable (invalid memory after stack unwind).
Note that we don't need poisoning for these cases:
- Global variable (already initialized to zero).
- Local static variable (also zeroed in C++).
And this makes a good point: if the C++ compiler auto-zeroed all the allocated and stack memory, we wouldn't have to worry about this. Hence, I want a "-safe" flag for my compiler.
Marking Poisoned Memory Blocks
The simplest way to "poison" a block with bytes is simply to put a special value into every byte:
    char buf[100];
    memset(buf, '@', sizeof buf);
Here is a general utility routine to poison a buffer more elegantly. Note that this code does not poison the final byte in the buffer, so that any inadvertent use of the string in the buffer won't actually go beyond the buffer. Whether you do or don't want this to crash depends on context.
    inline void aussie_poison_buffer(char* s, int bufsize, char magicchar /* = '@'*/)
    {
        // PURPOSE: Our buffer is now unused, mark it with poison bytes.
        // Put some very visible magic letters e.g. @@@@@
        // They can be tested in other use of the buffer,
        // .. and also make any errors visible in output...
        memset(s, magicchar, bufsize - 1);  // Poison all but the last...
        // Note: null byte after many @@@'s means it won't crash on strlen/etc.
        s[bufsize - 1] = 0;  // Put a null byte at the very end for safety
    }
I like the use of multiple @ characters as a poisoned value, because it's highly visible in a printout or HTML page. It's also possible to quickly test for a likely poisoned address:
bool is_poisoned = s[0] == '@' && s[1] == '@' && s[2] == '@';
We can make this into a macro:
#define is_poisoned(s) ((s)[0] == '@' && (s)[1] == '@' && (s)[2] == '@')
The preprocessor macro version really needs all those parentheses to avoid operator precedence errors, but also isn't fully safe against any side-effects in the argument expression. Safer is to use a modern inline function version:
    inline bool is_poisoned(const char *s)
    {
        return s[0] == '@' && s[1] == '@' && s[2] == '@';
    }
This example is looking for three @'s in a row. It's up to you whether you want to check for 1, 2, 3, or 4 bytes in a row. Fewer means more false positives, and one @ is probably too few, as it will get a false positive for every email address or social media handle in your input text.
However, you can also use other poison byte values, such as (char)1 or (char)127 or some other escape. I prefer to use the range 1..127 because you needn't worry about signed versus unsigned char. Using an explicit type cast of the byte is annoying, but omitting the cast is non-portable across different compilers, too. Note also that most 128..255 values are used in valid UTF-8 for European or DBCS languages (or emojis!), but there are a few bytes that are never valid UTF-8 (0xC0, 0xC1, and 0xF5 through 0xFF), in which case you have to be careful to cast to unsigned char when testing.
Obviously, you cannot use the null byte or any commonly used character as the poison marker. Also, you would usually repeat the same byte in sequence, which is fast to set using memset. However, if you really prefer slower code with fewer false positives, you can use alternating byte patterns or other variations.
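As a rough illustration of the alternating-pattern idea, here is a minimal sketch that fills a buffer with a repeating two-byte pattern and then tests for it. The "@#" pattern and the names poison_alternating and is_alt_poisoned are assumptions made up for this example, not part of the code above.

```cpp
#include <cassert>
#include <cstddef>

// Fill a buffer with the hypothetical alternating pattern "@#@#...".
// Slower than a single memset, but ordinary text rarely contains it.
inline void poison_alternating(char* buf, std::size_t n)
{
    assert(buf != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = (i % 2 == 0) ? '@' : '#';
    }
}

// Test whether `count` bytes starting at s follow the alternating pattern.
// Either phase is accepted ("@#@#" or "#@#@"), because s may point into
// the middle of a poisoned block. The loop stops at the first mismatch,
// so it never reads past the null byte of an ordinary C string.
inline bool is_alt_poisoned(const char* s, std::size_t count = 4)
{
    assert(s != nullptr);
    if (count == 0) return false;
    if (s[0] != '@' && s[0] != '#') return false;
    for (std::size_t i = 1; i < count; ++i) {
        const char expected = (s[i - 1] == '@') ? '#' : '@';
        if (s[i] != expected) return false;
    }
    return true;
}
```

With this kind of pattern, a run of plain @ characters in the input text no longer looks like poison, at the cost of a byte-by-byte loop instead of memset.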
Macro Intercepts of malloc and free
The simplest method of poisoning newly allocated blocks with malloc is with preprocessor macro intercepts. Note that we don't want to poison calloc, because it's already initialized. Here's the basic idea for the macro intercept in a header file:
#define malloc aussie_malloc
And here's the basic idea for the wrapper function that initializes:
    void* aussie_malloc(size_t sz)
    {
    #undef malloc  // avoid wrapper
        void* v = malloc(sz);  // Call real malloc
        if (v) memset(v, '@', sz);  // Poison
        return v;
    }
Link-Time Intercepts of new and delete
The C++ memory allocation operators cannot be macro-intercepted because they are not a function-like syntax. However, link-time interception is a standard feature of C++ that has been supported for decades. Here's the basic code to create a global link-time intercept for new, simply by defining your own version:
    void* operator new(size_t n)
    {
    #undef malloc  // avoid macro intercept
        void* v = malloc(n);  // Call malloc (can't use ::new here)
        if (!v) throw std::bad_alloc();  // operator new must not return NULL (needs <new>)
        memset(v, '@', n);  // Poison
        return v;
    }
Note that you need to exactly match the types, with a size_t parameter and a void* return type. And we also need to intercept delete, so that we can change it to free; otherwise there is a mismatch error.
    void operator delete(void* v)
    {
    #undef free  // avoid macro intercept
        free(v);  // call the real free (note: cannot use delete here!)
    }
And we also need the pair of intercepted array allocate and deallocate versions:
    void* operator new[](size_t n)
    {
    #undef malloc  // avoid macro intercept
        void* v = malloc(n);  // Call malloc (can't use ::new here)
        if (!v) throw std::bad_alloc();  // operator new[] must not return NULL (needs <new>)
        memset(v, '@', n);  // Poison
        return v;
    }

    void operator delete[](void* v)
    {
    #undef free  // avoid macro intercept
        free(v);  // call the real free (note: cannot use delete here!)
    }
Poisoning Deallocated Memory Blocks
Note that the above macro intercept of free and link-time intercept of delete are not really doing anything. There's no poisoning, and it just calls another deallocation routine.
The main problem is that we don't know the size of the block being deallocated, so how can we poison it? There's no standard C++ function to get the size of a memory block.
However, non-portable code to the rescue! The methods to get the size of a block from its address include:
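- _msize: Windows MSVS version.
- malloc_usable_size: Linux glibc version (GCC).
- malloc_size: MacOS version.
So, here's what a semi-portable block size function would look like (the unprefixed platform names such as LINUX and MACOS are assumed to be project-defined macros):
    size_t size_of_block(void *addr)
    {
    #if DOS || MSVS || _MSC_VER
        return _msize(addr);              // declared in <malloc.h>
    #elif LINUX || UNIX || GCC || __linux__
        return malloc_usable_size(addr);  // declared in <malloc.h>; may exceed the requested size
    #elif MACOS || __APPLE__
        return malloc_size(addr);         // declared in <malloc/malloc.h>
    #else
    #error What is this platform?
    #endif
    }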
Note that the _msize function actually fails with a runtime exception if the address is not the start of an allocated block (e.g., the middle of an allocated block, or a non-heap address). However, we can certainly use this in a deallocation sequence, which would crash anyway if we passed it a non-block address. Hence, we can use this idea to poison de-allocated memory in free using a macro interception:
#define free aussie_free
And here's the basic definition for the wrapper function that poisons freed memory:
    void aussie_free(void *v)
    {
        if (!v) return;  // free(NULL) is a valid no-op, so don't query its size
        size_t sz = size_of_block(v);
        memset(v, '@', sz);
    #undef free  // avoid macro intercept
        free(v);  // call the real free
    }
And here is the C++ delete operator version:
    void operator delete(void* v)
    {
        if (!v) return;  // deleting a null pointer is a valid no-op
        size_t sz = size_of_block(v);
        memset(v, '@', sz);
    #undef free  // avoid macro intercept
        free(v);  // call the real free (note: cannot use delete here!)
    }
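If none of the non-portable size queries is available, one portable alternative is to record the size yourself in a small header stored in front of each block. The following is only a minimal sketch under that assumption; the names BlockHeader, tracked_malloc, tracked_size and tracked_free are hypothetical, and every block must be allocated and freed through this pair of functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical header placed in front of every user block. The union
// with max_align_t pads the header so that the user pointer keeps the
// alignment that malloc guarantees.
union BlockHeader {
    std::size_t size;       // size requested by the caller
    std::max_align_t pad;   // forces header size and alignment
};

// Allocate a block, remember its size, and poison the user bytes.
inline void* tracked_malloc(std::size_t sz)
{
    void* raw = std::malloc(sizeof(BlockHeader) + sz);
    if (!raw) return nullptr;
    BlockHeader* h = static_cast<BlockHeader*>(raw);
    h->size = sz;
    void* user = h + 1;            // user bytes start after the header
    std::memset(user, '@', sz);    // poison uninitialized memory
    return user;
}

// Recover the size stored by tracked_malloc.
inline std::size_t tracked_size(const void* user)
{
    assert(user != nullptr);
    return (static_cast<const BlockHeader*>(user) - 1)->size;
}

// Poison the user bytes, then release the whole block.
inline void tracked_free(void* user)
{
    if (!user) return;             // match free(NULL) behaviour
    BlockHeader* h = static_cast<BlockHeader*>(user) - 1;
    std::memset(user, '@', h->size);   // poison before release
    std::free(h);
}
```

The trade-off is a few extra bytes per block, and a pointer from plain malloc must never be passed to tracked_free, because there is no header in front of it.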
Poisoning Stack Buffer Memory
Stack variables are still a problem, even if we're intercepting all heap allocation primitives. The simple example of an uninitialized stack variable looks like this:
    void my_stack_crash_function()
    {
        char buf[100];
        std::cerr << buf << std::endl;
    }
Fixing stack buffer usage is more difficult than heap memory. We cannot easily intercept when the stack frame is increased on function entry, nor when it is released on function returns. Compiler vendors could do this, but it's hidden from the programmer. There's no way to use macros, and I'm not aware of any callback mechanisms or compiler settings to control the memory on the stack.
Some of the possible approaches to poisoning uninitialized stack variables include:
- Explicit calls to memset.
- Use smart buffer objects instead of local array buffers (i.e., a one-variable wrapper).
- Use two-variable methods with smart buffer wrapping objects.
- Macro-intercept the alloca dynamic stack block allocation method (but it's rarely used, so this isn't that valuable).
This is the usual way of requiring an initialization, which obviates the need to do poisoning completely (except see below about partial buffers):
char buf[100] = "";
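This is a worthwhile policy, and it fixes the bug in my above code example. Note that this initializer zeroes the whole buffer, not just the first byte, so the downside is that the unused bytes look like valid empty data rather than poison.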
Here's the manual way to poison a stack variable:
    char buf[100] = "";
    memset(buf + 1, '@', 100 - 1);
And here's the slightly improved way of poisoning with the sizeof operator:
    char buf[100] = "";
    memset(buf + 1, '@', sizeof buf - 1);
And we can use a macro to reduce the chances of copy-paste errors:
    #define POISON_STACK_BUFFER(buf) memset((buf)+1, '@', sizeof(buf)-1)
    // ....
    char buf[100] = "";
    POISON_STACK_BUFFER(buf);
But beware the trap of using sizeof on a parameter of a function, rather than a local variable. Because an array function parameter is really a pointer rather than a real array type, the result of sizeof is the 4 or 8 byte size of a pointer rather than the size of the array buffer (i.e., too small). Don't do this:
    void poison_my_buffer(char buf[100])
    {
        memset(buf, '@', sizeof buf);  // Bug with sizeof!
    }
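One way to sidestep this sizeof trap is to pass the array by reference to a function template, so that the compiler deduces the real array size. This is a minimal sketch; the name poison_array is an assumption for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Takes the array by reference, so N is the true element count rather
// than the size of a pointer. Passing a plain char* will not compile.
template <std::size_t N>
inline void poison_array(char (&buf)[N], char magic = '@')
{
    // A null poison byte would be indistinguishable from an empty string.
    assert(magic != '\0');
    std::memset(buf, magic, N - 1);   // poison all but the last byte
    buf[N - 1] = '\0';                // keep a terminating null for safety
}
```

A call such as poison_array(buf) works for a local char buf[100], whereas calling it with a pointer parameter is rejected at compile time instead of silently poisoning only 4 or 8 bytes.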
The above methods are fine for poisoning the uninitialized part of a stack buffer, to detect a future use of uninitialized stack memory from the poisoned characters. But this doesn't poison the stack memory once the function returns. Instead, to achieve this, we need to use a smart buffer class.
Smart Stack Buffer Classes
Another way to handle stack buffers, with poisoning both before usage and after function return, is to use smart buffer classes. There are two approaches:
(a) One-variable method replacing the buffer with a class object, or
(b) Two-variable method with a second variable that is a wrapper or "watcher" object of the buffer.
The way to replace the buffer with a class looks like this:
    char buf[100];              // Original
    SmartStackBuffer<100> buf;  // Template-based stack memory
Or you can do this, but it's inefficient because it has to allocate on the heap instead of using stack memory, because it doesn't rely on compile-time sizing of the object:
SmartStackBuffer buf(100); // Really it's on the heap
The two-variable method looks like this:
    char buf[100];
    SmartStackWrapper bufwrap(buf, sizeof buf);
In this two-variable method, we use the character array buffer as usual. But the extra smart stack wrapper object does some extra work at the start, and at the end in its destructor.
The performance downside of the one-variable or two-variable smart buffer approach is that we've added class overhead to a very primitive type. On the other hand, we can make them all short functions that are declared as inline, so the performance hit is minimal.
The overhead of smart buffer classes is more worthwhile when used to do a variety of checks. Using them on stack buffers can do all of these things (some of which are shown in other articles):
- Poison the stack buffer on entry to catch uninitialized memory usage.
- Poison the unused portion of a partially-filled buffer.
- Detect buffer overrun writes (after they occur, in the destructor).
- Detect some buffer overrun reads/writes as they occur (with extra member functions).
- Poison the stack memory on function return (in the destructor), to detect use-after-return.
- Track stack memory block addresses in more detail.
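To make the one-variable method more concrete, here is a minimal sketch of what a template-based SmartStackBuffer might look like. The member names and the bounds-checked at() accessor are assumptions for illustration. The poison loop writes through a volatile pointer on the assumption that an optimizer could otherwise drop a plain memset on memory that is about to go out of scope.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical one-variable smart buffer: the char array lives inside
// the object, so it still uses stack memory when declared as a local.
template <std::size_t N>
class SmartStackBuffer {
    static_assert(N > 0, "buffer must hold at least the null byte");
    char m_buf[N];
public:
    SmartStackBuffer() { poison(); }    // poison on entry
    ~SmartStackBuffer() { poison(); }   // poison on scope exit
    SmartStackBuffer(const SmartStackBuffer&) = delete;
    SmartStackBuffer& operator=(const SmartStackBuffer&) = delete;

    // Fill all but the last byte with '@' and null-terminate.
    void poison()
    {
        volatile char* p = m_buf;       // discourage dead-store removal
        for (std::size_t i = 0; i + 1 < N; ++i) {
            p[i] = '@';
        }
        p[N - 1] = '\0';
    }

    char* data() { return m_buf; }
    std::size_t size() const { return N; }

    // Checked element access: catches an out-of-range index as it occurs.
    char& at(std::size_t i)
    {
        assert(i < N);
        return m_buf[i];
    }
};
```

Code that previously used the raw array would call buf.data() and buf.size() in place of buf and sizeof buf.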
Stack Buffer Destructors
The neatest thing about smart stack buffer objects is that the destructor runs whenever the object goes out of scope, at the end of a code block or the end of the function. Hence, we don't need to do anything extra to detect when the stack has unwound and the buffer is no longer valid memory.
Here's an example of the two-variable class wrapper method, which works like this:
    char buf[1000];
    SafeBufferWrap bufwrap(buf, sizeof buf);
Here's the code and note that the stack object wrapper has both types of poisoning and also buffer overrun post-detection:
    class SafeBufferWrap {  // Safe wrapper object for char[] buffers...
        const char magicbyte = '@';
    private:
        char* m_string;  // Address this buffer wrapper is tracking
        int m_bufsize;   // Number of bytes allocated (stack or wherever)
    public:
        SafeBufferWrap() = delete;  // disallow without a string...
        SafeBufferWrap(char* addr, int bufsize) {  // Initialize
            ASSERT_RETURN(addr != NULL);
            m_string = addr;
            m_bufsize = bufsize;
            memset(m_string, magicbyte/*'@'*/, m_bufsize);  // Poison!
            // Set the overrun detection sentinel byte to zero
            m_string[m_bufsize - 1] = 0;
        }
        void check_overflow() {
            // Check for buffer overrun... (at some prior time)
            if (m_string[m_bufsize - 1] != 0) {
                // Detected overflow (but don't know when)
                AUSSIE_ERROR("AUS050", "ERROR: SafeBufferWrap buffer overrun previously occurred");
            }
        }
        ~SafeBufferWrap() {  // Destructor
            check_overflow();
            memset(m_string, '@', m_bufsize);  // Poison on stack unwind
        }
        char* string() { return m_string; }
        int size() { return m_bufsize; }
    };
Handling False Positives
The idea with the above poison method is three @'s in a row indicates poisoned memory, as defined by the "is_poisoned" function above. If you prefer, it could be two or four characters. Regardless of the length, you'll get a false positive if any input text contains that sequence. This is a "false positive" where an error is detected that is not real.
How to handle false positives? The simplest idea is to ignore them, since the poisoning technique is mainly for use in development and testing phases, rather than in production. It's better to suppress false positives, as they may otherwise hide real errors. For example, if your regression tests are somehow triggering a false positive error on every nightly build, add some code to suppress it. You can build a suppression method into your error reporting mechanism, such as simply searching for other string patterns related to the error, or by suppressing it based on context values found via __func__, __FILE__ or __LINE__.
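A suppression list keyed on the function name might be sketched as follows. This is only an illustration: the function name in the list, the g_reported counter, and the REPORT_POISON macro are all hypothetical, and a real error reporting mechanism would likely have more context to match on.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical list of functions whose poison reports are known to be
// false positives (e.g., they legitimately process "@@@" in input text).
static const char* const g_suppressed_funcs[] = { "parse_email_address" };

static int g_reported = 0;   // number of poison errors actually reported

// True if reports from this function should be silently dropped.
static bool is_suppressed(const char* func)
{
    for (const char* name : g_suppressed_funcs) {
        if (std::strcmp(name, func) == 0) return true;
    }
    return false;
}

// Report a poisoned-memory detection unless it is suppressed.
static void report_poison(const char* func, const char* file, int line)
{
    assert(func != nullptr && file != nullptr);
    if (is_suppressed(func)) return;
    ++g_reported;
    std::fprintf(stderr, "%s:%d: poisoned memory used in %s\n", file, line, func);
}

// Convenience macro that captures the call-site context automatically.
#define REPORT_POISON() report_poison(__func__, __FILE__, __LINE__)
```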
Poisoning Partial Memory Buffers
It is useful to detect errors where there are "semantically unusable" memory bytes, even where the memory is still officially safe in C++ terms. A good example is copying a string into a larger buffer.
    char buf[100] = "";
    snprintf(buf, 100, "abc");
There is nothing wrong with the start of the buffer, and it has been safely copied using snprintf. However, any use of the end of the buffer beyond the string stored there has no valid meaning. In this case, it's also uninitialized stack memory, but even if it was a fully-initialized global buffer, any use of that memory is still suspect.
Hence, we want to mark indices 4..99 as invalid memory. There's no standard way to do this in C++, but we can "poison" this area with special byte values. Here is the hand-coded version to do that with the above buffer:
    int len = (int)strlen(buf);
    memset(buf + len + 1, '@', 100 - len - 1);
Obviously, you can generalize that into a useful utility function.
    void aussie_poison_unused_part_buffer(char* s, int maxbufsize, char magicchar /* = '@'*/)
    {
        // PURPOSE: Our buffer contains a string, but we want to poison unused bytes at the end..
        // ..This detects uses of the end of a buffer, no longer actually part of the string.
        // ..For example, this occurs if we had a long string in the buffer, but now it's shorter.
        // ..e.g. we are removing blocks of text from a long document (shortening it)
        // ..e.g. re-using big buffer multiple times for different strings, some long, some short
        if (!s) {
            AUSSIE_ASSERT(s != NULL);
            return;
        }
        int len = (int)strlen(s);
        int validbytes = len + 1;  // add 1 for the null byte
        if (validbytes > maxbufsize) {
            // Too many bytes (maybe it has overrun the buffer already?)
            AUSSIE_ASSERT(validbytes <= maxbufsize);
            return;  // avoid this overrun!
        }
        int remaining_bytes = maxbufsize - validbytes;
        if (remaining_bytes == 0) return;  // Nothing to do (buffer already full)
        if (remaining_bytes > 1) {
            // Poison bytes ... except last byte
            memset(s + validbytes, magicchar, remaining_bytes - 1);
        }
        s[maxbufsize - 1] = 0;  // Null at very end for safety (e.g. strlen won't crash later)
    }
Advanced Poisoning
But wait, there's more! If you really want the poisoning approach to be complete, there are various ways to uplevel:
- Add automated checks for poisoned addresses via the "is_poisoned" function in intercepts of functions such as: strlen, strcmp, strcpy, etc.
- Ensure the macro intercept header file is at the top of each C++ source file (after the system headers, but before any application headers).
- Either include the macro intercept at the top of your header files, or ensure there's no malloc or free used in inline functions in header files.
- Macro-intercept other functions (e.g., realloc, alloca).
- Linked third-party libraries will not get macro-intercepted, but will still work for link-time interception.
- Header-only third-party libraries might need review of their memory allocation usage (e.g., maybe add your macro intercept header file before including them, or maybe not).
- Any other custom allocators, such as class-specific ones, may need changes for this approach.
- Add a compile-out preprocessor macro, because you'll need to remove some of your poisonings when using a sanitizer or valgrind.
- Detect whether a sanitizer is running (e.g., the RUNNING_ON_VALGRIND macro) and modify the approach (e.g., don't use your own redzones, because these become valid memory in the silicon mind of the sanitizer).
- Call the sanitizer APIs to poison memory blocks when running in a sanitizer mode (e.g., not usually necessary for heap or stack memory block issues, but useful for partially empty buffers).
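As an illustration of the first item in this list, a checked wrapper for strlen might look roughly like the sketch below. The names checked_strlen, is_poisoned3 and g_poison_hits are hypothetical, and the error handling is reduced to a counter and a message on stderr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

static int g_poison_hits = 0;   // hypothetical count of detections

// Three '@' bytes in a row. The && short-circuits at the first
// mismatch, so this never reads past the null byte of a short string.
static bool is_poisoned3(const char* s)
{
    return s[0] == '@' && s[1] == '@' && s[2] == '@';
}

// Wrapper that checks for a poisoned buffer before calling strlen.
static std::size_t checked_strlen(const char* s)
{
    assert(s != nullptr);
    if (is_poisoned3(s)) {
        ++g_poison_hits;
        std::fprintf(stderr, "ERROR: poisoned buffer passed to strlen\n");
    }
    return std::strlen(s);
}
```

In a real build, a macro intercept along the lines of "#define strlen checked_strlen" would be placed after the system headers, in the same way as the malloc intercept.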
Poisoning API Usage
One advanced usage is modifying your approach if a sanitizer such as ASan or Valgrind is running. You can detect these with features such as:
- RUNNING_ON_VALGRIND: true if Valgrind is currently running.
- __SANITIZE_ADDRESS__ preprocessor macro.
- __has_feature(address_sanitizer) preprocessor test.
The ASan examples above are preprocessor constructs that detect whether GCC or Clang is compiling the C++ in ASan mode (e.g., the -fsanitize=address option). This isn't exactly the same thing as whether ASan is currently active at runtime, but it's a good proxy.
The AddressSanitizer tool has macros whereby you can poison custom memory blocks, so that ASan will treat their use as an error. Valgrind also has a much larger range of functions, from custom allocator controls to explicit "poisoning" calls.
ASAN poisoning API. The usage of the AddressSanitizer macros to control poisoning of memory blocks looks like:
    ASAN_POISON_MEMORY_REGION(addr, size)
    ASAN_UNPOISON_MEMORY_REGION(addr, size)
There is also a runtime flag "allow_user_poisoning" (set via ASAN_OPTIONS) that controls these, and can disable them for production code.
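Putting the compile-time detection and the poisoning macros together, a pair of wrappers might be sketched like this. The AUSSIE_ASAN_ACTIVE macro and the function names poison_region and unpoison_region are assumptions for illustration; the ASan macros come from the sanitizer/asan_interface.h header that ships with GCC and Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Detect an ASan build at compile time (hypothetical macro name).
#if defined(__SANITIZE_ADDRESS__)
#  define AUSSIE_ASAN_ACTIVE 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define AUSSIE_ASAN_ACTIVE 1
#  endif
#endif
#ifndef AUSSIE_ASAN_ACTIVE
#  define AUSSIE_ASAN_ACTIVE 0
#endif

#if AUSSIE_ASAN_ACTIVE
#  include <sanitizer/asan_interface.h>
#endif

// Write DIY poison bytes, and also tell ASan when it is compiled in.
inline void poison_region(char* addr, std::size_t size)
{
    assert(addr != nullptr);
    std::memset(addr, '@', size);
#if AUSSIE_ASAN_ACTIVE
    ASAN_POISON_MEMORY_REGION(addr, size);
#endif
}

// Make the region usable again (a no-op without ASan).
inline void unpoison_region(char* addr, std::size_t size)
{
    assert(addr != nullptr);
#if AUSSIE_ASAN_ACTIVE
    ASAN_UNPOISON_MEMORY_REGION(addr, size);
#endif
    (void)size;
}
```

Under ASan, any access between the two calls is reported as an error, so a buffer must be unpoisoned before it is reused or goes out of scope.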
Valgrind API. There's also a lot of API macros for fine-grained control of memory blocks in Valgrind. These macros can be useful:
- VALGRIND_MALLOCLIKE_BLOCK: mark a block as if it's newly allocated.
- VALGRIND_FREELIKE_BLOCK: mark as if this block is now freed.
- VALGRIND_MAKE_MEM_UNDEFINED: data in this memory is undefined.
- VALGRIND_MAKE_MEM_DEFINED: reset memory to be defined.
- VALGRIND_MAKE_MEM_NOACCESS: any use of this memory is an error.
The VALGRIND_MAKE_MEM_NOACCESS macro can be used to mark redzones or other poisoned regions, and VALGRIND_MAKE_MEM_UNDEFINED can mark memory as uninitialized.
What are these used for? Manual marking of memory blocks as poisoned or freed can be useful to manage the status of memory for ASan or Valgrind, including situations such as:
- Partial string buffer poisoning (as shown above).
- Class-specific memory allocators that pre-allocate a large memory block using the system allocator, but it is then later "allocated" in small chunks.
- Data structures with initialized memory allocated physically via calloc, but where the memory is not all used at a logical level.
- Custom memory allocators with fine-grained control over the memory blocks.
- Implementing a "never-free" or "delayed-free" memory management method for better detection of use-after-free errors, thereby getting more warnings from ASan about uses of the pseudo-deallocated memory blocks, even if they haven't really been freed yet.
- High-level logic whereby a memory block is known to be no-longer-used by the program, or otherwise invalid, but is still valid from a low-level system allocator perspective.
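The "delayed-free" idea from this list can be sketched with a small quarantine ring, shown here with DIY '@' poison bytes rather than the sanitizer APIs. Everything in this sketch is an assumption for illustration: the names, the four-slot quarantine, and the requirement that the caller passes the block size explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct QuarantinedBlock {
    char* addr;         // pseudo-freed block (still really allocated)
    std::size_t size;   // its size, as supplied by the caller
};

constexpr std::size_t kQuarantineSlots = 4;   // hypothetical capacity
static QuarantinedBlock g_quarantine[kQuarantineSlots] = {};
static std::size_t g_next_slot = 0;           // oldest slot once full
static int g_corrupt_blocks = 0;              // writes-after-free seen

// True if every byte of the block still holds the poison value.
static bool still_poisoned(const QuarantinedBlock& b)
{
    for (std::size_t i = 0; i < b.size; ++i) {
        if (b.addr[i] != '@') return false;
    }
    return true;
}

// Poison the block and park it; really free the oldest parked block.
static void delayed_free(void* p, std::size_t size)
{
    if (!p) return;
    QuarantinedBlock& slot = g_quarantine[g_next_slot];
    if (slot.addr) {
        // Evicting: a changed byte means something wrote after the free.
        if (!still_poisoned(slot)) ++g_corrupt_blocks;
        std::free(slot.addr);
    }
    std::memset(p, '@', size);
    slot.addr = static_cast<char*>(p);
    slot.size = size;
    g_next_slot = (g_next_slot + 1) % kQuarantineSlots;
}
```

Because a quarantined block is still allocated, reads of it see the poison bytes, and stray writes are noticed when the block is finally evicted. In an ASan build, ASAN_POISON_MEMORY_REGION could be applied to the parked block as well, as long as it is unpoisoned before the real free.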
Final Thoughts
In conclusion, the above has presented a variety of methods of poisoning both the uninitialized memory on the heap or stack, de-allocated heap memory, and unwound stack memory. The goal is to detect reads of uninitialized or invalid already-freed memory blocks.
The above article shows a variety of techniques, and these are a lot of extra work for the programmer. It would be better if the compiler vendors did this for us! Hence, I vote for a "-poison" option in the next compiler release.
Related Memory Safety Blog Articles
See also these articles:
- DIY Preventive C++ Memory Safety
- Canary Values & Redzones for Memory-Safe C++
- Use-After-Free Memory Errors in C++
- Array Bounds Violations and Memory Safe C++
- Uninitialized Memory Safety in C++
- DIY Memory Safety in C++
- CUDA C++ Floating Point Exceptions
- Memory Safe C++ Library Functions
- Smart Stack Buffers for Memory Safe C++
- Safe C++ Text Buffers with snprintf