Aussie AI Blog
Array Bounds Violations and Memory Safety in C++
November 2nd, 2024
by David Spuler, Ph.D.
Types of Array Bounds Violations
These are memory errors where an array or buffer has its memory block bounds exceeded.
For an array block of memory arr of size N, the valid range for the array index is 0..N-1.
Array bounds violations come in two types:
- Overflow: accessing arr[N] or any larger index.
- Underflow: accessing arr[-1] or any earlier index.
Each of these two types of bounds violations also has two subtypes:
- Write — modify the out-of-bounds memory.
- Read — get a value from out-of-bounds memory.
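The overflow and underflow categories can be expressed as a small index check. The following is only a sketch: the names BoundsStatus and classify_index are hypothetical and do not come from any library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only.
enum class BoundsStatus { InBounds, Underflow, Overflow };

// Classify index i against an array of n elements (valid range 0..n-1).
// A signed index type is used so that negative values can be represented.
inline BoundsStatus classify_index(std::ptrdiff_t i, std::size_t n)
{
    if (i < 0) return BoundsStatus::Underflow;           // arr[-1] or earlier
    if (static_cast<std::size_t>(i) >= n) return BoundsStatus::Overflow; // arr[N] or larger
    return BoundsStatus::InBounds;
}
```

The same test applies to both reads and writes; the only difference is what the out-of-bounds access would do to the memory.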
All types of memory blocks can be affected by overflows or underflows:
- Global variables: these are stored in global memory.
- C-style allocated memory: malloc and calloc allocations.
- C++-style allocated memory: new and new[] memory.
- Local variables in functions on the stack: such as string buffer variables.
- Local static variables in functions: in global memory, not the stack.
- Class data members: in whatever type of memory contains the object (i.e., any).
- Class static data members: these are in global memory.
- Read-only memory regions: string literals, numeric constants, and simple const variables.
There are a variety of lesser-known memory allocation functions, and also platform-specific functions that allocate memory:
- realloc: when it increases memory block size or moves the block.
- aligned_alloc: allocation with address alignment restrictions.
- cudaMalloc: CUDA C++ GPU memory allocation.
- alloca: dynamically allocated stack memory.
- sbrk: lower-level memory allocation controls.
Detection Methods
The methods to detect memory errors in general, including array bounds violations, include:
- Sanitizer runtime tools: e.g., valgrind and AddressSanitizer.
- DIY methods: as described in this article.
The main advantage of the sanitizer tools is that they catch the errors immediately, as they happen. Unfortunately, they're too slow to run all the time, or in production, but still should be running every night with all the automated regression tests.
The DIY methods aim to be much faster, but tend to only catch buffer overruns after they have occurred, so it is not always clear when the buffer was previously overrun or what code caused it. However, some DIY methods can catch and prevent buffer overruns beforehand. The various DIY methods range in efficiency from adding only a single byte test (very fast) to a fully instrumented "memory wrapper library" that is as slow as the sanitizers.
Sanitizers typically detect multiple types of errors across different kinds of memory. However, valgrind (with its default Memcheck tool) notably does not detect overruns of stack or global arrays.
The DIY methods for array overruns can also be combined with other techniques:
- Uninitialized memory read detection.
- Poisoned memory blocks usage.
- Basic parameter validation (e.g., deallocation of a null address).
The main DIY techniques for buffer overflow detection include:
- Canary regions ("redzones") of extra bytes around the memory block.
- Explicit checking of sizes and addresses at intercepted points.
- Checking the last byte of a text buffer is the null byte.
- Checking the last element of a non-text buffer (e.g., float array).
The remainder of this article is about text buffers and detecting overruns without any canary redzone areas. We'll discuss using canary/redzone memory regions and non-text buffers in the next article.
Text Buffer Overruns
The classic case of a text buffer overrun occurs on the stack:
    char buf[3];
    strcpy(buf, "abcd");
The typical method to avoid such overflows is the "safe" string functions:
- strncpy (with a big proviso!)
- snprintf
- strcpy_s
There are a few disadvantages of these functions. Firstly, strncpy has issues (discussed below). Secondly, strncpy and snprintf truncate the string silently: strncpy gives no indication at all, and snprintf signals truncation only through a return value that is easy to ignore. No error messages!
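For snprintf in particular, the return value can be used to notice truncation. Here is a minimal sketch, assuming a hypothetical helper named copy_fits (not a standard function) that reports whether the whole string fit:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: copy src into dest using snprintf and report
// whether the whole string fit (true) or was truncated (false).
inline bool copy_fits(char *dest, std::size_t destsize, const char *src)
{
    if (dest == nullptr || src == nullptr || destsize == 0) return false;
    // snprintf always null-terminates when destsize > 0, and returns the
    // length the full output would have needed (excluding the null byte).
    int needed = std::snprintf(dest, destsize, "%s", src);
    if (needed < 0) return false;  // output error
    return static_cast<std::size_t>(needed) < destsize;
}
```

The caller still has to look at the result, so this turns silent truncation into a detectable one but does not force anyone to handle it.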
strncpy problems
The funny thing is that strncpy in standard C or C++ is literally the worst function. Sure, if the string is too long, it will avoid a buffer overrun right there. But it fills the whole buffer, which then leaves the string without a null byte at the end. Any subsequent use of the string (e.g., strlen) will be a buffer overrun.
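The missing terminator can be observed without actually overrunning anything. In this sketch, the function name strncpy_left_unterminated is hypothetical, and the scratch buffer is deliberately one byte larger than the copy size so the demonstration stays in bounds:

```cpp
#include <cassert>
#include <cstring>

// Returns true if strncpy left the first 3 bytes of the destination
// without a null terminator. The 4th byte is a guard null, so nothing
// here reads or writes outside the scratch buffer.
inline bool strncpy_left_unterminated(const char *src)
{
    char scratch[4] = { 'x', 'x', 'x', 0 };  // 3 usable bytes + guard null
    std::strncpy(scratch, src, 3);            // copies at most 3 bytes
    return std::memchr(scratch, 0, 3) == nullptr;  // no null in first 3 bytes?
}
```

With a source of "ab", strncpy pads the remainder with null bytes; with "abc" or longer, the three bytes are filled and no terminator is written.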
The solution is to manually add your own null byte:
    strncpy(buf, s, 3);
    buf[3 - 1] = 0; // ensure null
The better way is to declare your own strncpy safety wrapper:
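    #include <cstring> // strncpy, size_t
    #undef strncpy // remove wrapper
    inline char *safe_strncpy(char *dest, const char *src, size_t n)
    {
        if (n == 0) return dest; // nothing to copy, and avoids writing dest[-1]
        char *s = strncpy(dest, src, n);
        dest[n - 1] = 0; // ensure nulled
        return s;
    }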
Then you should macro intercept all calls to strncpy, or otherwise ban them.
    #define strncpy safe_strncpy
A more advanced version of the safety wrapper would check for null parameter values. We'd also like to check the last byte was already null at the start, and that any canary redzones have not been changed by a prior buffer overrun. However, in the general case of intercepts, we cannot necessarily be sure that strncpy is occurring at the start of the buffer, or that the size is that of the whole buffer.
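One possible shape for the more advanced wrapper is sketched below. The name checked_strncpy and its optional truncated flag are assumptions for illustration, and the code assumes that n really is the size of the whole destination buffer, which an intercept macro cannot guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical "checked" variant of the wrapper (illustrative names).
// Assumes n is the size of the whole destination buffer.
inline char *checked_strncpy(char *dest, const char *src, std::size_t n,
                             bool *truncated = nullptr)
{
    if (truncated) *truncated = false;
    if (dest == nullptr || src == nullptr || n == 0) {
        std::fprintf(stderr, "checked_strncpy: invalid argument\n");
        return dest;
    }
    std::strncpy(dest, src, n);
    if (dest[n - 1] != 0) {      // buffer was filled, no terminator written
        dest[n - 1] = 0;         // ensure nulled
        if (truncated) *truncated = true;
    }
    return dest;
}
```

Unlike plain strncpy, this version gives the caller a way to find out that the text was cut short.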
Checking the Last Byte of Text Buffers
This method is a buffer overrun detection method that uses the very last byte of a text string buffer. It only works for text strings, not for other types of arrays. The advantages include:
- No extra memory overhead
- Fast single-byte tests
The main disadvantages of this quick approach:
- After-the-fact detection (does not prevent the overrun).
- No information on when and where it was overrun.
Here's how it works. Let's assume that we have a simple buffer variable on the stack:
char buf[100];
Slightly better is to initialize it:
char buf[100] = "";
This avoids uninitialized memory usage, with a null at the first byte (the remaining 99 bytes are zero-initialized as well), but this variable still has no overflow checking.
The trick is to think about the last byte, not the first. Now, if we have such a text buffer that contains strings, then the last byte in the buffer is either:
(a) the null byte (for a full buffer), or
(b) unused (for a shorter string).
We'd like to use this byte for overflow checking, but in the latter case, it could have a random value. Hence, the insightful trick is to always set the last byte to zero right at the start, even if we aren't necessarily going to use it. Then we can be sure it must be zero at all times when using the buffer, or else there's been a buffer overflow (at some time previously). We can be sure the last byte is zero for global text buffers, but not for stack variables or allocated buffers, so we have to add our own "set" method near the buffer initialization.
With this idea, we can add some checks:
    char buf[100];
    DEBUG_SET_BUFFER_OVERRUN(buf, 100); // Set zero
    // ... rest of function
    DEBUG_CHECK_BUFFER_OVERRUN(buf, 100); // Check zero
The macros are quite small and efficient, only setting and checking a single byte of the array:
    #define DEBUG_SET_BUFFER_OVERRUN(buf, len) ( \
        ((buf)[(len)-1] = 0))
    #define DEBUG_CHECK_BUFFER_OVERRUN(buf,len) \
        (( (buf)[(len)-1] == 0) ? \
            true /*ok*/ : \
            debug_buffer_overrun_failed((buf),(len)))
This idea will work with any kind of memory block, where we know the size of the buffer, whether local, global, or heap memory. If you have a class object with a text buffer data member, then add the "set" macro in the constructor, and the "check" macro in the destructor (and optionally also other places along the way).
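The class-member case might look like the sketch below. The body of debug_buffer_overrun_failed is an assumption (a bool-returning reporter), and the Label class and g_overrun_reports counter are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

static int g_overrun_reports = 0;  // hypothetical counter, for illustration

// Assumed shape of the failure handler: report and return false.
inline bool debug_buffer_overrun_failed(const char *buf, std::size_t len)
{
    ++g_overrun_reports;
    std::fprintf(stderr, "Buffer overrun detected: addr=%p size=%zu\n",
                 static_cast<const void *>(buf), len);
    return false;  // so the check macro evaluates to false on failure
}

#define DEBUG_SET_BUFFER_OVERRUN(buf, len) ((buf)[(len)-1] = 0)
#define DEBUG_CHECK_BUFFER_OVERRUN(buf, len) \
    (((buf)[(len)-1] == 0) ? true : debug_buffer_overrun_failed((buf), (len)))

// A class with a text buffer member: "set" in the constructor,
// "check" in the destructor.
class Label {
    char m_text[8];
public:
    Label() { m_text[0] = 0; DEBUG_SET_BUFFER_OVERRUN(m_text, sizeof m_text); }
    ~Label() { (void)DEBUG_CHECK_BUFFER_OVERRUN(m_text, sizeof m_text); }
    // Deliberately unsafe setter: strncpy fills the whole buffer when the
    // text is too long, leaving no null byte (it stays in bounds here).
    void set_unsafe(const char *s) { std::strncpy(m_text, s, sizeof m_text); }
};
```

In this sketch the overwritten sentinel comes from an unterminated strncpy rather than a true out-of-bounds write, so the example itself is safe to run, but the last-byte check fires in the same way.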
We can clean this up a little with the sizeof operator. But be aware that there's an insidious sizeof error if the buffer is ever a function parameter, in which case it returns the size of a pointer (too small). Here's the version:
    char buf[100];
    DEBUG_SET_BUFFER_OVERRUN(buf, sizeof buf); // Set zero
    // ... rest of function
    DEBUG_CHECK_BUFFER_OVERRUN(buf, sizeof buf); // Check zero
Note that we can actually use the check as often as we like, at any point where we think that a buffer overflow might have occurred.
    char buf[100];
    DEBUG_SET_BUFFER_OVERRUN(buf, sizeof buf); // Set zero
    // ... some of the function
    DEBUG_CHECK_BUFFER_OVERRUN(buf, sizeof buf); // Check mid-point
    // ... rest of the function
    DEBUG_CHECK_BUFFER_OVERRUN(buf, sizeof buf); // Check final
We can hide the sizeof operator behind a macro. Here are some macros based on this idea:
    #define DEBUG_SET_BUFFER_OVERRUN(buf) ( \
        ((buf)[(sizeof(buf))-1] = 0))
    #define DEBUG_CHECK_BUFFER_OVERRUN(buf) \
        (( (buf)[(sizeof(buf))-1] == 0) ? \
            true /*ok*/ : \
            debug_buffer_overrun_failed((buf),(sizeof(buf))))
If you don't like typing, you can do this:
    #define SET DEBUG_SET_BUFFER_OVERRUN
    #define CHK DEBUG_CHECK_BUFFER_OVERRUN
Note that sizeof only gives the buffer size for local and global array variables, not for heap buffers or array function parameters. Hence, you can keep both versions (under different macro names), and prefer the version with a separate length parameter in cases where the caller can provide the memory block size.
Finally, note that we need to change the last byte, so this doesn't work for read-only constants
(e.g., string literals), but they can't really have buffer overruns anyway.
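One way to sidestep the sizeof pitfall in C++ is to deduce the array size from a reference-to-array template parameter. This is a sketch of an alternative to the macros, with hypothetical function names; passing a pointer to these functions fails to compile rather than silently using the pointer size.

```cpp
#include <cassert>
#include <cstddef>

// Deduce the array size N from a reference-to-array parameter. A pointer
// (heap buffer or decayed array parameter) will not match this signature.
template <std::size_t N>
inline void set_overrun_sentinel(char (&buf)[N])
{
    static_assert(N > 0, "buffer must not be empty");
    buf[N - 1] = 0;
}

template <std::size_t N>
inline bool check_overrun_sentinel(const char (&buf)[N])
{
    return buf[N - 1] == 0;  // true means no overrun has been detected
}

// What goes wrong with sizeof on a pointer parameter:
inline std::size_t size_seen_through_pointer(char *p)
{
    return sizeof(p);  // size of the pointer, not of the caller's buffer
}
```

For heap buffers, the size is not part of the type, so a version with an explicit length parameter is still needed.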
Smart Buffer Variable with Bounds Checking
One way to add bounds checking of text buffers is to replace a simple char buffer with a smart buffer object. This is a "one-variable" solution because we change the original buffer to our smart object.
The simple code is this:
char buf[100];
We write this instead:
SafeStackBuf<100> buf;
The full class code is a template with an integer parameter, like this:
    template<int bufsize>
    class SafeStackBuf { // Templated byte size to wrap a stack buffer...
        static constexpr char magicbyte = '@'; // static, so it adds no per-object size
        static_assert(bufsize > 0, "buffer size must be positive");
    private:
        char m_buffer[bufsize]; // The stack buffer (memory is wherever the object is)
    private:
        SafeStackBuf(const SafeStackBuf&) = delete; // disallow
        void operator=(const SafeStackBuf&) = delete; // disallow
    public:
        SafeStackBuf() { // Constructor
            m_buffer[0] = 0; // Ensure initialized... (first byte only)
            // Mark end of buffer for later overrun detection..
            m_buffer[bufsize - 1] = 0; // Sentinel byte for overrun detection
        }
        void check_overflow() { // Check for buffer overrun... (at some prior time)
            if (m_buffer[bufsize - 1] != 0) { // Sentinel byte changed?
                // Overrun detected (at some previous time)
                AUSSIE_ERROR("AUS050", "ERROR: SafeStackBuf buffer overrun previously occurred");
            }
        }
        ~SafeStackBuf() { // Destructor
            check_overflow();
        }
        operator char* () { return m_buffer; } // Type conversion to "char*" type...
    };
Note that we defined a type conversion operator so that this smart buffer variable can hopefully be used without changing much of the other code in the function. In theory, we should be able to compile-out the checking for production mode with this style (and no other code changes):
    #if DEBUG
    SafeStackBuf<100> buf;
    #else
    char buf[100];
    #endif
An important advantage is that there's literally no extra memory overhead. We've simply put the original text buffer inside an object framework, but it's the same size. As for runtime overhead, there's the extra "set" of the last byte in the constructor, and the "check" in the destructor, but these are inline functions, and should be the same as using the macro versions earlier.
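The size claim and the type conversion can be checked with a cut-down stand-in for the class. In this sketch, MiniSafeBuf, g_safebuf_errors, and demo_format are hypothetical names, and the error macro is replaced by a simple counter so that the code is self-contained.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

static int g_safebuf_errors = 0;  // hypothetical error counter for the demo

// Cut-down stand-in for a one-variable smart buffer class, with error
// reporting replaced by a counter.
template <int bufsize>
class MiniSafeBuf {
    static_assert(bufsize > 0, "buffer size must be positive");
    char m_buffer[bufsize];
public:
    MiniSafeBuf() { m_buffer[0] = 0; m_buffer[bufsize - 1] = 0; }
    ~MiniSafeBuf() { if (m_buffer[bufsize - 1] != 0) ++g_safebuf_errors; }
    MiniSafeBuf(const MiniSafeBuf&) = delete;
    MiniSafeBuf& operator=(const MiniSafeBuf&) = delete;
    operator char*() { return m_buffer; }
};

// Same size as the raw array: the sentinel reuses the buffer's own last byte.
static_assert(sizeof(MiniSafeBuf<100>) == sizeof(char[100]), "no size overhead");

inline int demo_format(int value)
{
    MiniSafeBuf<16> buf;
    // The conversion operator lets the object be passed where char* is expected.
    std::snprintf(buf, 16, "value=%d", value);
    return static_cast<int>(std::strlen(buf));
}
```

The static_assert holds only because the class has no other non-static data members; adding one would break the "same size" property.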
Two-Variable Smart Buffer Wrapper Class
The two-variable version of using a smart buffer object puts the bounds overflow checking on the "outside" in a different object. This extra object does the "set" in its constructor (clearing the last byte to zero), and the "check" in its destructor. Setting up the bounds overflow detection looks like this with two separate variables:
    char buf[100];
    SafeBufferWrap bufwrap(buf, sizeof buf);
This is more elegant than the original macro versions, in that you don't need to add an explicit "check" call at the end of the function, because the wrapper object's destructor is automatically called when it goes out of scope. The wrapper object does the bounds overflow detection in the destructor, just before it disappears.
One of the advantages of this two-variable approach over the one-variable smart buffer is that we can easily compile-out the bounds checking object for production. The checking is not inherent to the buffer itself, so we can use this style, and the overhead of the bounds checking completely disappears:
    char buf[100];
    #if DEBUG
    SafeBufferWrap bufwrap(buf, sizeof buf);
    #endif
Another advantage of the two-variable approach is that the original variable is unchanged, so there is no need to fuss about whether type conversions are working. The original variable uses the original code. No problems at all!
Here's what the full class looks like to implement this wrapper. Note that it's not a template.
    class SafeBufferWrap { // Safe wrapper object for char[] buffers...
        const char magicbyte = '@';
    private:
        char* m_string; // Address this buffer wrapper is tracking
        int m_bufsize; // Number of bytes allocated (stack or wherever)
    public:
        SafeBufferWrap() = delete; // disallow without a string...
        SafeBufferWrap(char* addr, int bufsize) { // Initialize
            ASSERT_RETURN(addr != NULL);
            m_string = addr;
            m_bufsize = bufsize;
            // Set the overrun detection sentinel byte to zero
            m_string[m_bufsize - 1] = 0;
        }
        void check_overflow() { // Check for buffer overrun (at a prior time)
            if (m_string[m_bufsize - 1] != 0) {
                // Detected overflow (but don't know when)
                AUSSIE_ERROR("AUS050", "ERROR: SafeBufferWrap buffer overrun previously occurred");
            }
        }
        ~SafeBufferWrap() { // Destructor
            check_overflow();
        }
        char* string() { return m_string; }
        int size() { return m_bufsize; }
    };
The downside to this approach, when compared to the simple "set" and "check" macro versions, is two-fold:
- Memory overhead from the extra object (a pointer and an integer).
- Runtime overhead from storing data in the extra object (a couple extra assignments).
Note that there's nothing requiring this to be used on a stack buffer. Hence, you can use a wrapper object for allocated memory blocks, global arrays, or any other memory object, provided you supply the correct buffer size. The last byte has to be writeable, so this doesn't work on read-only memory.
Furthermore, this approach can be used in other ways, because the wrapper object does not need the same lifetime as the original buffer object. You can use a wrapper object multiple times for the same buffer, and you can also combine this approach with other calls to the earlier macros that check that the last byte is null. Too many options!
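As an illustration of the wrapper having a shorter lifetime than the buffer, here is a sketch that guards a heap buffer for just one scope. MiniBufferWrap, g_wrap_errors, and demo_heap_buffer are hypothetical names, and the error reporting is replaced by a counter to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

static int g_wrap_errors = 0;  // hypothetical error counter for the demo

// Cut-down stand-in for a two-variable wrapper class.
class MiniBufferWrap {
    char *m_string;
    std::size_t m_bufsize;
public:
    MiniBufferWrap(char *addr, std::size_t bufsize)
        : m_string(addr), m_bufsize(bufsize)
    {
        if (m_string && m_bufsize > 0) m_string[m_bufsize - 1] = 0;  // set sentinel
    }
    ~MiniBufferWrap()
    {
        if (m_string && m_bufsize > 0 && m_string[m_bufsize - 1] != 0) ++g_wrap_errors;
    }
};

inline void demo_heap_buffer(const char *text)
{
    const std::size_t size = 8;
    char *heap = static_cast<char *>(std::malloc(size));
    if (heap == nullptr) return;
    {
        MiniBufferWrap guard(heap, size);  // wrapper lives only for this scope
        std::strncpy(heap, text, size);    // may fill the buffer with no null byte
    }                                      // guard's destructor checks here
    std::free(heap);
}
```

The inner braces control exactly where the check runs, which is useful when only one region of a function is suspected of overrunning the buffer.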
Related Memory Safety Blog Articles
See also these articles:
- DIY Preventive C++ Memory Safety
- Use-After-Free Memory Errors in C++
- Poisoning Memory Blocks for Safer C++
- Uninitialized Memory Safety in C++
- DIY Memory Safety in C++
- CUDA C++ Floating Point Exceptions
- Memory Safe C++ Library Functions
- Smart Stack Buffers for Memory Safe C++
- Safe C++ Text Buffers with snprintf