Aussie AI Blog
Safe C++ Text Buffers with snprintf
Oct 29th, 2024, by David Spuler, Ph.D.
C++ sprintf is unsafe
The C++ sprintf function is a long-standing part of C and C++, but it's also unsafe.
It can easily overflow a buffer, and there's no way to know without inspecting the parameters in greater detail.
Consider this code:
    char buf[100];
    sprintf(buf, "%s", str); // Buffer overflow?
One marginally safer way is to use the precision markers, such as in:
    char buf[100];
    sprintf(buf, "%.100s", str); // Still overflows
In this way, the output is limited to 100 bytes, but this is still an overflow because of the +1 for the null byte. We really need this:
    char buf[100];
    sprintf(buf, "%.99s", str); // No buffer overflow
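The precision does not have to be hard-coded. The standard "%.*s" form reads the precision from an int argument, so it can be computed from the buffer size. Here is a minimal sketch of that idea; the helper name copy_with_precision is hypothetical and not part of the original article.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: copies at most 99 characters of str into a 100-byte
// buffer and returns the length of the stored string.
std::size_t copy_with_precision(char (&buf)[100], const char* str)
{
    // "%.*s" takes the precision from the int argument that precedes str;
    // sizeof buf - 1 leaves one byte for the terminating null.
    std::sprintf(buf, "%.*s", (int)(sizeof buf - 1), str);
    return std::strlen(buf);
}
```

This only protects a lone "%s" in the format string. Any other text or conversions in the format still count toward the total output, which is one reason snprintf is the better tool.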
snprintf is safer
The snprintf function is safer than sprintf.
On some platforms, there is also the sprintf_s safe function.
Here's how snprintf works:
    char buf[100];
    snprintf(buf, 100, "%s", str); // Safer
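We can write this more maintainably with sizeof, so that the size always matches the array declaration (this only works when buf is a real array, not a pointer):
    char buf[100];
    snprintf(buf, sizeof buf, "%s", str); // Safer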
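One way to keep the size and the array in step is to let the compiler deduce the size from the array type. The sketch below uses an array-reference template, which is a different technique from the plain sizeof call; the name format_into is hypothetical.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: N is deduced from the array type, so the size passed
// to snprintf cannot drift from the declaration. Passing a plain pointer
// does not compile, which avoids the sizeof-on-a-pointer mistake.
template <std::size_t N>
void format_into(char (&buf)[N], const char* str)
{
    std::snprintf(buf, N, "%s", str);  // writes at most N bytes, null byte included
}
```

With an 8-byte array and the 10-character input "0123456789", the buffer ends up holding the 7 characters "0123456" plus the null byte.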
Problems with snprintf
Although using snprintf will avoid a buffer overrun and a crash (whereas sprintf didn't), there are still some limitations:
- Not easy to detect if any overflow occurs (i.e., was prevented).
- Difficult to use snprintf in the middle of a string.
- Appending with snprintf is similarly tricky.
Detecting truncated overflows with snprintf
In many applications, you might want to know that a buffer overflow was avoided,
such as by emitting an error message or throwing an exception.
By default, snprintf will quietly truncate the output and do nothing else.
It is possible to examine the return value of snprintf to know whether an overflow has been prevented and the output truncated.
The returned value is an integer and it's rather weird: it is the number of bytes that would have been output if there were enough room in the buffer.
If there's no overflow, then snprintf returns the bytes output (excluding the terminating null byte), just like unsafe sprintf.
If there's an overflow, then the return value will be more than (or equal to) the size of the buffer.
This seems odd, but it's actually quite useful,
because the way to detect an overflow is simply
to compare the return code to the buffer size:
    int bufsize = sizeof buf;
    int ret = snprintf(buf, bufsize, "%s", s);
    if (ret < 0) {
        // snprintf failure... (can this really occur?)
    }
    else if (ret >= bufsize) {
        // Overflow has occurred! (Truncated text)
    }
    else {
        // Normal case.
        // The string and its null byte fit in the buffer.
    }
Note that if the return code exactly equals the buffer size (i.e., ret==bufsize), this is still an overflow because the extra null byte didn't fit, and snprintf has truncated one character from the output string so as to leave room for the null byte.
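The boundary case is easier to see with a small buffer. The following sketch classifies the return value using the same three-way test; the FormatResult enum and the format_checked function are hypothetical names used only for illustration.

```cpp
#include <cstdio>
#include <cstring>

enum class FormatResult { Ok, Truncated, Error };

// Hypothetical helper: formats s into buf and classifies the snprintf
// return value with the same comparisons as the code above.
FormatResult format_checked(char* buf, std::size_t bufsize, const char* s)
{
    int ret = std::snprintf(buf, bufsize, "%s", s);
    if (ret < 0) return FormatResult::Error;          // snprintf reported a failure
    if ((std::size_t)ret >= bufsize) return FormatResult::Truncated;
    return FormatResult::Ok;                          // string and null byte both fit
}
```

With an 8-byte buffer, the 7-character string "1234567" fits (return value 7), whereas the 8-character string "12345678" returns 8, which equals the buffer size, and the buffer holds only "1234567".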
Macro wrapping snprintf return codes
The above code sequence is rather a lot of typing if you're going to do that for every call to snprintf.
Here's a way to automate it, using a preprocessor macro intercept and an inline function to check the return code:
    #undef snprintf
    #define snprintf(dest, bufsize, ...) \
        aussie_snprintf_return_check(snprintf(dest, bufsize, __VA_ARGS__), \
            bufsize, __func__, __FILE__, __LINE__)
This looks dangerous since the macro snprintf is also in the macro value.
However, a preprocessor macro that refers to itself is not expanded again inside its own expansion, so the inner snprintf refers to the real function.
This has been standard behavior since the first C and C++ standards.
Note that this is using variable-arguments macros, which have been standard since C99 and C++11.
These include the "..." and the "__VA_ARGS__" tokens.
There's also a useful __VA_OPT__ macro (added in C++20), but we don't need it here.
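The expand-once rule can be checked in isolation with a tiny example. In this sketch, the function square and the counter g_square_calls are hypothetical names chosen for the demonstration.

```cpp
static int g_square_calls = 0;

// The real function, defined before the macro of the same name.
int square(int x) { return x * x; }

// Self-referential macro: the inner "square" is not expanded a second time,
// so it names the function above.
#define square(x) (++g_square_calls, square(x))

int demo_square()
{
    int a = square(3);  // counter becomes 1, a is 9
    int b = square(4);  // counter becomes 2, b is 16
    return a + b;
}
```

Each call goes through the macro exactly once, bumps the counter, and then reaches the real function.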
The above macro simply wraps the call to snprintf with another function whose only task is to check the return value.
Here's an example of that definition:
    inline int aussie_snprintf_return_check(
        int snprintf_retval, int bufsize,
        const char* func, const char* file, int line
    )
    {
        // PURPOSE: Wrapper for snprintf return value
        // ... snprintf_retval is the value that was returned by snprintf
        // ... (sent here by macro interception of snprintf)
        if (snprintf_retval < 0) {
            AUSSIE_ERROR_CONTEXT("AUS053", "snprintf returned negative failure", func, file, line);
            return snprintf_retval; // pass through
        }
        else if (snprintf_retval >= bufsize) {
            int bytes_truncated = snprintf_retval - bufsize + 1;
            // TODO: report the bytes truncated, bufsize, etc., as extra error context...
            AUSSIE_ERROR_CONTEXT("AUS054", "snprintf overflow truncated buffer", func, file, line);
            return snprintf_retval; // pass through
        }
        return snprintf_retval; // pass through
    }
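To try the idea without the rest of the error-handling library, here is a self-contained sketch. It makes two assumptions that are not in the original: report_error is a hypothetical stand-in for AUSSIE_ERROR_CONTEXT, and the macro is given a separate name, CHECKED_SNPRINTF, rather than redefining snprintf itself.

```cpp
#include <cstdio>

static int g_error_count = 0;

// Hypothetical stand-in for AUSSIE_ERROR_CONTEXT: counts and prints the error.
inline void report_error(const char* code, const char* msg,
                         const char* func, const char* file, int line)
{
    ++g_error_count;
    std::fprintf(stderr, "%s: %s (%s, %s:%d)\n", code, msg, func, file, line);
}

// Checks the snprintf return value and passes it through unchanged.
inline int checked_return(int retval, int bufsize,
                          const char* func, const char* file, int line)
{
    if (retval < 0)
        report_error("AUS053", "snprintf returned negative failure", func, file, line);
    else if (retval >= bufsize)
        report_error("AUS054", "snprintf overflow truncated buffer", func, file, line);
    return retval;
}

// Separate macro name (an assumption) so that snprintf itself is left alone.
#define CHECKED_SNPRINTF(dest, bufsize, ...) \
    checked_return(std::snprintf(dest, bufsize, __VA_ARGS__), \
                   (int)(bufsize), __func__, __FILE__, __LINE__)

int demo_checked()
{
    char buf[8];
    CHECKED_SNPRINTF(buf, sizeof buf, "%s", "short");         // fits: nothing reported
    CHECKED_SNPRINTF(buf, sizeof buf, "%s", "far too long");  // truncated: one report
    return g_error_count;
}
```

Note that the bufsize argument appears twice in the macro body, so it should be a side-effect-free expression such as sizeof buf.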
Unsafe Buffer Appending with sprintf
It's tricky to append to a string using sprintf or snprintf.
Here's the basic idiom for unsafe sprintf appending using strlen:
    char xbuf[1000] = "";
    sprintf(xbuf + strlen(xbuf), "abc");
    sprintf(xbuf + strlen(xbuf), "def");
    sprintf(xbuf + strlen(xbuf), "xyz");
Note that this works even for the special case of an empty string, where strlen will return 0, and add nothing to the location.
If you do this a lot, or the buffer is a massive text string (e.g., a long HTML document in memory), then the call to strlen is a slug.
Marginally better is to maintain an incremental buffer pointer, so that the strlen calls are only from the current location, which is faster.
    char* where = xbuf;
    sprintf(where, "abc");
    where += strlen(where); // append
    sprintf(where, "def");
    where += strlen(where); // append
    sprintf(where, "xyz");
And you can micro-optimize this using the return code, which works for sprintf because it returns the number of bytes output.
    char* where = xbuf;
    where += sprintf(where, "abc");
    where += sprintf(where, "def");
    where += sprintf(where, "xyz");
But beware a pitfall: don't do this trick for snprintf, because it doesn't always return the actual bytes output, but returns the bytes it would have output, had it been in the right frame of mind.
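If the return value is to be used for advancing a position with snprintf, it first has to be clamped to what was really stored. The sketch below shows one way to do that; chars_stored and demo_clamped_append are hypothetical names, and the code tracks an offset rather than a raw pointer.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: converts an snprintf return value into the number of
// characters actually stored in the buffer (excluding the null byte).
std::size_t chars_stored(int ret, std::size_t bufsize)
{
    if (ret < 0 || bufsize == 0) return 0;                 // failure, or no room at all
    if ((std::size_t)ret >= bufsize) return bufsize - 1;   // output was truncated
    return (std::size_t)ret;
}

// Appends three pieces using the clamped count; returns the final length.
std::size_t demo_clamped_append(char* buf, std::size_t bufsize)
{
    std::size_t used = 0;
    const char* pieces[] = { "abc", "def", "xyz" };
    for (const char* piece : pieces) {
        std::size_t room = bufsize - used;                 // bytes left, null byte included
        int ret = std::snprintf(buf + used, room, "%s", piece);
        used += chars_stored(ret, room);
    }
    return used;
}
```

With an 8-byte buffer, the third call has only 2 bytes of room but returns 3. Adding the raw return value would move the position past the end of the buffer, whereas the clamped count stops at 7 characters ("abcdefx").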
There's only one problem with all those appending tricks: none of them are safe!
Safe Buffer Appending with snprintf
How do we append safely to a buffer? We want to do this:
    char xbuf[1000] = "abc";
    snprintf_append(xbuf, sizeof xbuf, "def");
But this function doesn't exist. We have to try to define our own via a macro:
    #define snprintf_append(dest, bufsize, ...) \
        do { \
            int snplentmp = (int)strlen((char*)(dest)); \
            snprintf((char*)(dest) + snplentmp, (bufsize) - snplentmp, __VA_ARGS__); \
        } while(0)
As you can see, this figures out how far along the buffer to append using strlen.
Then it adds that byte count to the location, but also reduces the buffer size by that amount.
It's difficult to return the value of snprintf in this statement-like macro.
However, if we're using the macro intercept with #define snprintf (as in prior sections), then the wrapped return value checking will also be occurring in this usage of snprintf, so maybe we don't need to return the value to the caller.
Again, the call to strlen can become a slug for large buffers,
because it's always scanning from the very start of the buffer.
The alternative is to maintain a pointer to the end of the string,
which is the location from which to append.
Pointer arithmetic can compute the byte count more efficiently.
    #define snprintf_append_end(dest, bufsize, endstr, ...) \
        do { \
            long int snplentmp = (long) ( (char*)(endstr) - (char*)(dest) ); \
            snprintf((char*)(dest) + snplentmp, (bufsize) - snplentmp, __VA_ARGS__); \
        } while(0)
If we really do need to pass the return code through, then it's hard to do this in a macro, which looks like a code block rather than a function-like macro.
Instead of using a macro, you can define a C++ function with variable arguments, and then have it call the vsnprintf function.
    #include <stdarg.h>
    #include <stdio.h>
    #include <string.h>

    int snprintf_append_function(char *dest, int bufsize, const char* format, ...)
    {
        va_list ap;
        int len = (int)strlen(dest); // current length = offset at which to append
        va_start(ap, format);
        int ret = vsnprintf(dest + len, bufsize - len, format, ap);
        va_end(ap);
        return ret;
    }
Again, we can avoid the slowdown from the strlen call if we maintain another pointer to the end (or middle) of the text buffer:
    #include <stdarg.h>
    #include <stdio.h>
    #include <string.h>

    int snprintf_append_end_function(char* dest, int bufsize, char *endstr, const char* format, ...)
    {
        va_list ap;
        if (*endstr != 0) endstr += strlen(endstr); // Safe: move to the end if not already
        long int len = (long)((char*)endstr - (char*)dest);
        va_start(ap, format);
        int ret = vsnprintf(dest + len, bufsize - len, format, ap);
        va_end(ap);
        return ret;
    }
Actually, for a further optimization, the parameter endstr probably should be a reference parameter, so that its value is automatically updated in the calling code whenever it gets moved to the end.
And one final safety point: we need to check the return value of vsnprintf, so that we know when an overflow caused a truncation.
This is possible either through another macro intercept, like we did above for snprintf, or by adding extra code directly into the above varargs functions.
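Both suggestions can be combined in one function. The following is only a sketch under stated assumptions: the name snprintf_append_ref and the truncated out-parameter are hypothetical, and the code assumes that dest holds a null-terminated string and that endstr points somewhere inside it.

```cpp
#include <cstdarg>
#include <cstdio>
#include <cstring>

// Hypothetical variant of the append function: endstr is a reference, so the
// caller's end pointer advances automatically, and *truncated (if non-null)
// reports whether the output had to be cut short.
int snprintf_append_ref(char* dest, std::size_t bufsize, char*& endstr,
                        bool* truncated, const char* format, ...)
{
    endstr += std::strlen(endstr);                    // move to the end if not already
    std::size_t used = (std::size_t)(endstr - dest);
    std::size_t room = bufsize - used;                // bytes left, null byte included

    va_list ap;
    va_start(ap, format);
    int ret = std::vsnprintf(endstr, room, format, ap);
    va_end(ap);

    bool overflow = (ret >= 0 && (std::size_t)ret >= room);
    if (truncated) *truncated = overflow;
    if (ret > 0) endstr += overflow ? room - 1 : (std::size_t)ret;  // advance by what was stored
    return ret;                                       // pass through, as in the functions above
}
```

Because the end pointer is advanced by the number of characters actually stored rather than by the raw return value, it stays inside the buffer even after a truncation.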
Related Memory Safety Blog Articles
See also these articles:
- DIY Preventive C++ Memory Safety
- Canary Values & Redzones for Memory-Safe C++
- Use-After-Free Memory Errors in C++
- Array Bounds Violations and Memory Safe C++
- Poisoning Memory Blocks for Safer C++
- Uninitialized Memory Safety in C++
- DIY Memory Safety in C++
- CUDA C++ Floating Point Exceptions
- Memory Safe C++ Library Functions
- Smart Stack Buffers for Memory Safe C++