Aussie AI Blog

Safe C++ Text Buffers with snprintf

  • Oct 29th, 2024
  • by David Spuler, Ph.D.

C++ sprintf is unsafe

The C++ sprintf function is a long-standing part of C and C++, but it's also unsafe. It can easily overflow a buffer, and there's no way to know without inspecting the parameters in greater detail. Consider this code:

    char buf[100];
    sprintf(buf, "%s", str);   // Buffer overflow?

One marginally safer way is to use the precision markers, such as in:

    char buf[100];
    sprintf(buf, "%.100s", str);   // Still overflows

In this way, the output is limited to 100 characters, but this can still overflow: 100 characters plus the terminating null byte is 101 bytes, one more than the buffer holds. We really need this:

    char buf[100];
    sprintf(buf, "%.99s", str);   // No buffer overflow
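The hard-coded 99 has to be kept in sync with the buffer size by hand. A minimal sketch of one way around that, assuming the standard "%.*s" form, where the precision is passed as an int argument; the helper name sprintf_limited is hypothetical and not from any library:

```cpp
#include <cstdio>

// Sketch: copy at most bufsize-1 characters of str into buf, leaving one byte
// for the terminating null. The helper name is illustrative only.
int sprintf_limited(char* buf, int bufsize, const char* str)
{
    if (bufsize < 1) return -1;   // a negative precision would mean "no limit"
    return std::sprintf(buf, "%.*s", bufsize - 1, str);
}
```

This only protects a single %s conversion. Any other text in the format string still counts against the buffer, which is one reason snprintf is the better tool.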

snprintf is safer

The snprintf function is safer than sprintf. On some platforms, there is also the sprintf_s safe function. Here's how snprintf works:

    char buf[100];
    snprintf(buf, 100, "%s", str);   // Safer

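We can write this more maintainably, so that the size is not repeated as a magic number (note that sizeof only gives the buffer size when buf is a true array in scope, not a pointer):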

    char buf[100];
    snprintf(buf, sizeof buf, "%s", str);   // Safer
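One caveat with sizeof buf is that it silently gives the size of a pointer if buf has decayed to a char* (for example, inside a function that received the buffer as a parameter). A possible guard, sketched here with a hypothetical helper name, is to let a template deduce the array size so that passing a plain pointer fails to compile:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not from the article): N is deduced from the array
// type, so the size argument can never disagree with the actual buffer.
// Calling this with a char* instead of an array is a compile-time error.
template <std::size_t N>
int format_string_into(char (&buf)[N], const char* str)
{
    return std::snprintf(buf, N, "%s", str);
}
```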

Problems with snprintf

Although using snprintf will avoid a buffer overrun and a crash (whereas sprintf didn't), there are still some limitations:

  • Not easy to detect if any overflow occurs (i.e., was prevented).
  • Difficult to use snprintf in the middle of a string.
  • Appending with snprintf is similarly tricky.

Detecting truncated overflows with snprintf

In many applications, you might want to know that a buffer overflow was avoided, such as by emitting an error message or throwing an exception. By default, snprintf will quietly truncate the output and do nothing else.

It is possible to examine the return value of snprintf to know whether an overflow has been prevented and the output truncated. The returned value is an integer and it's rather weird:

    The number of bytes that would have been output if there were enough room in the buffer.

If there's no overflow, then snprintf returns the bytes output (excluding the terminating null byte), just like unsafe sprintf. If there's an overflow, then the return value will be more than (or equal to) the size of the buffer. This seems odd, but it's actually quite useful, because the way to detect an overflow is simply to compare the return code to the buffer size:

    int bufsize = sizeof buf;
    int ret = snprintf(buf, bufsize, "%s", s);
    if (ret < 0) {
        // snprintf failure (rare, but possible, e.g., an encoding error)
    }
    else if (ret >= bufsize) {
        // Overflow has occurred! (Truncated text)
    }
    else {
        // Normal case. 
        // The string and its null byte fit in the buffer.
    }

Note that if the return code exactly equals the buffer size (i.e., ret==bufsize), this is still an overflow because the extra null byte didn't fit, and snprintf has truncated one character from the output string so as to leave room for the null byte.
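The boundary case is easy to check with a deliberately tiny buffer. The following is a small sketch with illustrative function names, not part of the article's code:

```cpp
#include <cstdio>

// Formats s into an 8-byte buffer and returns snprintf's result unchanged.
int format_into_8(char (&buf)[8], const char* s)
{
    return std::snprintf(buf, sizeof buf, "%s", s);
}

// True when the return value signals that the output was truncated.
bool was_truncated(int ret, int bufsize)
{
    return ret >= bufsize;
}
```

With this 8-byte buffer, the 7-character string "abcdefg" returns 7 and fits exactly. The 8-character string "abcdefgh" returns 8, which equals the buffer size, and the stored text is "abcdefg" with the final character dropped.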

Macro wrapping snprintf return codes

The above code sequence is rather a lot of typing if you're going to do that for every call to snprintf. Here's a way to automate it, using a preprocessor macro intercept and an inline function to check the return code:

    #undef snprintf
    #define snprintf(dest, bufsize, ...) \
        aussie_snprintf_return_check(snprintf((dest), (bufsize), __VA_ARGS__), \
                  (int)(bufsize), __func__, __FILE__, __LINE__)

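This looks dangerous because the name snprintf also appears inside the macro's own replacement text. However, the preprocessor does not re-expand a macro's name within its own expansion, so the inner snprintf is left alone and resolves to the real library function. This rule has been part of standard C and C++ since their first standards.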
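The same behavior can be seen in miniature with a toy function. In this sketch, the names twice, count_call and call_count are made up for illustration:

```cpp
int call_count = 0;   // counts how many times the wrapper ran

// The "real" function that the macro will intercept.
int twice(int x) { return 2 * x; }

// Wrapper that records the call and passes the value through.
inline int count_call(int value) { ++call_count; return value; }

// The inner `twice` is not expanded again, so it names the real function.
#define twice(x) count_call(twice(x))
```

After this definition, a call such as twice(21) still yields 42, but it goes through count_call exactly once.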

Note that this uses variadic macros, with the "..." and "__VA_ARGS__" tokens, which have been standard since C99 and C++11. There's also a useful __VA_OPT__ feature (added in C++20), but we don't need it here.

The above macro simply wraps the call to snprintf with another function whose only task is to check the return value. Here's an example of that definition:

    inline int aussie_snprintf_return_check(
        int snprintf_retval, int bufsize,
        const char* func, const char* file, int line
        )
    {
        // PURPOSE: Wrapper for snprintf return value
        // ... snprintf_retval is the value that was returned by snprintf
        // ... (sent here by macro interception of snprintf)
        if (snprintf_retval < 0) {
            AUSSIE_ERROR_CONTEXT("AUS053", "snprintf returned negative failure", func, file, line);
            return snprintf_retval;  // pass through
        }
        else if (snprintf_retval >= bufsize) {
            int bytes_truncated = snprintf_retval - bufsize + 1;
            // TODO: report the bytes truncated, bufsize, etc., as extra error context...
            AUSSIE_ERROR_CONTEXT("AUS054", "snprintf overflow truncated buffer", func, file, line);
            return snprintf_retval;  // pass through
        }
        return snprintf_retval;  // pass through
    }
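The definition of AUSSIE_ERROR_CONTEXT is not shown here. To try the wrapper out, a stand-in is needed, and it has to appear before the function above so that the macro is known when the function is compiled. The following is only a hypothetical placeholder that logs to stderr and counts reports; the real macro may do something quite different:

```cpp
#include <cstdio>

// Hypothetical stand-in for AUSSIE_ERROR_CONTEXT (real definition not shown).
// It only counts the reports and prints them to stderr.
static int g_error_reports = 0;

static void report_error_context(const char* code, const char* msg,
                                 const char* func, const char* file, int line)
{
    ++g_error_reports;
    std::fprintf(stderr, "%s: %s (in %s at %s:%d)\n", code, msg, func, file, line);
}

#define AUSSIE_ERROR_CONTEXT(code, msg, func, file, line) \
    report_error_context((code), (msg), (func), (file), (line))
```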

Unsafe Buffer Appending with sprintf

It's tricky to append to a string using sprintf or snprintf. Here's the basic idiom for unsafe sprintf appending using strlen:

    char xbuf[1000] = "";
    sprintf(xbuf + strlen(xbuf), "abc");
    sprintf(xbuf + strlen(xbuf), "def");
    sprintf(xbuf + strlen(xbuf), "xyz");

Note that this works even for the special case of an empty string, where strlen will return 0, and add nothing to the location.

If you do this a lot, or the buffer is a massive text string (e.g., a long HTML document in memory), then the call to strlen is a slug. Marginally better is to maintain an incremental buffer pointer, so that the strlen calls are only from the current location, which is faster.

    char* where = xbuf;
    sprintf(where, "abc");
    where += strlen(where); // append
    sprintf(where, "def");
    where += strlen(where); // append
    sprintf(where, "xyz");

And you can micro-optimize this by using the return value of sprintf, which is the number of bytes output (excluding the null byte):

    char* where = xbuf;
    where += sprintf(where, "abc");
    where += sprintf(where, "def");
    where += sprintf(where, "xyz");

But beware a pitfall: don't do this trick for snprintf, because it doesn't always return the actual bytes output, but returns the bytes it would have output, had it been in the right frame of mind.
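To make the mismatch concrete, here is a small sketch (the function name is illustrative) that compares what snprintf reports with what actually ends up in a deliberately small buffer:

```cpp
#include <cstdio>
#include <cstring>

// Returns (bytes snprintf reported) minus (bytes actually stored)
// when formatting text into an 8-byte buffer.
int reported_minus_stored(const char* text)
{
    char buf[8];
    int reported = std::snprintf(buf, sizeof buf, "%s", text);
    int stored = static_cast<int>(std::strlen(buf));
    return reported - stored;
}
```

For a short string such as "abc" the difference is 0. For the 10-character string "abcdefghij" it is 3, because snprintf reports 10 but stores only 7 characters, so advancing a pointer by the return value would leave it beyond the end of the buffer.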

There's only one problem with all those appending tricks: none of them are safe!

Safe Buffer Appending with snprintf

How do we append safely to a buffer? We want to do this:

    char xbuf[1000] = "abc";
    snprintf_append(xbuf, sizeof xbuf, "def");

But this function doesn't exist. We have to try to define our own via a macro:

    #define snprintf_append(dest, bufsize, ...) \
        do { \
         int snplentmp = (int)strlen((char*)(dest)); \
         snprintf((char*)(dest) + snplentmp, (bufsize) - snplentmp, __VA_ARGS__); \
        } while(0)

As you can see, this figures out how far along the buffer to append using strlen. Then it adds that byte count to the location, but also reduces the buffer size by that amount.

It's difficult to return the value of snprintf in this statement-like macro. However, if we're using the macro intercept with #define snprintf (as in prior sections), then the wrapped return value checking will also be occurring in this usage of snprintf, so maybe we don't need to return the value to the caller.

Again, the call to strlen can become a slug for large buffers, because it's always scanning from the very start of the buffer. The alternative is to maintain a pointer to the end of the string, which is the location from which to append. Pointer arithmetic can compute the byte count more efficiently.

    #define snprintf_append_end(dest, bufsize, endstr, ...) \
        do { \
         long int snplentmp = (long) ((char*)(endstr) - (char*)(dest)); \
         snprintf((char*)(dest) + snplentmp, (bufsize) - snplentmp, __VA_ARGS__); \
        } while(0)

If we really do need to pass the return value back to the caller, this is hard to do in a macro that expands to a statement block rather than an expression. Instead of using a macro, you can define a C++ function with variable arguments, and then have it call the vsnprintf function.

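    #include <stdarg.h>
    #include <stdio.h>
    #include <string.h>

    int snprintf_append_function(char* dest, int bufsize, const char* format, ...)
    {
        va_list ap;

        int len = (int)strlen(dest);  // offset of the current end of the string
        va_start(ap, format);
        int ret = vsnprintf(dest + len, bufsize - len, format, ap);
        va_end(ap);
        return ret;  // bytes that would have been appended (may exceed the space left)
    }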

Again, we can avoid the slowdown from the strlen call if we maintain another pointer to the end (or middle) of the text buffer:

    #include <stdarg.h>
    #include <stdio.h>
    #include <string.h>

    int snprintf_append_end_function(char* dest, int bufsize, char* endstr, const char* format, ...)
    {
        va_list ap;

        if (*endstr != 0) endstr += strlen(endstr);  // Safe: move to the end if not already
        long int len = (long)(endstr - dest);  // offset of the append position
        va_start(ap, format);
        int ret = vsnprintf(dest + len, bufsize - len, format, ap);
        va_end(ap);
        return ret;
    }

Actually, for a further optimization, the parameter endstr probably should be a reference parameter, so that its value is automatically updated in the calling code whenever it gets moved to the end.
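A minimal sketch of that reference-parameter idea follows. The function name and the choice to return -1 when no space remains are assumptions for illustration. The end pointer is advanced by the bytes actually stored, not by the raw return value, so it always lands on the terminating null byte:

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Sketch: endstr is passed by reference and is left pointing at the new
// terminating null byte, so the next append needs no strlen scan.
int snprintf_append_end_ref(char* dest, int bufsize, char*& endstr, const char* format, ...)
{
    if (*endstr != 0) endstr += std::strlen(endstr);  // move to the end if not already
    long len = static_cast<long>(endstr - dest);
    long avail = bufsize - len;           // bytes left, including the null byte
    if (avail <= 0) return -1;            // assumption: report "no room" as a failure

    va_list ap;
    va_start(ap, format);                 // format (not the reference) is the last named parameter
    int ret = std::vsnprintf(endstr, static_cast<std::size_t>(avail), format, ap);
    va_end(ap);

    if (ret > 0) {
        long stored = (ret < avail) ? ret : avail - 1;  // clamp when truncated
        endstr += stored;
    }
    return ret;  // unchanged, so the caller can still detect truncation
}
```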

And one final safety point: we need to check the return value of vsnprintf, so that we know when an overflow caused a truncation. This is possible either through another macro intercept, like we did above for snprintf, or by adding extra code directly into the above varargs functions.
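As a rough sketch of the second option, the check can be written directly into the varargs function. The function name and the bool return convention are assumptions here, and fprintf stands in for whatever error reporting the application really uses:

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Sketch: append formatted text and report truncation.
// Returns true only if the whole text (and its null byte) fit in the buffer.
bool snprintf_append_checked(char* dest, int bufsize, const char* format, ...)
{
    int len = static_cast<int>(std::strlen(dest));
    int avail = bufsize - len;            // bytes left, including the null byte
    if (avail <= 0) return false;         // no room at all

    va_list ap;
    va_start(ap, format);
    int ret = std::vsnprintf(dest + len, static_cast<std::size_t>(avail), format, ap);
    va_end(ap);

    if (ret < 0) {
        std::fprintf(stderr, "vsnprintf failed\n");
        return false;
    }
    if (ret >= avail) {
        std::fprintf(stderr, "vsnprintf truncated %d bytes\n", ret - avail + 1);
        return false;
    }
    return true;
}
```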
