Aussie AI
C++ Standard Library Bugs
-
Bonus Material for "Generative AI in C++"
-
by David Spuler, Ph.D.
Standard Library Bugs
Trigonometric functions use radians
The trigonometric functions sin, cos, and tan require their arguments in radians, not degrees. Usages such as below are likely to be erroneous:
double cos_60 = cos(60.0); /* PROBABLE ERROR */
The natural assumption is that cos_60 will be 0.5, but in fact the call to cos computes the cosine of 60 radians, which is approximately 3437 degrees. The correct code should convert the degree value to radians using the relationship that PI radians equals 180 degrees. Therefore, the following is one method of converting degrees to radians:
#define CONVERT(degrees) (((double)(degrees) * M_PI) / 180.0) double cos_60 = cos(CONVERT(60));
Note that the use of computations of this nature may involve roundoff errors, and the value need not be exactly equal to 0.5. In fact, when printed out using the printf format %.20f to show 20 decimal places, the calculated result of cos_60 on my implementation was actually:
0.50000000000000011000
Boolean library function compared with true
A similar problem to that involving strcmp is that a number of library functions return a "Boolean" value. For example, the character test functions in <ctype.h> such as isdigit do not only return 0 and 1 for false and true.
These functions usually return a nonzero integer for true (it is usually positive, but some obscure implementations might return a negative). Hence, the following test is incorrect:
if (isdigit(ch) == true) /* WRONG */
The correct version is simply:
if (isdigit(ch)) /* CORRECT */
The comparison of the return result with zero is (accidentally) correct, but indicates a dangerous habit:
if (isdigit(ch) == false) /* DANGEROUS */
Again, the correct style is:
if (!isdigit(ch)) /* CORRECT */
Other boolean functions for which this pitfall exists include the feof and ferror library functions:
if (feof(fp) == true) /* WRONG */ if (ferror(fp) == true) /* WRONG */
Confusing identical library function parameters
There are a number of library functions that have two parameters of the same type. For example, many functions accept two parameters of type char*, and therefore novice errors such as the following will compile without warning:
fp = fopen("r", filename); // Wrong strcpy("hello", dest); // Wrong memset(arr, MAXSZ, 0); // Wrong
There are also a number of functions with two integer parameters indicating number and size of the objects concerned. These functions include fread, fwrite, calloc, bsearch, and qsort. Unfortunately, the ordering of these parameters differs across these functions. For example, the prototypes of fread and calloc illustrate the different orderings of the "size" and "number" parameters:
size_t fread(void *p, size_t size, size_t number, FILE *fp); void* calloc(size_t number, size_t size);
Therefore, the roles of the two parameters may be occasionally confused. Because both parameters have the same types the compiler will not warn about erroneous calls such as the following:
p = calloc(sizeof(int), 100); fread(buf, 100, sizeof(struct rec), fp); qsort(arr, sizeof(int), 100, cmp_fn);
Fortunately, such errors may be harmless for calloc (and also for fread and fwrite if their return value is ignored) since the effect of the call depends mainly on the product of the two arguments. However, passing the wrong arguments to qsort or bsearch can lead to disastrous results.
Miscellaneous library function errors
There are many other areas of standard library usage that are undefined by the standard. Some of these areas are listed below:
- wrong fopen mode (e.g., applying fseek to file opened as "r")
- ungetc cannot push back 2 or more characters for one file (undefined behavior)
- ungetc causes the file position pointer to become indeterminate (on a text file)
- ungetc won't work with the C++ <iostream.h> library
- remove applied to a file that is currently open (undefined behavior)
- rename when the new file already exists (undefined behavior)
- printf producing more than 509 characters (undefined behavior)
- arguments overlap for strcpy, strncpy, strcat, strncat, memcpy
- destination and arguments overlap (e.g., for %s): sprintf, vsprintf, sscanf
- memcmp cannot be portably used to compare structs because of padding "holes"
- abs may fail (overflow) for the "largest" negative, i.e., abs(INT_MIN)
- labs may fail (overflow) for the "largest" negative, i.e., labs(LONG_MIN)
- time_t is not necessarily a count of seconds elapsed
- bad format string to strftime
getchar idiom problems
For example, the common method of receiving input using getchar is:
while ((ch = getchar()) != EOF) putchar(ch);
This has the inherent danger of a precedence error if the brackets around the assignment are omitted. The following code will output a whole stream of characters with ASCII value 1, since the != operator will return 1 (i.e., true) until EOF is found:
while (ch = getchar() != EOF) /* ERROR */ putchar(ch);
There's also the portability bug if ch is type "char" because EOF is usually implemented as a negative integer. Use type "int" for ch here.
Wrong types to printf or scanf
The compiler cannot type-check arguments to functions with a variable number of parameters, such as printf and scanf. Even worse, functions with variable parameters cannot themselves check for correct types. If the wrong types are passed to printf or scanf, a number of different run-time errors can occur.
The most likely problem is that a meaningless number will be printed (i.e. garbage as output). The worst that can occur is a segmentation fault or some other abnormal termination.
A particular instance of wrong types to scanf is forgetting the & in front of the arguments. This causes the wrong type to be passed. Instead of a pointer to the variable being passed, the variable's value is passed as if it were a pointer value. This is usually an illegal address and a dereference of this address inside the scanf function will cause undefined run-time behavior, typically a crash.
Errors with printf
The compiler cannot type-check arguments to functions with a variable number of arguments, such as printf and scanf. Even worse, functions with variable arguments cannot themselves check for correct types. If the wrong types are passed to printf or scanf,anumber of different run-time errors can occur. The most harmless problem is that a meaningless number will be printed (i.e., garbage characters as output). The worst that can occur is a segmentation fault or some other abnormal termination.
There are a number of common errors in calling printf. The simplest error is to simply forget the argument:
printf("%d\n"); /* MISSING ARGUMENT */
This causes an integer to be extracted from the stack even though one was not passed as an argument. The usual result is output of a weird integer value.
The other similar error is to pass too many arguments to printf, as below. Fortunately, this causes no harmful effects since the extra argument is ignored.
printf("%d\n", i, j); /* EXTRA ARGUMENT */
Even if the correct number of arguments is passed, their types must also be correct for the corresponding format. For example, accidentally passing an int argument for a %s format will cause the integer to be treated as an address, causing abnormal termination or at least the output of unexpected string data:
printf("%s", 10); /* ERROR */
Fortunately, the default argument promotions that are applied to variable arguments reduce the frequency of type mismatches. Arguments of type char or short are promoted to int, and float is promoted to double. This explains why the printf formats %e, %f and %g are valid for both float and double, and also why there is no special format for short values. Therefore, accidentally passing a char or short instead of an int, or a float instead of a double, will not cause an error; in fact, these situations are quite common.
char ch; short sh; float fl; printf("%d %d %f\n", ch, sh, fl); /* OK */
Nevertheless, there are a few situations where a mismatch is possible. The major types and their formats that cannot be mixed are int (%d), long (%ld), double (%f), and char* (%s). The error in passing an integer to %s has been discussed above. Passing an integer for a %f format or a float for a %d format will also cause an error. There is no implicit conversion between integral and floating-point types when they are passed as arguments to a variable-argument function, although there are conversions when they are passed to a fixed-argument prototyped function. An example of an erroneous call passing an integer to %f is:
#define COST 10 .... printf("Cost is %4.2f dollars\n", COST);
Portability to embedded systems: One source of trouble with portability is the type of large integer constants on 16-bit implementations. Consider the following printf statement:
#define SALARY 50000 .. printf("The manager has salary %d dollars\n", SALARY);
Missing FILE* argument to fprintf
Another type-related error that is prevalent in notice code is a missing FILE* argument in a fprintf call. Typically, this occurs when the programmer forgets to use stderr. The following statement illustrates the error:
fprintf("Error...file %s didn't open!\n", filename);
Modern C++ compilers will detect this, and it's usually just a simple fix.
fprintf(stderr, "Error...file %s didn't open!\n", filename);
Printing a string with printf
One dangerous practice that introduces a subtle error is a common method of printing a string using printf. The following statement contains a hidden error:
printf(str); /* DANGEROUS */
This will print the string correctly unless it happens to contain a % character, in which case printf will expect extra arguments. No arguments are present, and a fatal error may result. The correct method is:
printf("%s", str); /* CORRECT */
Errors with scanf
The scanf function exhibits similar type-related errors to printf because it is also a variable-argument function, and the compiler cannot type-check its arguments. This problem is compounded by the common misconception that the format string for scanf can be the same as that used for printf, which causes the programmer to use the wrong type arguments.
Missing & on scanf argument
A very common instance of passing wrong argument types to scanf is forgetting the & in front of the arguments. Instead of a pointer to the variable being passed, the variable's value is passed as if it were a pointer value. This is usually an illegal address and a dereference of this address inside the scanf function will cause undefined run-time behavior. The error is seen in the following code:
int x = 0; .... scanf("%d", x); /* ERROR */
The correct scanf statement is:
scanf("%d", &x); /* CORRECT */
An & is needed for input of all basic types, including integers, characters, and floating point values. However, it is not needed for string arguments corresponding to the %s format. Accidentally using an & for a char[] argument is often harmless (perhaps causing a compilation warning about & being "ignored"), but using & on a char* argument to %s can cause a fatal error similar to that from omitting & on basic types:
char s1[10]; char *s2; .... scanf("%s", &s1); /* HARMLESS */ scanf("%s", &s2); /* ERROR */
Wrong argument types to scanf
Even if & is used correctly, the scanf function is still more prone to argument type errors than printf simply because the default argument promotions offer little safety for scanf. Whereas printf allowed both float and double for the %f format, scanf has %f for float and %lf for double. Similarly, scanf has a special format %hd for short int, whereas printf simply used %d for both short and int values. The difference occurs because the default argument promotions do not apply to addresses, and all arguments to scanf are addresses. Consider the following erroneous code:
double f; scanf("%f", &f); /* ERROR */
This will attempt to store a float value at an address that actually contains a double. Although this should not cause a run-time error, the value of f is likely to be incorrect (unless the implementation has float and double the same size).
A similar, more dangerous example is the confusion between %d and %hd, as follows:
short x; scanf("%d", &x); /* ERROR */
This statement attempts to store an int at an address that points to a short. If a short is 2 bytes and int is 4 bytes the program is unwittingly modifying some unknown extra 2 bytes. Note that such errors can lie dormant in a program as portability problems if short and int are the same size for a particular implementation.
scanf and printf format strings differ
A common error that even reasonably experienced programmers may make is the assumption that the format strings for printf and scanf are the same. There is some overlap in the common formats such as %d and %s, but there are many differences and programmers should study printf and scanf separately, rather than assuming a one-to-one mapping of facilities.
One common mistake with scanf involves the input of floating-point numbers. Since printf uses %f for double types it is quite common for a programmer to mistakenly use %f for a double variable in a scanf call:
double x; ... scanf("%f", &x);
This is incorrect since %f is the format for float, and %lf is the correct scanf format for double. There is further confusion in that the %lf format is not correct for printf (it specifies long double); instead simply %f should be used for both float and double.
This form of error can lie dormant in a program since many implementations use the same representation for float, double, and long double. It becomes a portability problem for other implementations where the representations differ.
Unnecessary type casting to float
A minor mistake made by novice programmers is to unnecessarily type-cast arguments to the printf %f format specifier to float even when they are already double. The following statement is an example:
printf("x*2.0 = %f\n", (float)(x*2.0));
The type cast is not necessary because %f applies to both float and double. This is not a dangerous error, since the value is simply converted to float by the type cast and then promoted back up to double with argument promotions. However, it is bad style, and indicates a dangerous misunderstanding that may lead to a later error. For example, this programmer might believe that %f applies to float and %lf applies to double (as is the case for scanf), whereas %lf actually applies to long double in printf.
scanf has no precision specifier
Another common error with using the same format string for printf and scanf is using a field width and precision. Suppose you have generated a data file using:
fprintf(dat, "%6.4f\n", val);
The following statement is erroneous, even if val has float type (if it has double type then we have the mismatch with the %f format):
fscanf(dat, "%6.4f", &val); /* WRONG */
The error is that scanf formats do not allow two integers. Whereas printf has both a field width and a precision, scanf only allows a single integer indicating field width. The above statement is interpreted by scanf as if the format specification were "%6.", where the '.' character is an undefined format specifier. Therefore, the format string is erroneous, and scanf will return zero, and will not alter val, and will not even read any characters in the file.
Whitespace in scanf format strings
Programmers get into the habit of adding \n at the end of a printf call, and because of the similarity between printf and scanf format strings, they can accidentally do so for a scanf call. A newline in the scanf format string causes characters up to the next non-whitespace character to be read. For example, if the following program is executed, the user will have to enter a number and then another character other than space, tab, or newline:
#include <stdio.h> int main() { int x = 0; printf("Enter an integer: "); scanf("%d\n", &x); }
In fact, any whitespace in a scanf format string is usually misplaced, although it is often harmless. A common indication that something is wrong is when two or more adjacent whitespace characters appear in a scanf format string. Tw o whitespace characters have the same effect as one, so the usage is probably erroneous.
Even a single space character is often not necessary. The scanf format string "%d %d" needs no space before the second %d specification because scanf skips whitespace when looking for a number. Hence, whitespace is skipped twice and the space is redundant. The only specifications that may require a preceding space character are the %c and %[ specifications, since these do not skip whitespace by default. The programmer should consider carefully whether to put a space before a %c specification; using a space will read the first non-whitespace character, and omitting the space will be similar to getchar:
scanf("%c", &c); /* act like getchar */ scanf(" %c", &c); /* read first non-whitespace */
Whitespace and the %c scanf format
Not using a space before a %c format is a reasonably common error in scanf usage. The crucial function is:
void menu(void) { char choice; while(1) { /* infinite loop */ printf("--- MAIN MENU ---\n"); printf("1. Insert\n"); printf("2. Delete\n"); printf("3. Quit\n"); printf("-----------------\n"); printf("\nEnter your choice: \n"); scanf("%c", &choice); /* ERROR! */ switch(choice) { case '1': insert_obj(); break; case '2': delete_obj(); break; case '3': exit(0); /* Quit */ default: printf("Unknown choice... try again\n\n"); } } /* infinite loop */ }
Using %c without a preceding space for the input will cause strange behavior. When the user chooses 1 or 2 the program will perform the insertion or deletion operation (which we assume to consume no input), and will then read the newline (or any other whitespace typed by the user) as the next user choice. Hence, choosing 1 or 2 and hitting return will cause a complaint about an "Unknown choice" (because the newline character is processed as a choice) and an extra appearance of the menu. The solution is to add a space to the format string in the scanf statement:
scanf(" %c", &choice); /* CORRECT */
Misusing feof
A common erroneous practice is to use feof to control whether or not to read input from a file. The error arises because of a misunderstanding of how feof operates. The feof function returns true if a previous read operation tried to read past the end of the file. Therefore, unlike the similar eof function in Pascal, feof can return false (i.e., not end of file) even if there are no more characters to read (i.e., the previous read operation read the last character, but did not try to read past the end of the file). Therefore, it is important to always check the return value of input functions even if feof has returned false. The error is illustrated in the code below to copyafile to stdout:
#include <stdio.h> int main() { FILE *fp; char *fname = "dummy"; char s[100]; /* Copy file to stdout */ fp = fopen(fname, "r"); while(!feof(fp)) { fgets(s, 100, fp); printf("%s", s); } }
The main error that occurs is that the last line is printed twice. The return value of fgets has not been examined to check for a NULL return indicating end of file. The feof test will not indicate end of file after reading the last line because it has just read the last newline in the file and hasn't yet tried to read past the end of the file. When fgets returns NULL on end of file its string argument is unchanged, and in this case it contains the previous line. A slightly better method is to test the return value of fgets:
while(!feof(fp)) { char *ret = fgets(s, 100, fp); if (ret = NULL) break; ..... /* process the line */ }
However, it is rather pointless to use feof to control the loop iteration, since the loop will never actually exit from the feof test. The following idiom for reading from a file is much better:
while(fgets(s, 100, fp) != NULL) { ..... /* process the line */ }
This idiom where an input statement is involved in the loop condition is very common. For example, it is commonly used with getchar, getch and fread:
while((ch = getchar()) != EOF) { .... } while((ch = getc(fp)) != EOF) { .... } while(fread(buf, sizeof(int), 1, fp) != 0) { .... }
However, note another common porting bug with signed/unchanged chars: ch should be declared int here, not char, as EOF is usually -1.
Another fairly common error involving fgets is the method of removing the newline from the string that fgets reads (recall that gets removes newlines, but fgets leaves the newline in the string). Therefore, code such as the following is often seen after an fgets call:
len = strlen(s); s[len - 1] = '\0'; /* overwrite the newline */
This is erroneous if there is not a newline at the end of the string, which can occur from reading in a long line (in which case the fgets call may not have read the newline), and also since it is not always guaranteed that there will be a newline on the end of the last line of a file. The correct method is:
len = strlen(s); if (len > 0 && s[len - 1] == '\n') s[len - 1] = '\0'; /* overwrite the newline *
getchar and getc return int not char
It is all too common for programs to assign the character returned from getchar or getc to a variable of type char. Howev er, this is an incorrect practice because these functions return EOF upon end of file, and EOF is an int, not a char. Consider the common method of copying input to output:
char ch; /* WRONG */ while( (ch = getchar()) != EOF) putchar(ch);
In practice, EOF is always equal to −1, which is a value that cannot always be represented by the type char. The result of the above code depends on whether the compiler treats plain char as signed or unsigned char, which is unspecified by the standard.
If characters are unsigned, the above expression causes −1 to be promoted to unsigned int for comparison with the returned character. Howev er, the variable c can never hold a value equivalent to (unsigned)-1, and the while loop becomes an infinite loop that continues to read characters even after end of file.
If characters are signed, the code above will mostly perform as expected because −1 can be represented as a signed char. Howev er, there is a rare error that if the input contains the character with ASCII value 255, then it too will be represented as −1 and the loop will terminate prematurely.
The solution is simply to declare "c" as an int. An int type should always hold the returned value of getchar or getc as long as the value could be EOF. As soon as the presence of EOF has been tested, the value could be assigned to a char variable (such as for efficiency). The idea is illustrated below:
int ch; /* CORRECT */ while( (ch = getchar()) != EOF) { char c2 = ch; /* no longer need be int */ /* ... do something with c2 */ }
setbuf buffer is local array
The buffer used in a call to setbuf or setvbuf must be a static or global array variable, or an allocated heap block. If it is an automatic local array variable, a run-time failure may occur because when the function containing the local array returns, the space on the stack for that array is no longer used for that array (and may in fact be used by different variables).
It is not even safe to declare the buffer for setbuf or setbuf as an automatic array in main since the flushing of buffers by the exit function may occur after main has returned. The following code contains the error:
#include <stdio.h> #include <stdlib.h> int main() { char buf[BUFSIZ]; /* should be static */ setbuf(stdout, buf); /* ... do work */ exit(EXIT_SUCCESS); }
<stdarg.h> errors
Variable-argument functions are tricky enough, but there's even more traps and pitfalls than you thought. There are a number of common errors when declaring variable-argument functions using the <stdarg.h> macros: va_start, va_arg, and va_end. Some of these errors are:
- va_end not called at the end of the function.
- va_start uses a register parameter (in old C++).
- va_start uses an array parameter.
- va_start uses a char, short, or float parameter.
- va_start doesn't use a parameter (e.g., wrongly uses a local variable).
- va_start doesn't use the last named parameter.
- va_start uses a C++ reference parameter.
- va_arg uses a complicated type declarator.
- va_list type passed to printf.
- va_arg uses the type char, short, or float.
- relying on call-by-value when passing the va_list variable.
- va_end not called in the same function as va_start.
- mismatched multiple calls to va_start and va_end.
The simplest error is to forget to call va_end at the end of the function. The va_end macro performs any necessary cleanup, possibly deallocating memory allocated by va_start. This error is not dangerous in many cases because va_end is a "do nothing" macro in many implementations, but it is still a dormant portability problem.
The restrictions on the parameter used by va_start are caused by the fact that, in most implementations, it applies the address-of operator (&) and the sizeof operator to the parameter. If the parameter is declared as register, in old C++ before the register keyword was deprecated in C++17, then applying the address-of operator is illegal and should provoke a compilation error.
If the parameter used by va_start is an array type, the sizeof operator computes the size of the corresponding pointer type instead of the size of the array and all its elements. Therefore, the process of extracting arguments with va_arg is corrupted because an incorrect size is computed and the bytes in the array parameter might be returned by later calls to va_arg.
Similarly, if the parameter used by va_start is of type char, short, or float it can corrupt the argument extraction. Although I'm unsure of the reason for this restriction, one conjecture is that a small type would corrupt alignment of the argument list. Note that the use of char, short, or float with va_arg is a different error.
Naturally, va_start must use a function parameter; using a variable can cause any manner of failure. Another error is applying va_start to the wrong parameter. If the parameter supplied to va_start is not the last named parameter, subsequent calls to va_arg may return values of the named parameters
Another form of rare error when using va_start in C++ occurs when the last named parameter is a reference parameter, as in:
void fn(int &x, ...) // pass x by reference { .... va_start(ap, x); // ERROR .... }
The problem is that most va_start implementations are simply macros that will involve computing &x. Unfortunately, the address of a reference cannot be computed, and &x evaluates to the address of the value referenced by x — that is, the address of the argument passed to x. All manner of failures are possible, since later va_arg calls will extract values from near where the argument is stored, rather than near where the parameter is stored.
va_arg with pointer-to-function type
The va_arg macro is often implemented in a way that simply appends a * to the type argument to create a new type. For example, with va_arg(ap,int) it will make some internal use of the type int*. Howev er, there are some types for which appending a star will not create a legal type, and such uses will cause a syntax error at the va_arg call. Typically, the problematical types are those containing brackets. For example, trying to extract a pointer-to-function argument using:
ptr_to_fn = va_arg(ap, void(*)(int));
is likely to cause a syntax error because appending a star will not work. The solution is to use a typedef name:
typedef void (*pfn)(int); .... ptr_to_fn = va_arg(ap, pfn);
va_list type passed to printf
A common error made by inexperienced programmers when coding up their first function using <stdarg.h> is to misunderstand how the pointer to the list of arguments (i.e., the va_list variable) should be used. A typical error is to assume that printf will understand variable argument lists:
#include <stdarg.h> #include <stdio.h> #include <stdlib.h> void debug_print(char *format, ...) { va_list ap; va_start(ap, format); printf(format, ap); /* should be vprintf */ va_end(ap); } int main() { debug_print("x = %d\n", 3); exit(0); }
The result will usually be incorrect output, although some more drastic failure could also occur. The problem is that the value of ap is used as data (i.e., the address of the start of the argument block) rather than what ap points to being used. Hence, printf has only one argument, a 4-byte pointer value, which is used as the data for the %d format. Unfortunately, the compiler does not produce a warning about this incorrect use of va_list because printf's arguments cannot be type-checked. The correct solution is to call one of the functions that are specially designed to accept a va_list argument: vprintf, vfprintf, or vsprintf.
va_arg uses char, short, or float
A common error where the use of char, short, or float types is dangerous is the use of va_arg. The va_arg macro is used to extract unnamed arguments to the variable-argument function, and these arguments will have had the default promotions applied: char and short promote to int; float promotes to double. Consider the effect of the following erroneous use of type char:
char ch; .... ch = va_arg(p, char); /* WRONG */
The va_arg macro usually applies the sizeof operator to the second type parameter. The use of char will compute the size as 1, whereas the correct number of bytes is sizeof(int) — usually 2 or 4 bytes. Therefore, the extraction of a 1 byte value is incorrect and va_arg may return an incorrect character value. The remaining extra bytes will cause erroneous behavior of subsequent calls to va_arg. The correct method of extracting a character argument is:
ch = va_arg(p, int); /* CORRECT */
The variable ch can be left declared as type char. Using type int may be preferable style, but leaving the variable with the intended type will not cause a failure since the returned value of va_arg will be truncated to the required type.
The use of short or float as types for va_arg will cause a failure analogous to the one described above reg arding char. On some implementations they may cause no failure — using short is not dangerous if sizeof(short) is the same as sizeof(int), and similarly, using float is not dangerous if sizeof(float) equals sizeof(double).
Some implementations use very clever macros to handle erroneous types to va_arg correctly. For example, the va_arg macro might include a constant conditional expression that compares the size to ensure that it is at least as large as an int. Alternatively, the implementor could use a hidden built-in operator, which is similar to sizeof except that it returns the size of the type after the default promotions. Even on implementations where such usage is harmless, it represents a dangerous portability pitfall and should be avoided.
Naturally, the restriction on va_arg also applies to the types signed char, unsigned char, signed short, and unsigned short. Howev er, pointers to these types are allowed because no promotions are applied to pointer types. Therefore, uses such as va_arg(p,char*) are legal.
Passing va_list function arguments
The pointer of type va_list initialized by va_start can safely be passed to another function, and it is satisfactory to define a function with a va_list parameter. The standard library functions vprintf, vfprintf, and vsprintf have this property, and we can also define new functions. The va_arg macro can be called in a different function from that in which the corresponding va_start macro was invoked. However, there are a few pitfalls to avoid when passing va_list arguments around.Firstly, after processing all the arguments, the va_end macro must be invoked in the same function that called va_start. This means that a variable-argument function can pass its argument list to another function, but the second function cannot call va_end.
The second problem relates to the type of va_list. The majority of implementations declare va_list as a pointer type, typically char*. This means that passing a va_list parameter to a function will not change the parameter in the top-level function because of call-by-value parameter passing of pointer types. However, other implementations could declare va_list as an array type that would display pass-by-reference semantics, and would be changed in the top-level function. Hence, any code that passes va_list types to other functions should take care that it works for any type of va_list. For example, the following code relies on call-by-value passing of the va_list variable:
#include <stdarg.h> #include <stdio.h> void addlog(char *format, va_list ap) { extern FILE *logfile; vfprintf(logfile, format, ap); } void debug_print(char *format, ...) { va_list ap; va_start(ap, format); addlog(format, ap); /* to log file */ vprintf(format, ap); /* to screen */ va_end(ap); }
The problem is that the addlog function could modify ap (i.e., if vfprintf modifies ap) if va_list is an array type. Hence, this code exhibits a portability problem for some implementations.
One solution is to use a temporary variable of type va_list, either in the top-level function (as in the example below) or in the lower-level function. The copying of the temporary must be performed using memcpy rather than the assignment "temp=ap;" because the latter will fail with a compilation error when va_list is an array type.
void debug_print(char *format, ...) { va_list ap; va_list temp; va_start(ap, format); /* use temporary va_list variable */ memcpy(&temp, &ap, sizeof(va_list)); addlog(format, temp); /* No need to use temporary for last usage */ vprintf(format, ap); va_end(ap); }
Alternatively, the top-level function could use va_start to restart the processing of arguments at the beginning of the argument list. And here there is another pitfall: va_end must be called to finish off the current processing before calling va_start to restart another pass over the arguments.
void debug_print(char *format, ...) { va_list ap; va_start(ap, format); addlog(format, ap); /* to file */ va_end(ap); /* Finish ....*/ va_start(ap, format); /* .. and restart */ vprintf(format, ap); /* to screen */ va_end(ap); }
Naturally, the reverse of this error can arise, where the programmer assumes call-by-reference either because va_list is an array type for that implementation, or more likely, just a mistaken assumption about va_list. The following dummy example shows the incorrect way to extract multiple arguments by passing the va_list pointer to a function:
#include <stdarg.h> #include <stdio.h> #include <stdlib.h> void extract_and_print(va_list ap) { printf("%d\n", va_arg(ap,int)); } void unusual_print(int how_many, ...) { va_list ap; va_start(ap, how_many); for (int i = 1; i <= how_many; i++) extract_and_print(ap); va_end(ap); } int main() { unusual_print(5, 1,2,3,4,5); exit(0); }
On most implementations, va_list will be passed by value and the program will print out fiv e 1's, because the same argument is repeatedly extracted. The program is correct only if va_list is an array type for the chosen implementation.
If it is necessary to pass a va_list type in such a way that changes must be propagated to the top-level function, then the solution is to use the type pointer-to-va_list to give the effect of pass-by-reference. This is the same as passing any other scalar type by reference using pointers: use the type va_list* for the function parameter, pass the address of ap as the argument corresponding to that parameter, and deference that pointer type inside the function. The corrected implementation of the above example becomes:
#include <stdarg.h> #include <stdio.h> #include <stdlib.h> void extract_and_print(va_list *ap) /* pointer type */ { printf("%d\n", va_arg(*ap,int)); /* dereference ap */ } void unusual_print(int how_many, ...) { va_list ap; va_start(ap, how_many); for (int i = 1; i <= how_many; i++) extract_and_print(&ap); /* pass the address */ va_end(ap); }
Side effects to assert macro
It is incorrect to place any side effects in a call to the assert macro. Although the side effects will be evaluated when assertions are turned on, the program will fail as soon as assertions are disabled by setting NDEBUG. Side effects include assignment, increment, decrement, accessing a volatile variable, and calling certain types of function. An example of an incorrect use of assert is the tricky code:
assert(++count < MAX); /* WRONG */
The increment to count may be an important side effect. Removing assertions by defining the macro NDEBUG will cause the increment to disappear. The correct code is simply:
++count; assert(count < MAX);
Although an experienced programmer is unlikely to intentionally place side effects in an assert macro, it can occur by accident:
assert(x = 3); /* WRONG */
This assertion exhibits the =/== error and accidentally assigns 3 to x.
strncpy leaves string nonterminated
A rather dangerous feature of the strncpy function is that it can leave a string nonterminated. It is reasonably common practice to use strncpy as a more reliable method of copying strings than strcpy, because it prevents too many characters from being copied. An example of such a usage is:char s[MAX]; ... strncpy(s, s2, MAX);
This will prevent s from being overwritten by too many characters, regardless of the length of s2. Howev er, if strlen(s2) is greater than or equal to MAX, then s will not be left null-terminated. Exactly MAX nonzero characters will be copied to s and no terminating zero character is appended by strncpy. A more reliable method of using strncpy that avoids this danger is given below:
strncpy(s, s2, MAX - 1); s[MAX - 1] = '\0';
Unlike the strncpy function, the strncat library function does not have this dangerous feature and always leavesastring null-terminated.
sscanf twice on the same string
Programmers are sometimes confused about the behavior of sscanf, and assume that two successive calls to sscanf will read along a string in the same way as two successive scanf calls would read input. However, sscanf does not save its position and the second call to sscanf will simply restart from the beginning of the string. Consider the code:
char s[] = "10 20"; sscanf(s,"%d", &i); sscanf(s,"%d", &j); /* INCORRECT */
The value of i and j will always be the same after this code sequence — both will equal 10.
Misusing strcat
A common mistake in the use of strcat is to assume that it concatenates its two arguments, and returns the resulting string. This leads to uses such as:
char *str2; ... str2 = strcat("The cat sat on the ", str); /* WRONG */
The actual behavior of strcat is to concatenate the second argument onto the end of the first, and return the address of the first string. Hence, this return value is quite useless in most circumstances, and is mostly ignored by programmers. However, the fact that it does return a value means that the above code will compile and run, but its behavior is undefined.
In fact, the statement will correctly concatenate the two strings on platforms where literal strings are writeable, and str2 will point to the new string. However, the code is appending str on the end of the string constant. Hence, the statement overwrites the memory where string constants are stored, and depending on whether this memory is read-only, will lead to a runtime error on some systems, and undefined behavior on others.
In fact, the instance of this error was seen in a real code loop, similar to the following:
for (;;) { char *str2 = strcat("The cat sat on the ", str); puts(str2); }
This produces the following very strange output, caused by repeatedly concatenating str onto the end of the string constant:
The cat sat on the mat The cat sat on the matmat The cat sat on the matmatmat The cat sat on the matmatmatmat The cat sat on the matmatmatmatmat .........
There are many methods of achieving the desired results of the above strcat statement. One correct method is:
char str2[MAX_LEN]; ... strcpy(str2, "The cat sat on the "); strcat(str2, str);
Confusing strcpy and strcmp
This is an error often made by me when tiredness causes the typing of the wrong function name. These are probably the two most common string functions and their occasional transposition is not surprising. Consider the following statements:
strcmp(s1, s2); if (strcpy(s1,s2) == 0) ....
The use of strcmp is a null effect statement, and the test of strcpy copies rather than comparing. Neither usage will receive a compilation warning. The similar expression strcpy(s1,s2)>0 may receive a warning about a suspicious pointer comparison.
strcmp compared with -1 or 1
Although it would probably be quite sensible, the strcmp function is not restricted to returning only 1, 0 and −1. It can return any negative or positive number instead of −1 or 1, respectively. Thus, only comparisons of the return value with zero are safe. The following comparison is incorrectly determining if s1 is lexically after s2:
if (strcmp(s1, s2) == 1) /* WRONG */The correct version is:
if (strcmp(s1, s2) > 0) /* CORRECT */
qsort/bsearch using the strcmp function
The strcmp function is not an appropriate comparison function for passing to qsort for sorting arrays of strings, as in:
qsort(arr, n, sizeof(arr[0]), strcmp); /* WRONG */
This is true regardless of whether the array of strings is declared as char*[] or char[][]. The solution is to write a comparison function that contains a call to strcmp, such as the following code:
#include <stdio.h> #include <stdlib.h> /* declare qsort */ int cmp(const void *p, const void *p2) { return strcmp(*(char**)p,*(char**)p2); } int main() { int n = 3,; char *arr[] = { "abc", "def", "abcd" }; qsort(arr, n, sizeof(arr[0]), cmp); for (int i = 0; i < n; i++) printf("'%s'\n", arr[i]); }
The main point to note is that the void* pointer received by the cmp function is actually of type char**, and not type char*, which strcmp requires. Using strcmp would cause it to operate on strings starting at &arr[0], which is actually a pointer value (i.e., it treats the bytes of the pointer as a string).