Aussie AI
C++ Preprocessor Macro Bugs
-
Bonus Material for "Generative AI in C++"
-
by David Spuler, Ph.D.
Preprocessor Macro Bugs
Preprocessor macros can cause two types of problems: compilation errors and runtime errors. Compiler diagnostics due to incorrect macro definitions can sometimes be difficult to understand. This is because the compiler receives input after the preprocessor has done its substitutions. Comments have been deleted, and any macros or symbolic constants have been replaced by their corresponding text. Hence, the diagnostic lines printed up by the compiler (supposedly program lines) may bear no resemblance to your program.
The second problem with macros is a logical error at runtime. This occurs when the macro expansion is syntactically correct, but is not what was intended by the programmer. This kind of bug can be very difficult to track down. Many of the mistakes mentioned below do not cause a compilation warning. This makes them very dangerous.
There are many bugs that can arise in using the C++ preprocessor, because it's a separate phase before parsing, and things can get confused. The rules about safely using #define macros in C++ include:
- Parentheses around the whole macro
- Parentheses around every use of a macro parameter
- Avoid using a macro parameter twice
- No semicolons at the end
- No if without else
- Use do-while not while loops
Macro with Semicolons
In a macro definition there should not be a semicolon on the end. The preprocessor replaces text exactly as it is asked to, and putting a semicolon on the end usually leads to there being an extra semicolon in the wrong place. For example, consider the code:
#define MAX 10; // Wrong
x = MAX * 2;
The macro expansion leads to the incorrect statement:
x = 10; * 2;
In this example, the extra semicolon ends the statement, and then causes "*2;" to be seen as a second statement, which is a pointer dereference ("*" operator) of the constant 2. As is the case with many instances of this error, this example causes a compilation error because the constant 2 is not a legal pointer type.
Here's an example that is a little more insidious. This issue only gets a warning, not a compilation syntax error. Consider the code:
#define SQUARE(x) x*x; // Wrong
The semicolon is wrong. When the caller does:
y = SQUARE(3.14) + 2;
This will expand to:
y = 3.14*3.14; + 2;
In this case, the extra semicolon ends the expression, and the "+2" becomes a "null effect" statement. Hopefully, it gets a warning, but it will still run.
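As a minimal sketch of the corrected forms, the definitions below simply drop the trailing semicolon (and add the parentheses discussed in later sections). The helper name semicolon_demo is hypothetical and only exists to exercise the macros inside larger expressions.

```cpp
#include <cassert>

#define MAX 10                     // no trailing semicolon
#define SQUARE(x) ((x) * (x))      // no trailing semicolon, fully parenthesized

// Hypothetical helper: uses both macros in the middle of expressions.
inline int semicolon_demo() {
    int x = MAX * 2;               // expands to: 10 * 2
    int y = SQUARE(3) + 2;         // expands to: ((3) * (3)) + 2
    assert(x == 20);
    assert(y == 11);
    return x + y;
}
```

With the semicolons removed, the text after the macro call stays part of the same expression instead of becoming a stray second statement.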
Spaces between macro name and left bracket
In a macro definition with parameters there cannot be any whitespace (spaces or tabs) between the macro name and the left bracket. Spaces indicate to the preprocessor where the replacement text begins. If there is space, the preprocessor assumes that the definition is for a parameter-less symbolic constant and that the left bracket is part of the replacement text. For example, consider the incorrect macro definition:
#define abs (x) (x > 0) ? (x) : (-(x)) // Error
The identifier abs is replaced wherever it appears by:
(x) (x > 0) ? (x) : (-(x))
As in most occurrences of this error, the above example will cause a compilation error.
Assignment operator in macro declaration
A less common misconception about declaring symbolic constants using #define is that they require syntax similar to that for initializations. The macro definition below shows this error:
#define MAX = 10
This has an extra "=" operator that shouldn't be there. Fortunately, most instances of this error will cause a syntax error during compilation wherever the symbolic constant is used.
Macro with Braces
Don't put braces around a macro that is meant to be used as an expression. This issue usually just causes a compilation syntax error.
#define SQUARE(x) { x*x }
The braces are simply wrong. Consider this call to the macro:
y = SQUARE(3.14) + 2;
The expansion becomes:
y = { 3.14*3.14 } + 2;
This doesn't even compile properly, which is fortunate because this macro is buggy in several other ways.
Macro Replacement Needs Parentheses
Consider this macro:
#define TWICE(x) x + x
Consider this usage:
y = 5 * TWICE(3.14); // 5 times 2 pi (10 pi)?
This expands out to:
y = 5 * 3.14 + 3.14;
But the "*" multiplication operator has higher precedence than "+", just like in High School math class, and this actually evaluates as:
y = (5 * 3.14) + 3.14; // 6 pi!
The solution is that the macro's replacement body needs parentheses at the start and end to avoid operator precedence problems like this.
#define TWICE(x) ( x + x ) // Better-ish
Note that this isn't specific to "+": precedence issues arise whenever the macro body contains any operator. This is true even of high-precedence operations like type casts or unary ++/-- operators. Note also that the TWICE macro above is still buggy with another different precedence error (it needs parentheses around each "x" parameter).
Macro Parameters Need Parentheses
Every usage of a macro parameter needs to be inside its own set of parentheses (round brackets). Consider this macro:
#define SQUARE(x) ( x * x ) // Buggy
Now consider this usage:
y = SQUARE(1 + 3.14); // 1 plus pi all squared?
This expands out to:
y = ( 1 + 3.14 * 1 + 3.14 ); // Bug!
Again, there's an operator precedence problem because "*" is done first, then "+" sums. This expands out as if the expression was:
y = ( 1 + (3.14 * 1) + 3.14 ); // 1 plus 2 pi
The result is a different computation to expected, which we'll politely call a feature rather than a bug. The correction is to add parentheses around every parameter "x" in the definition of the SQUARE macro:
#define SQUARE(x) ( (x) * (x) ) // Better
Again, this need for parenthesized macro parameters isn't specific to macros using the "*" operator, and applies generally. We have to do this to parameters in any macro that is mimicking an arithmetic computation.
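The two precedence rules can be checked side by side with a small sketch. The _BAD suffixed names are hypothetical labels for the unparenthesized versions, and integer arguments are used so that the results are exact.

```cpp
#include <cassert>

#define TWICE_BAD(x)   x + x           // no parentheses at all
#define TWICE(x)       ((x) + (x))     // whole body and each parameter wrapped
#define SQUARE_BAD(x)  ( x * x )       // body wrapped, parameters not
#define SQUARE(x)      ((x) * (x))     // body and parameters wrapped

inline void precedence_demo() {
    assert(5 * TWICE_BAD(3) == 18);    // expands to: 5 * 3 + 3
    assert(5 * TWICE(3) == 30);        // expands to: 5 * ((3) + (3))
    assert(SQUARE_BAD(1 + 3) == 7);    // expands to: ( 1 + 3 * 1 + 3 )
    assert(SQUARE(1 + 3) == 16);       // expands to: ((1 + 3) * (1 + 3))
}
```

Some compilers may emit a precedence warning for the _BAD versions, which is a useful hint but not something to rely on.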
Macro Parameter Side-Effects
There's still a major problem with the SQUARE and TWICE macros: they have their macro parameter "x" appearing twice in an expression. Although we've fixed the operator precedence errors with parentheses around each "x" and around the whole body, there's still an error with "side effects".
Consider this buggy code:
#define SQUARE(x) ( (x) * (x) )
i = 3;
j = SQUARE(i++);
And i should be 4, because it was 3, and then ++, so it's 4 now? Nope. On most compilers it ends up as 5, because the "++" side-effect happened twice. The code expands to have "i++" appearing in both places:
j = ( (i++) * (i++) );
Now, for another problem, consider j. Does this code correctly compute j=3*3=9? It's actually unclear, and it could compute j as 9 or 12. A likely value is 12: the left-hand operand ("i++") is evaluated first, yielding 3 and changing i to 4, and then the second "i++" yields 4, giving j=3*4=12 rather than 9. But evaluation is not guaranteed to go from left to right. The compiler is allowed to evaluate the right operand before the left for operators such as "+" or "*". Worse, because i is modified twice with no sequencing between the two modifications, the C++ standard treats this expression as undefined behavior, so no particular result is guaranteed.
Conditional side-effect macro problems: The problem is even less obvious in macros like "min" and "max", where the number of times a side effect occurs depends on the values of the arguments. For example, consider the following definition of min:
#define min(x, y) (x < y) ? x : y
And the macro call:
c = min(a++, b);
This expands out to give:
(a++ < b) ? a++ : b
You can see that the "a++" side effect is executed twice if a < b, and only once otherwise.
Note that in macros where each parameter appears only once, there is usually no problem with side effects. For example, this is fine as any side effects are executed exactly once:
#define twice(x) ((x) << 1)
Macros and Side-Effects: However, some macros where parameters appear only once can still give trouble — if some parameters are not evaluated due to short-circuiting. Some side effects may not be executed at all. For example:
#define choose(x, y, z) ( x > 0? y : z )
Depending on x, any side effect of y or z may or may not be executed.
Types of Side Effects: Note that there are several types of side effects that can interact wrongly with this macro. A side effect is anything that changes a variable:
- Increment (++) or decrement (--)
- Assignment operator (=)
- Extended assignment operators (e.g. +=, *=)
- Function calls
- Member function calls
There's not really a good way to fix this double-side-effect problem in the macro. The best idea is probably to use an "inline" function instead of a macro.
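A minimal sketch of the inline-function alternative follows. The names square and min_of are hypothetical replacements for the SQUARE and min macros, not definitions taken from the text above. Because function arguments are evaluated exactly once before the call, a side effect such as "i++" happens only once.

```cpp
#include <cassert>

// Each argument is evaluated exactly once, before the function body runs.
template <typename T>
inline T square(T x) { return x * x; }

template <typename T>
inline T min_of(T x, T y) { return (x < y) ? x : y; }

inline void side_effect_demo() {
    int i = 3;
    int j = square(i++);       // passes 3, then i becomes 4
    assert(j == 9);
    assert(i == 4);

    int a = 1, b = 5;
    int c = min_of(a++, b);    // a++ runs once, whichever branch is taken
    assert(c == 1);
    assert(a == 2);
}
```

Templates are used here only so that the functions accept any arithmetic type, as the macros did. Plain overloads would work equally well.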
Macro if without else (dangling else)
Generally speaking, a macro should not use the control flow statements such as "if" statements, or "while" or "for" loops. But if you use an if, it has to have a matching else, otherwise there's a "dangling else" error.
Consider this yassert macro:
#define YASSERT(cond) if (!(cond)) yassert_fail(#cond)
Firstly note that there's no semicolon at the end. A macro doesn't get a semicolon even as an if statement.
Another point to note is that having the parameter name twice with "cond" and "#cond" (stringized with the "#" operator) is not a double-side-effect risk. Rather, the #cond stringize operator becomes a compile-time string constant, rather than being a second use of the parameter as an expression.
The above YASSERT macro will actually work fine in most cases, which makes it an insidious bug. This will only fail when it's expanded just before another "else" keyword, that's not inside braces. Here's the example with YASSERT just before an else statement:
if (ptr == NULL)
    YASSERT(ptr);
else {
    ... // main code
}
This expands out to be:
if (ptr == NULL)
    if (!(ptr)) yassert_fail("ptr");
else {
    ... // main code
}
But the "else" clause now has two "if" statements before it, and matches the wrong one. This problem is actually a "dangling else" problem hidden behind the macro expansion. The control flow has become incorrectly equivalent to:
if (ptr == NULL) {
    if (!(ptr)) yassert_fail("ptr");
    else {
        ... // main code
    }
}
Suddenly, the whole block of main code does nothing! The main block of code is never executed in the normal case where "ptr" is non-NULL, because the "else" now belongs to the inner "if", which is only reached when ptr is NULL. This is definitely a bug.
The way to correct the dangling else bug, is to make sure that the macro has its own "else" to match its "if".
#define YASSERT(cond) if (cond) { } else yassert_fail(#cond)
There's no dangling else now, because there are two if's and two else's. However, there's an obscure bug if a semicolon is accidentally missing after the YASSERT macro call. It's better to wrap the whole if statement inside another loop via the do-while(0) trick.
#define YASSERT(cond) \
    do { \
        if (cond) {} else yassert_fail(#cond); \
    } while(0)
Using the do-while(0) trick, if you now forget a semicolon, it's a compiler error, rather than an insidious mistake.
A better way to fix the YASSERT macro to be more function-like is to make it more expression-like. Instead of if statements and braces, we can do some operator trickery.
One way to do this is to rely on the "short-circuiting" execution of the && or || operators. The && operator skips the second operand if the first condition is false. The || operator does the reverse, skipping the second operand if the first condition is true. This skipping of the second operand in half the situations is called "short-circuiting" and has been a guaranteed part of the C++ standard from the very early days. One way to remember how it works is to think about these operators as if "&&" means "and-then" whereas "||" means "or-else".
The short-circuiting features of these operators in C++ gives a simple assertion macro using the || operator:
#define YASSERT(cond) ( (cond) || yassert_fail(#cond) )
The && operator could be used with a negated condition:
#define YASSERT(cond) ( !(cond) && yassert_fail(#cond) )
The only extra wrinkle here is that "yassert_fail" is now part of an expression and cannot return "void" type, because this will give a compilation error. So, "yassert_fail" has to have return type "bool" and return a dummy value.
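Putting the pieces together, the expression-style assertion might look like the sketch below. The body of yassert_fail, the g_yassert_failures counter, and the yassert_demo and yassert_selfcheck helpers are assumptions made for illustration; the text above only requires that yassert_fail return bool.

```cpp
#include <cassert>
#include <cstdio>

int g_yassert_failures = 0;    // illustrative counter, not part of the original macro

// Hypothetical failure handler: returns bool so it can appear inside an expression.
bool yassert_fail(const char* text) {
    ++g_yassert_failures;
    std::fprintf(stderr, "Assertion failed: %s\n", text);
    return false;              // dummy value
}

#define YASSERT(cond) ( (cond) || yassert_fail(#cond) )

// The macro is a plain expression, so the else still pairs with the outer if.
int yassert_demo(const int* ptr) {
    if (ptr == nullptr)
        YASSERT(ptr != nullptr);    // reports the failure
    else
        return *ptr;
    return -1;
}

void yassert_selfcheck() {
    int v = 42;
    assert(yassert_demo(&v) == 42 && g_yassert_failures == 0);
    assert(yassert_demo(nullptr) == -1 && g_yassert_failures == 1);
}
```

Because the macro expands to an expression rather than an if statement, there is no dangling else to worry about. Some compilers may warn about an unused expression result at the call site.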
Accidental macro expansion (Namespace collisions)
Macro expansion occurs before any other phase of compilation. Consider what happens to the function definition below if "min" is already defined as a macro:
int min(int x, int y) { ... }
The macro expansion occurs in the middle of the function declaration, leading to absurd syntax. This causes the compiler to output nonsensical error messages and garbled diagnostic lines. The solution is to undefine the macro name before the function definition using:
#undef min
The #undef line is also needed in any other file that calls the "min" function while the macro is defined.
A more dangerous problem is that if the macro is defined where the function is called, the macro will be invoked without problem, and the function will never be called! This is a far worse problem because it will not cause a compilation error. An example of this problem would be trying to define your own version of the "getchar" function. The problem is that getchar is often implemented as a macro in the standard library headers and will expand out before compilation, so the new getchar function is never called.
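The sketch below illustrates both the silent hijacking and two ways around it. The min macro here is a stand-in for one defined by some header, and g_min_calls is a hypothetical counter used only to show which version ran. Besides #undef, the sketch uses a technique not mentioned above: writing the name in parentheses, which works because a function-like macro is only expanded when its name is directly followed by a left parenthesis.

```cpp
#include <cassert>

#define min(x, y) (((x) < (y)) ? (x) : (y))   // stand-in for a macro from some header

static int g_min_calls = 0;    // illustrative counter

// Parenthesizing the name stops the function-like macro from expanding here.
int (min)(int x, int y) {
    ++g_min_calls;
    return (x < y) ? x : y;
}

void collision_demo() {
    int a = min(3, 7);         // macro expansion: the function is NOT called
    assert(a == 3 && g_min_calls == 0);

    int b = (min)(3, 7);       // function call: the macro is bypassed
    assert(b == 3 && g_min_calls == 1);

#undef min                     // from here on, min always means the function
    int c = min(9, 4);
    assert(c == 4 && g_min_calls == 2);
}
```

The first call compiles cleanly and gives the right answer, which is exactly why this kind of collision can go unnoticed.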
Multi-Statement Macros without Braces
Consider this macro with two statements separated by a semicolon:
#define INCBOTH(a,b) a++; b++
This will work in many cases correctly, such as in a standalone statement.
INCBOTH(x,y);
But consider this usage:
if (x == 0) INCBOTH(x,y);
This expands out to:
if (x == 0) x++; y++;
This is accidentally equivalent to a different effect:
if (x == 0) { x++; } y++;
Comma operator trick. A good way to correct this is to use the comma operator to separate the two expressions, instead of the semicolon.
#define INCBOTH(a,b) a++, b++ // Comma!
And we still have to fix all the missing parentheses around each parameter, and also around the whole thing, to avoid precedence bugs. The final result is:
#define INCBOTH(a,b) ( (a)++, (b)++ ) // Better
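As a quick check, the comma-operator version behaves as a single statement under an if, with or without an else. The function name incboth_demo is hypothetical.

```cpp
#include <cassert>

#define INCBOTH(a, b) ( (a)++, (b)++ )

inline void incboth_demo() {
    int x = 1, y = 10;
    if (x == 0)
        INCBOTH(x, y);     // skipped as a unit: neither variable changes
    else
        x = 0;
    assert(x == 0 && y == 10);

    if (x == 0)
        INCBOTH(x, y);     // both increments run
    assert(x == 1 && y == 11);
}
```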
Non-Expression Multi-Statement Macros
Consider the issue with a multi-statement macro that is too complex to become a comma operator. Consider the swap macro below:
#define swap(x, y) temp = x; x = y; y = temp // Bug
This has the same bug as above, because the macro should have braces around it. When called by:
if (a > b) swap(a, b);
The erroneous result is:
if (a > b) temp = a; a = b; b = temp;
This is accidentally equivalent to:
if (a > b) { temp = a; } a = b; b = temp;
Only the first statement is considered to be the statement for the if.
There are a number of possible solutions. Placing braces around any sequences of statements prevents the problem:
#define swap(x, y) { temp = x; x = y; y = temp; } // Improved
This is satisfactory because it will prevent the above usage bugs. However, this gives an obscure minor problem with else statements. The code below will cause a compilation error because of a semicolon after a right brace and before an else statement:
if (...)
    swap(a, b); /* semicolon causes error */
else
    ...
In this case, a special form of do loop can be used:
#define swap(x, y) do { temp = x; x = y; y = temp; } while(0)
This macro avoids syntax error problems with a semicolon before an else statement because the semicolon terminates the do loop statement. Note that the block of statements inside the loop is only executed once because while(0) is always false. A similar solution is to use an if statement with a condition that is always true:
#define swap(x, y) if (1) { temp = x; x = y; y = temp; } else
This also works correctly because the semicolon ends the if statement. However, this method is slightly worse than the do-while(0) loop trick because accidentally omitting the semicolon after the macro call may silently introduce a major bug (although some compilers will report an unreachable statement warning), whereas a compilation error occurs when the do-while(0) form of the macro is used. One minor problem with both these solutions is that lint will warn about a "constant in conditional context".
A slightly different solution is to use a different style for all if statements and loops — use blocks instead of single statements. This is generally sound practice, and can be combined with any of the other solutions, just to be safe. Admittedly, this solution leads to longer programs (extra lines), and sometimes less clear programs.
Macro with Multiple Statements in Braces
Consider this macro again:
#define INCBOTH(a,b) a++; b++
One way to try to correct this issue is to use braces in the macro:
#define INCBOTH(a,b) { a++; b++; }
Also, we should parenthesize the macro parameters to avoid tricky operator precedence issues:
#define INCBOTH(a,b) { (a)++; (b)++; }
This example works in many cases. But this will get compilation errors in some usages, such as immediately before an "else" statement:
if (x == 0)
    INCBOTH(x,y); // Compilation error!
else {
    ...
}
The solution with the comma operator only works for simple expression statements. If you need a block of code that's much bigger, with many statements, or with non-expression statements, then the comma operator won't work.
The "do-while(0)" trick comes to the rescue. A general way to fix macros with multiple statements in braces is to still use braces, but then wrap them with "do" and "while(0)" like this:
#define INCBOTH(a,b) do { (a)++; (b)++; } while(0)
Note that there's no semicolon or right brace at the end of the macro. This "do-while(0)" wrapper works in general, even before an else statement, because its syntax doesn't end with a right brace.
How does this work? The do loop will execute the block of code once to start the loop, and then the dummy "while(0)" test at the end will fail, and the loop doesn't get executed twice. And we are relying on the C++ optimizer to note that "while(0)" is never true, so it simply removes this dummy test.
Note that a similar attempt to use an "if(0)-else" trick with similar effects doesn't actually work as well:
#define INCBOTH(a,b) if (0) {} else { (a)++; (b)++; }
This "if(0)" version still ends with a right brace, and will still get a compilation error immediately before an "else", because there's a semicolon in the wrong place. Similar tricks with "while(0)" or "for(;0;)" are much worse, not working at all, because they never execute the code even once. So, the "do-while(0)" trick is the best option if the comma operator won't work.
And a final point is that maybe you shouldn't ever use this neat do-while(0) trick! If it's a big chunk of code that you're trying to cram into a macro, it really should be an "inline" function instead of a tricky macro.
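For cases where a macro is still wanted, the do-while(0) wrapper can be sketched as follows. SWAP_INT, its local tmp_, and the sort_pair helper are hypothetical names chosen for this example. The point is that the macro call followed by a semicolon forms exactly one statement, so it can sit directly before an else.

```cpp
#include <cassert>

// The temporary lives inside the block; tmp_ is a hypothetical name.
#define SWAP_INT(x, y) do { int tmp_ = (x); (x) = (y); (y) = tmp_; } while (0)

// Returns true if the pair had to be swapped into ascending order.
inline bool sort_pair(int& a, int& b) {
    if (a > b)
        SWAP_INT(a, b);    // the semicolon completes the do-while statement
    else
        return false;      // an else directly after the macro call compiles fine
    return true;
}
```

Omitting the semicolon after SWAP_INT(a, b) would be a syntax error here, which is the desired behavior.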
Aliasing and macro parameters
Aliasing problems are not limited to pointer variables or reference parameters. Another situation is that the names of macro parameters can be aliases for the same object. Let us consider a well-known trick for swapping two integral variables without using a temporary variable via bitwise-XOR. The trick uses properties of the bitwise exclusive-or operator, ^. The following code sequence will (usually) swap the values of x and y:
x ^= y; y ^= x; x ^= y;
The clever manner in which this code fragment swaps two values can be examined by following the effects of all three statements on the four different bit patterns that each bit of x and y could have (the ^ operator applies to each bit individually):
x  0 0 1 1
y  0 1 0 1
--------------
x  0 1 1 0   (x ^= y)
y  0 1 0 1   (Unchanged)
--------------
x  0 1 1 0   (Unchanged)
y  0 0 1 1   (y ^= x -> original x value)
--------------
x  0 1 0 1   (x ^= y -> original y value)
y  0 0 1 1   (Unchanged -> original x value)
Therefore, it seems that it is possible to write a clever macro to efficiently swap two integers without wasting memory on an extra variable, as follows:
#define swap_int(x,y) (x ^= y, y ^= x, x ^= y)
However, there is a hidden aliasing error in using this method. What is the result when the macro parameters x and y are aliases for the same location, such as below?
swap_int(a, a); /* DANGEROUS ALIASES */
The result is that all bits of "a" are set to zero. All of the three statements perform "a^=a", which sets "a" to zero. Therefore, this method of swapping works only when "x" is not an alias for the "y" argument. Note that the alias might be more obscure, as in the following call when i and j are equal:
swap_int(a[i], a[j]); // Alias if i==j
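The aliasing failure, and one possible guard against it, can be sketched as below. The swap_int_safe function is a hypothetical variant that compares addresses before running the XOR sequence; it is an illustration rather than a recommendation over a plain temporary-variable swap.

```cpp
#include <cassert>

#define swap_int(x, y) ((x) ^= (y), (y) ^= (x), (x) ^= (y))

// Hypothetical safer variant: skip the XOR sequence when both names
// refer to the same object.
inline void swap_int_safe(int& x, int& y) {
    if (&x != &y) {
        x ^= y;
        y ^= x;
        x ^= y;
    }
}

inline void alias_demo() {
    int a[2] = {7, 9};
    int i = 0, j = 0;

    swap_int(a[i], a[j]);          // aliases: a[0] is zeroed
    assert(a[0] == 0);

    a[0] = 7;
    swap_int_safe(a[i], a[j]);     // aliases: value preserved
    assert(a[0] == 7);

    swap_int_safe(a[0], a[1]);     // distinct objects: normal swap
    assert(a[0] == 9 && a[1] == 7);
}
```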
Macro side effects in XtSetArg
A real-world example of the macro double side-effect problem is the XtSetArg macro, which is part of the toolkit for X Windows programming. The first parameter of XtSetArg appears twice in the macro replacement text, and thus the following is a common pitfall for novice X Windows programmers:
XtSetArg(args[n++], XmNwidth, 500); /* ERROR */
Instead, a typical usage style for XtSetArg that avoids the error is:
XtSetArg(args[n], XmNwidth, 500); n++;
Lost type checks in macros
Because the replacement of macro parameters is by simple text replacement, there is no type checking of these parameters. However, wherever the parameter is used, type checking of its usage inside the replacement text may cause compilation errors. This is no guarantee, and types of arguments to macros should be checked carefully by hand. An example of an error caused by the use of macros is:
#define PRINT_FLOAT(x) (printf("%f", (x)))
...
PRINT_FLOAT(3.14); /* OK */
PRINT_FLOAT(3); /* ERROR */
The problem with this macro is that the implicit conversion of integral types to double is not performed by macro replacement of parameters, whereas it would be performed if PRINT_FLOAT were a function accepting x as a double parameter. Passing the integer constant 3 fails because it has type int instead of double and therefore corrupts the list of arguments. This macro can be improved by adding an explicit type cast to remove the problem:
#define PRINT_FLOAT(x) (printf("%f", (double)(x)))
The loss of type checking within macros is yet another reason to avoid using preprocessor macros in C++. The use of inline functions can achieve the same efficiency gains without the dangers.
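A minimal sketch of the inline-function version follows. The names format_float, print_float, and print_float_demo are hypothetical; formatting is split into its own function only so that the result can be inspected. Because the parameter is declared as double, an int argument is converted implicitly at the call.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical replacement for PRINT_FLOAT: the double parameter forces
// the conversion that the macro version lacked.
inline std::string format_float(double x) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%f", x);
    return std::string(buf);
}

inline void print_float(double x) {
    std::fputs(format_float(x).c_str(), stdout);
}

inline void print_float_demo() {
    assert(format_float(3) == "3.000000");      // int 3 is converted to 3.0
    assert(format_float(3.14) == "3.140000");
}
```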
Name clashes involving macro local variables
There are some very obscure errors that the use of parameterized macros can cause. One problem is a clash between the names of local variables and variables in a macro argument. Consider the following macro to swap two integers:
#define swap_int(x,y) \
    { int temp; \
      temp = x; x = y; y = temp; \
    }
This macro will work correctly for almost all arguments. However, there is one dangerous situation when an obscure error may occur. Consider what happens when one of the macro arguments is "temp," as may occur if the programmer uses the name "temp" frequently. The macro call:
swap_int(temp,temp2);
will expand to become:
{ int temp; temp = temp; temp = temp2; temp2 = temp; }
Instead of using the temp in the argument, the macro uses only the local variable, and the results are erroneous. Although a good compiler will warn that temp is "used before set," many compilers will compile this code without warning.
The best solution is to always take care in naming local variables created by macro definitions. The name could be capitalized, given a unique prefix, or changed by any other method to any name, making it unlikely that the rest of the program will use such an identifier. One common solution is to use _temp.
This error would not arise were swap_int declared as a function with parameters x and y — another reason for C++ programs to use inline functions rather than macros.
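As a sketch of that function-based alternative, the version below takes its arguments by reference. The local named temp belongs to the function's own scope, so a caller's variable with the same name is unaffected. The name_clash_demo helper is hypothetical.

```cpp
#include <cassert>

// Parameters and locals have their own scope, so there is no clash
// with a caller's variable named temp.
inline void swap_int(int& x, int& y) {
    int temp = x;
    x = y;
    y = temp;
}

inline void name_clash_demo() {
    int temp = 1, temp2 = 2;
    swap_int(temp, temp2);     // the problematic call from the macro version
    assert(temp == 2 && temp2 == 1);
}
```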
Conditional compilation error: name misspelled
The preprocessor does not check whether macro names used in #if, #ifdef, or #ifndef directives are actually the correct names. An error that is very difficult to trace can occur if such a macro name is misspelled. Consider a program using the macro name TEST_DRIVER to control bottom-up testing of a single source file. If this macro is defined (by a -D compiler option) then this indicates that testing is currently in progress and the function performs many self-tests. If the macro is not defined this indicates creation of the production version and the function runs more quickly without performing any tests. Some of the code might look like:
void do_something(void)
{
    int i = 0;
#ifdef TEST_DRIVER
    int count = 0;
#endif
    for (i = 0; test(i); i++) {
#ifdef TESTDRIVER /* TYPO HERE! */
        count++;
#endif
    }
#ifdef TEST_DRIVER
    assert(count > 0); /* loop entered at least once */
    assert(count <= 10); /* but not more than 10 times */
#endif
}
The self-testing code surrounded by conditional compilation tests involving the TEST_DRIVER macro name is intended to check that the loop is actually entered at least once, and not more than 10 times. However, the unfortunate programmer who tests this module will find that the first assert macro will always fail — apparently the loop is never entered. But there is no error in the program flow; the error is a typographical mistake in the #ifdef directive. The underscore in TEST_DRIVER has been accidentally omitted, and the #ifdef always fails — the count++ instruction is never executed regardless of whether the code is compiled in self-testing or production mode.
This form of error is not found by any compilers I know, although future compilers might one day be smart enough to watch out for "suspiciously similar" macro names. The error can arise in the use of any of the #ifdef, #ifndef, #if, or #elif directives and I'm not aware of any particular programming style that can avoid this hazard.
Using #define instead of typedef
A common macro error is to use #define to declare a type name. There are no problems for basic types or structure types, and the following declarations are quite safe:
#define INTEGER int
#define PIECE enum piece
#define NODE struct node
However, there is a subtle error when #define is used to declare a pointer type. The problem is typified by the declaration of a string type as:
#define string char * /* ERROR */
This declaration is dangerous because of the strange binding rules of the * token in declarators. There is no problem for declaring single variables, but an error arises when two variables are declared in a single declaration:
string s; /* OK */
string s1, s2; /* ERROR */
The problem is that the second declaration expands out to become:
char *s1, s2;
Because of the binding rules of * the second variable s2 is not declared as a pointer type. The above declaration is equivalent to:
char *s1;
char s2;
Fortunately, this mistake using #define for a pointer type usually causes a compilation error alerting the programmer to the problem (although what the problem is may be difficult to see!). The solution to this problem is simply to use a typedef declaration for all type declarations (good style dictates this even for nonpointer types). A typedef name has different binding rules for * and the following sequence works correctly:
typedef char * string;
string s1, s2; /* OK */
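The difference in binding can be verified at compile time with a short sketch. The names STRING_MACRO, string_t, and string_alias are hypothetical, chosen to avoid clashing with other identifiers. The alias declaration with "using" is the C++11 equivalent of the typedef and is included only for comparison.

```cpp
#include <type_traits>

#define STRING_MACRO char *        // hypothetical name for the macro version
typedef char * string_t;           // hypothetical name for the typedef version
using string_alias = char *;       // C++11 alias declaration, same effect as the typedef

inline void typedef_demo() {
    STRING_MACRO m1, m2;           // expands to: char * m1, m2;
    string_t t1, t2;
    string_alias u1, u2;

    static_assert(std::is_same<decltype(m1), char*>::value, "m1 is a pointer");
    static_assert(std::is_same<decltype(m2), char>::value,  "m2 is a plain char");
    static_assert(std::is_same<decltype(t2), char*>::value, "t2 is a pointer");
    static_assert(std::is_same<decltype(u2), char*>::value, "u2 is a pointer");

    // Silence unused-variable warnings; the variables exist only for the type checks.
    (void)m1; (void)m2; (void)t1; (void)t2; (void)u1; (void)u2;
}
```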