Aussie AI

Appendix: C++ Portability Bug Catalog

  • Bonus Material for "Generative AI in C++"
  • by David Spuler, Ph.D.

There are multiple ways that a C++ program can behave differently on a new platform. Minor problems cause compilation errors and are easily resolved. More insidious portability problems cause crashes or unexplained changes in behavior. So, here is my catalog of obscure portability bugs or "pugs" where the C++ code might run differently on a new platform.

The first point is that the computer platforms differ in significant ways. There are portability issues in areas such as:

  • File and directory structure
  • Builtin and intrinsic functions
  • Memory layout and address structures
  • Operating system routines
  • Character set representations
  • Signal handling

The second point is that changed runtime behavior might just be a bug in your code. If your C++ fails on a different platform, the most likely culprit is some kind of latent memory or pointer error that has luckily stayed hidden on your main platform, but is brought to the fore by a different compiler and optimizer stack using different memory layouts and data addresses.

Finally, there are also parts of C++ that are not guaranteed to be the same, and these pugs can also be the cause. Technically, there's a difference between "undefined behavior" and "implementation-defined behavior" in standardized C++ features. However, it bores me to tears to write about it, and it's really a distinction without much difference: either way, if your code does one or the other, you've got a pug.

Expression portability issues

C++ has all of those lovely arithmetic operators to do all sorts of things. Unfortunately, not all of them are fully standardized across all platforms. Some of the portability differences include:

  • Sizes of basic C++ data types
  • Integer division on a negative
  • Integer remainder on a negative
  • Bitshift on a negative signed integer
  • Whether "char" means "signed char" or "unsigned char"
  • Order of evaluation of operands of various operators
  • Order of evaluation of function arguments
  • Zero isn't necessarily zero
  • Functions that should be Boolean are not always (e.g. isdigit)
  • Functions don't return fully specified results (e.g. strcmp and memcmp guarantee only the sign)
  • Initialization order for static or global objects

And that's certainly not the full list. Here is some more information on some of these items.

Signed right bitshift is not division

The shift operators << and >> are often used to replace multiplication and division by a power of 2 as a low-level optimization. However, it is dangerous to use >> on negative numbers: right shift is not equivalent to division for negative values. Note that the problem does not arise for unsigned data types, which are never negative, and for which right shifting is always a division.

There are two separate issues involved in shifting signed types with negative values: firstly, the compiler may choose between two distinct methods of implementing >>, and secondly, neither of these approaches is equivalent to division (although one approach is often close). The standard has historically left it implementation-defined whether >> on negative values will:

    (a) sign extend, or

    (b) shift in zero bits.

Each compiler must choose one of these methods, document it, and use it consistently for all applications of the >> operator (C++20 finally mandated the sign extension behavior). Shifting in zero bits is never equal to division for a negative number, since it shifts a zero bit into the sign bit, making the result a nonnegative integer (dividing a negative number by two and getting a positive result is not division!). Shifting in zero bits is always used for unsigned types, which explains why right shifting on unsigned types is a division.

The second method of implementing right shift on negative values is "sign extension". This is similar to division, but is not always equivalent. However, note that integral division of negative values is also not well defined by C++ (i.e., rounding down or rounding up), and so it may happen that / and >> produce the same results. Try the following statement to examine whether >> and division are the same:

   printf("-17 / 2 = %d, -17 >> 1 = %d\n", -17 / 2, -17 >> 1);

8-bit byte integer wrap around

Integer overflow can lead to an interesting problem. Consider the following code to print out the 256 characters in a particular implementation's character set:

    for (char ch = 0; ch <= 255; ch++)  // Bug
        printf("%d %c\n", ch, ch);

There are a few problems with this loop. It never terminates, and exactly how it misbehaves differs between platforms.

In C++ it is implementation-specific whether the type "char" behaves as "signed char" or "unsigned char" on any given system. If char is signed by default, it cannot hold the values 128..255, so the loop test "ch <= 255" can never fail.

However, even if char is unsigned by default (unusual), there is another error causing the loop never to terminate. As soon as the variable ch reaches 255, it is incremented, but instead of becoming 256 it "wraps around" (overflows) to become 0. The for loop condition never fails.

Similar problems can occur with short, int, and long, as these have only a finite range of values and can overflow, which wraps around to zero (for unsigned types) or negative (for signed types).
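
A minimal fix is to use a wider type such as int for the loop variable, which can represent 256 and so terminates correctly:

    for (int ch = 0; ch <= 255; ch++)   // int cannot wrap around at 255
        printf("%d %c\n", ch, ch);      // %c takes an int argument anyway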

Divide and remainder on negative integers

Extreme care is needed when the integer division and remainder operators / and % are applied to negative values. Actually, no, forget that, because you should never use division or remainder in an AI engine, and if you must, then choose a power-of-two and use bitwise operations instead: division by a power of two is an unsigned right bitshift, and remainder is a bitwise-and with one less than the power.

Anyway, another reason to avoid these operators occurs with negatives. Problems arise if a program assumes, for example, that -7/2 equals -3 (rather than -4). The direction of truncation of the / operator was implementation-defined in older C and C++ standards whenever either operand was negative (C++11 finally mandated truncation toward zero); so on older compilers the following are all risky:

    -7 / 2
     7 / -2
    -7 / -2

It is likely that the last case, -7/-2, is implemented equivalently to 7/2, truncating to the result 3. However, unless you can rely on a C++11 or later compiler, it is safer to assume that dividing two negatives is not well defined.

Similarly, the % remainder operator was historically undefined on negative operands. Don't assume that -1%3 equals 2 (rather than -1); since C++11 it is guaranteed to be -1, but older compilers could differ. However, if n>0 and p<0 it is guaranteed that n%p is in the range:

    p < n % p < -p

Therefore, x%-5 is between -4 and 4, inclusive.

How helpful! I mean, that's what I want in programming, is for my arithmetic to be kind of vaguely in the right ballpark!

Note that this restriction can also be expressed as:

    abs(n % p) < abs(p)

Although the above discussion seems to indicate that the results of integer / and % operators on negative operands are very unpredictable, there is a relationship between / and % that restricts the implementation. An implementation must ensure that the following identity remains true for x and y (except y==0):

    (x/y)*y + x%y == x
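
As a quick probe of your own platform's behavior, the following sketch prints the negative division and remainder results and checks the identity:

    // Probe this platform's negative division and remainder
    printf("-7/2 = %d, -7%%2 = %d\n", -7 / 2, -7 % 2);
    printf("identity holds: %d\n", (-7 / 2) * 2 + (-7 % 2) == -7);  // always prints 1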

Zero Is Not Always Zero?

You probably assume that a 4-byte integer containing "0" has all four individual bytes equal to zero. It seems completely reasonable, and is correct on many platforms. But not all. There's a theoretical portability problem on a few obscure platforms. There are computers where integer zero is not four zero bytes.

Really? Well, actually, I just went scouring the internet for information on a real platform where this is the case, and couldn't find anything. Maybe it's some obscure old platforms from the 1980s, when the ANSI C standards were first being created? In any case, it's only a few lines of code that you can add to the startup initialization (or not).

If you want to check, here's a few lines of code for your platform portability self-check code at startup:

    // Test zero integer portability
    int i = 0;
    unsigned char* cptr = (unsigned char*) & i;
    yassert(cptr[0] == 0);
    yassert(cptr[1] == 0);
    yassert(cptr[2] == 0);
    yassert(cptr[3] == 0);

Actually, that code isn't very portable! It's assuming 32-bit int size. Here's some more general code:

    int i2 = 0;
    unsigned char* cptr2 = (unsigned char*)&i2;
    for (size_t i = 0; i < sizeof(int); i++) {
        yassert(cptr2[i] == 0);
    }

Null Pointer is Zero: The NULL pointer is probably all-bits-zero on all platforms. But you might as well be sure, so here's the code to check NULL in a "char*" type:

    // Test pointer NULL portability
    char *ptr1 = NULL;
    unsigned char* cptr3 = (unsigned char*)&ptr1;
    for (size_t i = 0; i < sizeof(char*); i++) {
        yassert(cptr3[i] == 0);
    }

If you have a very low risk tolerance, you can also duplicate this code to check "void*" and "float*" pointer types set to NULL are all zero-bits.

Initialization to Zero: If you have a big object, or a long array, it's very slow to initialize every object field, or every array element, explicitly to zero. The faster method is to use memset to set every byte to zero, or alternatively, to use calloc to allocate memory that is already full of zero bytes. These optimizations rely on integer zero and floating-point zero and pointer NULL all being a sequence of zero bytes.

The fast code is typically something like this:

    const int ARRSIZE = 512;
    float arr[ARRSIZE];
    memset(arr, 0, ARRSIZE * sizeof(float));

Or you can do other variants:

    memset(arr, 0, sizeof(arr));  // Option #2 (a bit risky)
    memset(arr, 0, ARRSIZE * sizeof(*arr));  // Option #3

This works just fine, provided your platform is a normal one in its handling of zero for int, float, and pointers.

Bug Alert! Be careful with the second sizeof option for arrays that are function parameters, because C++ silently converts array parameters to pointers (arrays are never passed by value in C++; the parameter "decays" to a pointer).

    void initmyarray(float arr[512]) 
    {
        memset(arr, 0, sizeof(arr));  // Option #2 is BROKEN!
        ...
    }

The problem is that the parameter "arr" is really a pointer, so sizeof(arr) is the size of the "float*" pointer type, typically only 8 bytes on a 64-bit platform. So, you didn't really initialize the whole array of 512 floats, just the first 8 bytes. And there are no warnings at compile-time or run-time about this insidious bug.

Memset Wrapper Trick: The only way to catch the bug with sizeof and array parameters is to use your own wrapper around memset calls, and add an explicit runtime test. Here's an example memset wrapper:

    void yapi_memset_wrapper(void* addr, int c, size_t sz)
    {
        if (sz == sizeof(float*)) {
            yassert(sz != sizeof(float*));  // Report: probable sizeof on array parameter
        }
        if (sz == 0) {
            yassert(sz != 0);  // Report: wrongly reversed parameters?
        }
        memset(addr, c, sz);  // Call the real deal
    }

And if you want to be sure, you can force memset to use the wrapper:

    #define memset yapi_memset_wrapper

Another less imposing way to force yourself to always use the memset wrapper is the kind reminder method:

    #define memset please_use_memset_wrapper_instead

You'll need to add an "#undef" before the real call to memset in the wrapper code (recursion, anyone?). And you probably can't safely redefine memset before including the standard libraries, so don't do it in a "-D" option or MSVS project property. Instead, put it in your own header files, which should be included after the standard library headers. And it's up to you whether or not to leave these debugging wrapper tests in production code.

Floating-Point Zero: Similarly to integer zero, you probably assume that a 4-byte float type with value "0.0f" also has 4 zero bytes (i.e. all 32 bits are zero). You're correct provided your platform is following the IEEE 754 standard for 32-bit floats, which it should be. You can test it explicitly with portability self-testing code:

    // Test float zero portability
    float f1 = 0.0f;
    unsigned char* cptr4 = (unsigned char*)&f1;
    for (size_t i = 0; i < sizeof(float); i++) {
        yassert(cptr4[i] == 0);
    }

Zero is a special case floating-point number. Because the mantissa of a normal float has an implicit extra leading 1 bit, an all-zero mantissa would actually mean "1.0" (times two raised to the power of the exponent), not "0.0". And for an 8-bit exponent with a 127 offset, all-zero exponent bits would nominally mean an exponent of "-127", not zero. The IEEE 754 standard gets around this by treating an all-zero exponent field as a special case (the "denormal" numbers), where the implicit leading mantissa bit is 0 rather than 1. Hence, when all the bits are zero, by definition and signed in triplicate by the IEEE committees, the value is exactly "0.0", in both 32-bit float and 64-bit double types.

Negative Zero: Another problem is that floating-point has two zeros. There's a "negative zero" in the standard IEEE 754 representation of floating-point numbers. This has all 0 bits for both the exponent and mantissa, like the normal zero, but the sign bit is set, as if to indicate a negative number. This is negative zero (i.e. "-0.0"), and its arithmetic value is equal to zero (i.e. "0.0"), but not all of its bits are zero. Hence, a float comparison with zero (e.g. "x==0.0f") cannot simply be an integer comparison against an all-zero-bits 4-byte value; the compiler has to consider two different bit patterns. (Maybe x86 assembler has a single opcode that does this in one cycle?)
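
Here's a sketch of a self-check in the same style as the earlier ones, confirming that -0.0f compares equal to zero even though its bytes are not all zero (assumes IEEE 754 floats):

    // Test negative zero: equal in value, different in bits
    float fneg = -0.0f;
    yassert(fneg == 0.0f);  // arithmetic comparison succeeds
    unsigned char* cptr5 = (unsigned char*)&fneg;
    yassert(cptr5[0] != 0 || cptr5[1] != 0 || cptr5[2] != 0 || cptr5[3] != 0);  // sign bit is set somewhere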

Function pointers larger than void* pointers

A portability problem that can arise on some machines is that code addresses differ in size from data addresses. This creates problems if a program ever uses the same type to represent pointers to data and pointers to functions.

The most common such mistake is the erroneous belief that the generic pointer type "void*" can hold pointers to functions. For example, it is incorrect to use a void* type to represent a function pointer of unknown type, such as when two types of function pointers are being passed to a function:

    typedef char (*char_fn)(void);
    typedef int (*int_fn)(void);
    typedef void * generic_fn_ptr; /* WRONG! */

    void output(int type, generic_fn_ptr fn)
    {
        if (type == CHAR)
            printf("%c", (*(char_fn)fn)());
        else
            printf("%d", (*(int_fn)fn)());
    }

The above function will usually fail whenever data pointers are smaller than function pointers, which may occur on some large machines.

The correct type of a generic function pointer is void(*)(), which can be used to represent any type of pointer to function (but not any type of pointer to data). The correct declaration is:

    typedef void (*generic_fn_ptr)();

To examine the different representations of data pointers and code pointers, try executing the following program in your environment. Possibly the values will be the same, or they may differ. On a personal computer, try executing the program when compiled using different memory models:

    #include <stdio.h>
    int main()
    {
        printf("data ptrs = %d, function ptrs = %d\n",
            (int) sizeof(void*), (int) sizeof(void(*)()) );
    }

Wrap around of clock ticks

The clock function returns the number of clock ticks since the program began execution. If the system clock is incremented every microsecond, or more frequently, the number of clock ticks rapidly increases. The declaration of clock_t is usually a "long", but even this has only a finite maximum value. After a certain amount of time, the number of ticks will be larger than that which can be represented by clock_t, and will wrap around (either to a large-magnitude negative value if clock_t is a signed type, or to zero if clock_t is an unsigned type). This issue is only a problem on some systems.
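
As a hedged back-of-the-envelope sketch: with a 32-bit signed clock_t and the common CLOCKS_PER_SEC value of 1,000,000, the tick counter wraps after roughly 36 minutes:

    // Rough wrap-around estimate (the minutes figure assumes a 32-bit signed clock_t)
    printf("clock_t is %d bytes; 32-bit signed ticks wrap after ~%.1f minutes\n",
        (int)sizeof(clock_t), 2147483647.0 / CLOCKS_PER_SEC / 60.0);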

Calculation of an address outside an array

It is common on many implementations for an access or dereference of an out-of-bounds address to cause a run-time program failure (e.g., an array index being out of bounds). However, on some implementations even the calculation of such an address, without any dereference, can cause a failure, even if the address is only an intermediate value that is never used. Because of such implementations, the C++ standard leaves the computation of an illegal address "undefined".

A common situation where this can be important is the use of a common trick to achieve arrays addressable via indices 1..n instead of 0..n-1. The trick is to use a second pointer variable to act like an array:

    int a[10]; /* a[0..n-1] */
    int *b = a - 1; /* b[1..n] */

Unfortunately, the computation of "a - 1" is an illegal address and computing it may cause a run-time failure before b is even used. It is a shame that such a useful technique for changing array indexing is subverted by this portability problem.

Divide-By-Zero Doesn't Always Crash

You want your code to crash? Well, maybe, and integer division by zero usually does crash, as does the integer modulo (%) operator when given a zero second operand. Floating-point division by zero is less reliable: under IEEE 754 it quietly produces infinity or NaN, unless floating-point exceptions have been explicitly enabled.

First point, though, smack yourself on the wrist for even using division in the first place. It's slower than a toad race in Alice Springs. Instead, floating-point division should be changed to multiplication by the reciprocal, and integer division can probably be a right bitshift (>>) operator. Integer modulo (%) by a power-of-two (e.g. "x % 512") can be replaced with an unsigned bitwise-and (&) operator with the operand one less (i.e. "(unsigned)x & 511u").
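
For example, hedged sketches of those replacements (assuming x is unsigned and the divisor is a known power of two):

    unsigned q = x >> 9;            // x / 512 for unsigned x
    unsigned r = x & 511u;          // x % 512 for unsigned x
    float y = f * (1.0f / 512.0f);  // f / 512.0f via multiplication by the reciprocal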

Anyway, there are some platforms where dividing by zero doesn't core dump or BSOD or whatever. So, sometimes you can have an insidious bug where it's dividing by zero, but there's no indication of an error. And that would probably make your AI engine do some squirrelly illogical stuff.

Problems with size_t

C++ attempts to enhance portability by declaring various types as special names in header files. For example, size_t and ptrdiff_t are defined in <stddef.h>, and time_t and clock_t are defined in <time.h>. In most situations there is no problem with the use of these types, but in the situations where the compiler does not check types there can be problems.

The main danger is when passing these types as arguments to a variable-argument function. A common error is:

    printf("size of int = %d\n", sizeof(int));   // Bug

The sizeof operator returns type size_t, which may or may not be an int. The code above will fail if size_t is larger than int (e.g., a 64-bit size_t with a 32-bit int). The solution is simply to use a cast:

    printf("size of int = %d\n", (int) sizeof(int));  // Correct

Note that the C++ output style is better as it does not need a cast:

    std::cout << "size of int = " << sizeof(int) << std::endl;  // Correct

scanf size_t portability bug: An even worse problem arises with the use of scanf because type casts cannot solve the problem. Consider what happens if the user intentionally uses the type name size_t as the type of a variable:

    size_t n;
    scanf("%d", &n); // Bug

This program will fail if size_t is not the same size as an int. Hence, the use of size_t as the type of variables is inherently dangerous with scanf.
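
On C99 and C++11 library implementations and later, the "%zu" length modifier reads and prints a size_t directly, which fixes both the printf and scanf cases (though very old compilers and runtimes may not support it):

    size_t n;
    if (scanf("%zu", &n) == 1)            // %zu matches size_t exactly
        printf("size entered = %zu\n", n);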

Problems with clock_t

Unsigned int problems can arise with the use of any of the standard type names. An example involving clock_t is:

    printf("%d seconds used\n",
        clock() / CLOCKS_PER_SEC); // Bug
The corrected code using a type cast is:
    printf("%d seconds used\n",
        (int)clock() / CLOCKS_PER_SEC);  // Correct (type cast)

Problems with ptrdiff_t

An example of a portability glitch involving ptrdiff_t occurs in a common idiom for calling the bsearch library function to search a sorted array:

    int *result, arr[HOW_MANY], key, n = HOW_MANY;
    ...
    result = (int*)bsearch(&key, arr, n, sizeof(int), cmp);
    printf("Found at index %d\n", result - arr); // Bug

The correct code adds a typecast to int:

    printf("Found at index %d\n", (int)(result - arr));

Multi-byte character constants

Because C++ compilers accept more than one character in a char constant, there are some common errors made by novices. The compiler might not complain about a space in a char constant, as below:

    if (ch == ' A')   // Bug

Note that this is not the right way to declare a "wide" character constant. Instead, C++ treats this as a multi-byte character constant, with an implementation-defined value. Some platforms might treat the code as testing whether ch is a space. Other platforms will compute a multi-byte value. Assuming an ASCII character set where A has value 65 and a space has value 32, the value of the character constant ' A' is often 65*256+32, or 32*256+65, or even just 32.

Similarly, it is wrong to confuse string constants with char constants, as in the code below (single quotes instead of double quotes).

    printf('Hello World\n');   // Bug

However, this bug will almost certainly elicit a compiler message.

signed and unsigned char

The type qualifiers "signed" and "unsigned" can also lead to implicit type conversion problems. In particular, char variables are implicitly signed char on some platforms, and conversion of a character in the range 128..255 to an integer would yield a negative number.

Fortunately, in ASCII all common alphanumeric characters fall into the range 0..127 and present no problem. However, when accessing bytes from a binary file (i.e., characters in the range 0..255) or using a character set other than ASCII, the type "unsigned char" should be used for bytes.

For similar reasons, it is often bad practice to use the type "char" to represent small integral values because the possibility of overflow is so high. The type char cannot portably represent values outside 0..127. Using explicitly signed or unsigned char can increase the portable range to −128..127 and 0..255, respectively.

Character Test Exceptions on Windows

There's a problem with signed versus unsigned char when the value exceeds 127. I only get this on MSVS C++, and never on Linux, and it's very annoying. My code looks totally normal and standard with "ch" declared as a "char" type:

    if (isalpha(ch)) 

But it crashes in the MSVS IDE with a runtime exception if "ch" is a character >= 128. In other words, whenever it's processing a UTF-8 byte.

It's a "signed char" issue. The <ctype.h> functions require an argument that is either EOF or representable as an unsigned char (0..255), but an 8-bit value stored in a signed char type becomes a negative number, which is undefined behavior, and MSVC's debug runtime asserts on it.

My solution is a bit ugly. I find myself adding a type cast to every call to a <ctype.h> macro:

    if (isalpha((unsigned char) ch)) 

Probably I need to write an "isalpha_wrapper" function (and the other ones), and handle the correction that way. Or maybe I could simply make "isalpha_wrapper" a preprocessor macro that adds an unsigned char type cast whenever it's on Windows. I wish I had a better solution.
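
For what it's worth, here's one possible shape for that wrapper, hiding the cast in a single place (isalpha_wrapper is a made-up name, not a standard function):

    #include <ctype.h>
    inline bool isalpha_wrapper(char ch)
    {
        return isalpha((unsigned char)ch) != 0;  // the cast avoids negative arguments
    }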

Unicode Special Characters

There are more Unicode characters than there are black holes in the Universe, which I didn't hear from Stephen Hawking. Whoever wrote the Unicode standard didn't get a lot of sleep time. There are different encodings for every language known to humankind, and probably a dozen Star Trek languages, too. And then there are love hearts and stars, and all sorts of emojis and icons. It's enough to confuse a poor little AI!

What to do about all these Unicode specials is a tricky question. Which ones are in the training data set? And does the AI engine tolerate them in the tokenizer anyway? The issue goes way beyond UTF8 versus Latin1 encoding issues, because there are simply so many different ones. I wish you the best of luck!

memcmp is not array equality

For equality tests on non-char types of arrays, the memcmp function might seem useful to test if two arrays are exactly equal. However, it is buggy too if there's padding in the array, such as arrays of objects, or arrays of structs. You might get away with it if you've allocated the array memory using calloc (rather than new or malloc), or have used memset to initialize the entire array block, including all its padding bytes, to zero. You cannot rely on constructor initialization alone to set the padding bytes to zero.

Plain int bit-fields

Bit-fields should be explicitly declared as either signed or unsigned int. If a bit-field is left to have type int, it is implementation-defined whether it will receive signed or unsigned type. For example, the following declaration relies on the compiler using unsigned representations by default:

    struct { ...
        int flag : 1;   // Bug
    };

This will fail if the compiler uses signed int, because a 1-bit signed bit-field uses its only bit as the sign bit and can hold only the values 0 and -1, so storing 1 into it will not give back 1. The correct declaration of the bit-field is:

    unsigned int flag : 1; // Correct
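
Here's a small sketch of how the signed interpretation goes wrong: storing 1 into a 1-bit signed bit-field typically reads back as -1 (the exact result is implementation-defined):

    struct Flags { int f : 1; };   // plain int: may be treated as signed
    Flags x;
    x.f = 1;
    printf("%d\n", x.f);   // often prints -1 when the bit-field is signed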

Order of evaluation errors

Humans would assume that expressions are evaluated left-to-right. However, in C++ the order of evaluation of operands for most binary operators is unspecified, and conflicting unsequenced side effects are outright undefined behavior. This makes it possible for compilers to apply very good optimizing algorithms to the code. Unfortunately, it also leads to some problems that the programmer must be aware of.

Order of evaluation of operands is a different concept from order of evaluation of operators. The evaluation of operators is completely specified by brackets, operator precedence, and associativity, but evaluation of their operands is not. To see the effect of order of evaluation of operands, consider the expression:

    (a + b) * (c + d)

Which is to be evaluated first: (a+b) or (c+d)? Neither bracketing, precedence nor associativity decides this order. Bracketing specifies that both additions must be evaluated before the multiplication, not which addition is evaluated first. Associativity does not apply, because the two + operators are not near one another (they are separated by a * operator). Precedence does not apply because the brackets override it.

The order of evaluation depends only upon how the compiler chooses to evaluate the operands of the * operator. Intuitively, we would assume that the evaluation would take place left-to-right. However, the order of evaluation of the operands to the multiplication operator, and most other binary operators, is not specified in C or C++. Different compilers may choose different orderings.

Usually the order does not matter. In the example above, it does not matter which addition is done first. The two additions are independent, and can be executed in any order.

Side-effect problems: Problems arise when there are side-effects in the operands that are not executed left-to-right. If an expression involves side effects, the result is sometimes undefined. The result may be as intended, or it may be incorrect. Sometimes it will be correct when compiled without optimization, but incorrect after optimization. Results may differ on different machines and the code is not portable.

A side effect is an operation that affects a variable being used in the expression or affects some external variable used indirectly in the expression. The most common example of a side effect is the increment (++) or decrement (--) operator. Any assignment operation deep inside an expression is also a side effect to the variable it changes. A function call can also be a side effect if it modifies a variable used in the expression (as a parameter or globally), modifies some other external variable on which another subexpression depends (i.e., a global or local static variable), or if it produces output or accepts input.

To see the effect of side effects, consider the increment operator in the expression below. It is a dangerous side effect.

    y = (x++) + (x * 2);

Because the order of evaluation of the addition operator is not specified, there are two orders in which the expression could actually be executed. The programmer's intended order is left-to-right:

    temp = x++;
    y = (temp) + (x * 2);

The other incorrect order is right-to-left:

    temp = x * 2;
    y = (x++) + (temp);

In the first case, the increment occurs before x*2 is evaluated. In the second, the increment occurs after x*2 has been evaluated. Obviously, the two interpretations give different results. This is a bug because it is undefined which order the compiler will choose.

Function-call side effects: If there are two function calls in the one expression, the order of the function calls can be important. For example, consider the code below:

    f() + g()

Our first instinct is to assume a left-to-right evaluation. If both functions produce output or both modify the same global variable, the result of the expression may depend on the order of evaluation of the + operator (which is undefined).

Order of Evaluation of Assignment Operator: Order of evaluation errors are a complicated problem. Most binary operators have unspecified order of evaluation, historically even the assignment operators (C++17 finally sequenced the right-hand operand of = before the left). On older compilers, a simple assignment statement can be the cause of an error. This error can occur in assignment statements such as:

   a[i] = i++;   // Bug

The problem here is that "i" has a side effect applied to it (i.e. ++), and is also used without a side effect. Because the order of evaluation of the = operator is unspecified in C++, it is undefined whether the increment side effect occurs before or after the evaluation of the i in the array index.

The intended order is left-to-right:

    temp = i; /* Programmer intended meaning */
    a[temp] = i++; /* Index is OLD value of i */

The incorrect alternative order that the compiler can also legally implement is right-to-left:

    temp = i++; /* Incorrect order */
    a[i] = temp; /* Index is NEW value of i */

Hence, this assignment has two interpretations, and is undefined. Is the array subscript the old or new value of i? It isn't clear whether a[i] or a[i+1] will be assigned the value, and it becomes a possible bug.

Fortunately, order of evaluation errors can be avoided by the simple policy of performing side effect operations in separate statements. For example, never increment a variable inside a complicated expression or in a function argument. The correct way to rewrite the above statement is:

    a[i] = i;   // Correct
    i++;

Another similar example would be:

   for (int i = 0; i < n; ) v2[i] = v1[i++];  // Bug

And the above loop should increment i separately in the for loop header:

   for (int i = 0; i < n; i++) v2[i] = v1[i];  // Correct

Safe evaluation order operators

Not all C++ operators have an unsafe order of evaluation. Some operators are safe because they have a defined left-to-right evaluation order. Early C and C++ standards specified this using the concept of "sequence points"; C++11 replaced the sequence point wording with the "sequenced before" relation, though the practical effect is much the same.

Safe operators: There is no order of evaluation problem with unary operators as they have only one operand to evaluate. The structure operators also give no problem as only one of the operands can be evaluated (the first). The second operand is a field name that needs no evaluation — it is not an expression.

Comma operator: The comma operator is completely safe because its specified order for evaluation of operands is left-to-right. Furthermore, it always evaluates both its operands.

Short-circuited logical operators: The logical operators, && and ||, are almost safe. Their specified order of evaluation is left-to-right. However, it is possible to not evaluate some operands. Depending on the value of the first expression, any side effect in the second expression can occur once or not occur at all. This occurs owing to the short-circuiting of logical operators.

Ternary operator: The ternary conditional operator (?:) has its order of evaluation fully specified: evaluate the first operand; if true then evaluate the second, else evaluate the third. Hence the order of evaluation is fixed, but, as with the binary logical operators, whether each operand is actually evaluated is not fixed. There is a problem when the second or third expression involves side effects. Depending on the value of the first expression, the side effect can occur once or not occur at all. As with the logical operators, side effects in the first expression pose no problem.

Binary operators: The ordinary binary operators are completely unsafe in terms of the order of evaluation (i.e. possibly buggy if an operand contains a side-effect). This includes the arithmetic, relational, bitwise, shift, assignment, and extended assignment operators. The order of evaluation of their operands is not specified at all. It can be left-to-right, right-to-left, or even a convoluted mixed-up order.

Preventing Order of Evaluation Bugs: Linux programmers have a partial solution to order of evaluation problems: lint. The lint utility will find many simple order of evaluation ambiguities: it will find those involving increment, but will not find those involving function calls. Modern C++ compilers can also find many of these errors and give a warning.

No amount of brackets can be a solution. Brackets do not affect the order of evaluation of operands. They affect only the order of evaluation of operators (i.e., they change only the precedence and associativity).

An extreme solution follows this motto: if there are no side effects, there is no problem. With that aim, the code is written so that increment and decrement operations are done in separate statements (i.e., different lines). Never increment a variable inside a complicated expression or in a function argument. Assignments are not placed in the middle of expressions. Function calls that may cause dangerous side effects are also separated out to separate lines.

A less extreme solution is to separate any dangerous side effects. Some side effects are no problem. For example, if a variable with a side effect appears only once in an expression, it is not a problem. Only dangerous side effects need be separated out, and side effects that are not dangerous are left alone. To separate side effects, temporary variables may be necessary to hold values of subexpressions.
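
For instance, the earlier dangerous expression can be rewritten with the increment isolated in its own statement:

    int temp = x++;      // the side effect is done in its own statement
    y = temp + x * 2;    // unambiguous: x here is the incremented value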

Order of evaluation of function arguments

Another form of the order of evaluation problem occurs because the order of the evaluation of arguments to a function is not specified in C++. It is not necessarily left-to-right, as the programmer expects it to be. For example, consider the function call:

    fn(a++, a);  // Bug

Which argument is evaluated first? Is the second argument the new or old value of a? The compiler can legally implement this statement as if it were equivalent to either of:

    fn(a, a), a++;
    fn(a, a+1), a++;

Another example occurs where the side effect is a global variable. In the following example, the global variable is the file stream:

    void process_2_chars(int first_char, int second_char);
    ...
    process_2_chars(getchar(), getchar());   // Bug

It is undefined whether the function will receive the 2 characters in the intended order, or in reverse order.
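
The fix follows the same policy as before: perform the side effects in separate statements so the reading order is explicit:

    int first = getchar();    // defined order: first character read first
    int second = getchar();
    process_2_chars(first, second);   // Correct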

Evaluation order of C++ overloaded operators

When a class has overloaded operators declared to operate on itself and other values, there are order of evaluation errors that can creep in. Consider the output statement:

    cout << i++ << " " << i++;

This use of the overloaded << operator suffers the same order of evaluation problems as any other function call (at least prior to C++17, which sequenced the operands of <<, even overloaded ones, left-to-right). Overloaded operators are just like any other functions in terms of evaluation of their arguments. The above statement is equivalent to:

    cout.operator<<(i++).operator<<(" ").operator<<(i++);

This is similar to a function call sequence such as:

    fn(fn(fn(cout,i++)," "), i++);

There is no specified ordering of the two i++ side effects. It could be equivalent to any of the following statements:

    (cout << i << " " << i), i++, i++;
    (cout << i+1 << " " << i), i++, i++;
    (cout << i << " " << i+1), i++, i++;

Order of initialization errors

The order of evaluation of initializers is not defined. Therefore, the following declaration may cause strange program results on some compilers:

    int i = 1, j = i;   // Bug

There is no problem if evaluation is from left-to-right, but if it occurs right-to-left then j will be assigned the value of i before i has been initialized — that is, j will contain garbage rather than 1.

Fortunately, the use of the address of a variable is no problem since the address does not depend on whether the variable has been initialized with a pointer. Therefore, the initialization below is not dangerous:

    int i = 1, *ptr = &i;   // Safe

A problem related to order of evaluation of initializers occurs with the order of initialization of constructors in C++ classes. The following constructor typifies the problem:

    Object::Object(int x) : i(x), j(i)  // Bug
    {
        // ...
    }

The order of initialization, although well defined within each constructor, is not necessarily left-to-right as the above code requires. Determining the order of initialization is quite a complicated process, depending on the ordering of the declarations of the data members being initialized, rather than the order in which they appear in the initializer list. Hence, depending on what i and j refer to, and in what order their declarations were placed, j may be initialized using i before i has itself been initialized by x.
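
To make the dependency concrete, here is a sketch where the member declaration order defeats the initializer list order:

    class Object {
        int j;   // declared first: initialized first, so j(i) reads i too early
        int i;
    public:
        Object(int x) : i(x), j(i) { }  // Bug: j is initialized before i
    };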

Order of initialization of static objects in C++

A special order of evaluation error exists because the order of initialization of static or global objects is not defined across files. Within a single file the ordering is the same as the textual appearance of the definitions. For example, the Chicken object is always initialized before the Egg object in the following code:

    Chicken chicken; // Chicken comes first
    Egg egg;

However, as for any declarations there is no specified left-to-right ordering for initialization of objects within a single declaration. Therefore, it is undefined which of c1 or c2 is initialized first in the code below:

    Chicken c1, c2;

If the declarations of the global objects "chicken" and "egg" appear in different files that are linked together using independent compilation, it is undefined which will be constructed first. Try compiling and linking the following two files. The file "egg.cpp" is as follows:

    #include <iostream>
    class Egg {
        public:
        Egg() { std::cout << "Egg initialized\n"; }
    };

    Egg egg; // global object

The file "chicken.cpp" also contains the main function:

    #include <iostream>
    class Chicken {
        public:
        Chicken() { std::cout << "Chicken initialized\n"; }
    };

    Chicken chicken; // global object

    int main()
    {
        // do nothing; only constructors executed
    }

Different compilers will have different orders of appearance of the two output messages at run-time, and in fact, the order may well change for a particular compiler, depending on the order in which the object files are passed to the linker:

    cc chicken.o egg.o # may have different results
    cc egg.o chicken.o

Fortunately, the above dummy problem is not serious, but imagine if "chicken" required "egg" to be constructed. Such a dependency could well produce a run-time error. Therefore, a good general rule is that programs should avoid dependencies between global objects in constructors, particularly if objects are defined in separate files. If a global object is accessed in the constructor for another global object, the access may occur before the global object has been initialized.

Global Destructor Order of Evaluation Error: The order of destruction of class objects is also undefined across files, so there is a similar error if destructors have dependencies between objects in different files. Unless care is taken the destructor for one global object might use another global object that has already been destroyed.

Order of initialization of static variables

Note that the problems with order of initialization across files are not limited to class objects; the order of evaluation of any global or static variables is undefined across files. Although there is not usually any problem for variables initialized with constant expressions, there is potential danger in any initializations that involve nonconstant expressions that must be evaluated at run-time (i.e., immediately before main is called). For example, consider the problem if a program uses a global pointer variable initialized to point to dynamic memory:

    char *g_buffer = new char[BUFFER_SIZE];

This is legal C++ code, but it is quite dangerous. If "g_buffer" is used in a different source file by initialization code that is executed before main is called (e.g., a global object's constructor), then any such uses may see the value of g_buffer before it has been initialized properly (i.e., still NULL from static zero-initialization). The moral of the story is to think carefully before using global variables with nonconstant expressions.
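
One standard workaround is the "construct on first use" idiom: wrap the global in a function with a local static, so it is initialized on first access regardless of link order. A sketch, reusing BUFFER_SIZE from above (get_buffer is a made-up name):

    char* get_buffer()
    {
        static char* buf = new char[BUFFER_SIZE];  // initialized on first call
        return buf;
    }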

File Portability Problems

The file systems are very different on platforms such as Windows versus Linux. This introduces portability concerns with any issues of file management, naming, and directory locations. Some of the issues include:

  • Filename differences (e.g. backslash vs slash)
  • Directory path differences
  • Text files use \n versus \r\n sequences

There are also some low-level C++ issues with the standard I/O libraries having undefined behavior:

  • Buffering issues
  • EOF is an int not a char
  • fflush on stdin is unclear
  • fread/fwrite without intervening fseek

Backslash in DOS filenames

Windows and DOS use backslashes for directory paths in filenames, whereas Linux uses a forward slash character. A common error with file operations occurs when a DOS filename is encoded with its full path name. The backslash starts an escape inside the string constant. Hence, the filename below is wrong:

    fp = fopen("c:\file.cpp", "r");     // Bug

The backslash character starts the escape \f, which is a formfeed escape. The correct statement uses two backslash characters:

    fp = fopen("c:\\file.cpp", "r");    // Correct

EOF is not a char

The EOF constant is a special value for C++ file operations. One problem related to signed versus unsigned chars is comparing a char variable with EOF. Because EOF is represented as the integer value -1, it should never be stored in or compared with a char type. If plain char is signed, it will usually work, but a legitimate byte value of 0xFF will then also compare equal to EOF. If plain char is unsigned, storing getchar()'s -1 truncates it to 255, and the comparison 255 == -1 is always false, so end-of-file is never detected. An example of this type of bug:

   char ch = getchar();
   if (ch == EOF) { ... }   // Bug

The correct definition is to use "int ch".
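
For example, the classic read loop declares ch as an int so that EOF is representable:

    int ch;
    while ((ch = getchar()) != EOF)   // int can hold both 0..255 and EOF (-1)
        putchar(ch);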

Problems with SE Linux

The SE versions of Linux can cause you additional pain. That means Security Enhanced (SE) Linux. I usually find that I just need to turn it off, which might not be the best solution. The config file is "/etc/selinux/config" and the setting "SELINUX=disabled" will turn it off. You'll need to reboot your Linux box, which is annoying because I have some Linux boxes that have clocked up years of uptime.

Missing Linux Temp Files

Are you having trouble with temporary files under Linux? There's a weird recent feature in the OS that moves the temporary files automatically under sub-directories. I guess it's secure and private, but it sure breaks stuff. If your code is working fine in test mode, but cannot find its temporary files when it's running under Apache, I think I might know the answer.

The configuration of systemd has a mode to store temporary files in subdirectories like /tmp/systemd-private-blah-blah or similar. To turn it off, you need to set "PrivateTmp=false" in the config file. For Apache, the setting is probably in the "/usr/lib/systemd/system/httpd.service" config file. If I ever figure out how to fix my code and leave PrivateTmp enabled, I'll let you know with a postcard.

Direct access on text files

There is no difference between text files and binary files on UNIX systems, and the return value of ftell is simply a byte offset from the start of the file. However, this is not necessarily so for other operating systems, and it is common to abuse direct access on a text file. The long value returned from ftell is not necessarily a count of how many characters have been read.

Some implementations of text files use two characters to represent each newline: instead of a single newline (\n) character, the file contains a carriage-return character (\r) followed by the newline (\n). Although the I/O libraries on such implementations will usually hide this fact from the user (by ignoring \r on input and automatically adding \r when outputting \n), this can create problems when using direct access on text files, where these conversions are not performed. Text files should not be opened using binary mode unless you really need low-level processing; otherwise there is a high probability that the \r\n problem will not be handled correctly.

fflush on an input file

The fflush function is used to flush the buffer associated with a file pointer. Unfortunately, it can only be used to flush an output buffer, causing output to appear on screen (or be flushed to a file). Applying fflush on an input file leads to undefined results; it will succeed on some systems, but cause failure on others. The problem is typified by the following statement that often appears in code:

    fflush(stdin);

The intention is to flush all input characters currently awaiting processing (i.e., stored in the buffer), so that the next call to getchar (or another input function) will only read characters entered by the user after the fflush call. This functionality would be very useful, but is unfortunately not possible in general, as the effect of fflush is undefined on input streams. There is no portable way to flush any "type ahead" input keystrokes; fflush(stdin) may work on some systems, but on others it is necessary to call some non-standard library functions.

fread and fwrite without intervening fseek

When a binary file is opened for update using a mode such as "rb+", the programmer must be careful when using fread and fwrite. It is an error to mix fread and fwrite operations without an intervening call to a repositioning function such as fseek or rewind.

For example, do not assume, after sequentially reading all records in a file using fread, that a call to fwrite will write data to the end of the file; use an fseek call to explicitly reach the end of the file. The best method of avoiding this error is to call fseek immediately before every fread or fwrite call.
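
Here's a sketch of the read-modify-write pattern, assuming fp is a FILE* open in "rb+" mode and rec is a plain struct:

    fread(&rec, sizeof(rec), 1, fp);
    fseek(fp, -(long)sizeof(rec), SEEK_CUR);  // reposition before switching to writing
    fwrite(&rec, sizeof(rec), 1, fp);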

Buffering problems with <stdio.h>

Consider the following method of prompting for and accepting input with printf and scanf:

    #include <stdio.h>

    int main()
    {
        int n;
        printf("Enter a number: ");
        scanf("%d", &n);
        printf("The number is %d\n", n);
    }

This simple program contains an error that is far from obvious. Since stdout is typically line-buffered, and the call to printf does not contain a newline character, the initial prompt need not appear until the second printf statement (which contains a newline). The program could well wait for input at the scanf statement before the prompt has appeared. Fortunately, most implementations of the <stdio.h> library seem to silently flush stdout whenever an input function such as scanf or getchar is called. However, such behavior is not required by the standard, and there are implementations that do not do this. Therefore, this simple program contains a dangerous portability problem.

There are a number of solutions to this problem. One method is to always place a newline at the end of a printf statement, thereby forcing the characters in the internal buffer to appear. Another method is to call fflush(stdout) before any input statement.

There is no similar problem with iostream and the cout and cin statements. The relationship between these streams is explicitly documented, and in fact there is a member function tie that allows a programmer to "tie" an input and output stream so that requesting input will flush the corresponding output stream.

Mixing <stdio.h> and <iostream>

It is poor C++ style to intermingle usages of functions from the C-style <stdio.h> library and the C++ <iostream> library (or the older <stream.h> library). Although mixing the two is often harmless, there are some errors that may occur. One problem is that they use different buffers and different buffering methods. Try executing the following program:

    #include <stdio.h>
    #include <iostream>

    int main()
    {
    printf("Hello ");
        cout << "World\n";
    }

This program may well produce the erroneous output:

    World
    Hello

The problem is that printf is line-buffered and will not output characters until it outputs a newline, whereas cout uses a different buffering scheme and its output appears immediately. The characters in the printf buffer do not appear until the program terminates, causing the buffers to be flushed. Although it is far better to avoid mixing operations, one "quick-and-dirty" fix for this problem is the use of fflush(stdout) after stdio output statements, particularly printf statements when the format string does not end with a newline. Note also that modern C++ implementations synchronize cout with stdio by default (see std::ios::sync_with_stdio), which makes mixed output appear in order; the problem returns if that synchronization has been turned off for speed.

Mixed Input Buffering Problems

Buffering problems can also appear related to mixed input. Consider the following sequence of statements when given the input 35a, with no space between the number and the character:

    int x = 0;
    char c = 0;
    scanf("%d", &x);
    cin >> c;

The second input statement using cin will not see the 'a' character because it has been read into the stdio internal buffer. To read an integer, scanf must keep reading characters until it finds a nondigit character. Thus, it always reads one extra character, and pushes that character back onto the input buffer to be read next time. Unfortunately, scanf and cin don't read from the same buffer, so cin cannot read the character; any later call to scanf or getchar will receive the character.

The same input buffering problem means that explicit pushback, using ungetc from <stdio.h> or putback from <iostream>, will not allow the pushed-back character to be read by input functions from the opposite library.

Sometimes it is necessary to mix both stdio and iostream input libraries, such as when linking C++ code with existing C code. In such cases the basic rule to follow is to mix operations between the different libraries only on a line-by-line basis.

Character set portability issues

AI uses character processing in its tokenization phase, and also possibly in data structures such as hash tables or tries, if string data is used as the key (e.g. KV caching or the inference cache). As if there weren't enough issues with UTF-8 versus Unicode versus Latin1 encodings across platforms, platforms with EBCDIC present additional problems.

ASCII is used on most platforms, including Windows and Linux, and the 7-bit values are mostly portable. EBCDIC is an older encoding used mostly on IBM mainframes. One common source of portability problems is programs that rely on the underlying ASCII representation of the character set.

There are many ways that a program can have a dependency on the ASCII representation. The most blatant is the use of integers instead of character constants. For example, the following code testing for capital A works in ASCII:

    if (ch == 65)   // Nonportable

This works only because capital letter A is 65 in ASCII; this test will probably fail in a non-ASCII environment. Naturally, the completely portable method is to use character constants, since the compiler will always supply the correct integer for the given implementation's character set:

    if (ch == 'A')   // Portable

Another dangerous programming style is the use of order tests involving character constants. For example, consider the test whether ch is a lowercase letter:

    if ('a' <= ch && ch <= 'z')   // Nonportable

This test relies on the fact that the letters 'a'...'z' are consecutive integers in ASCII. This is not guaranteed in a non-ASCII environment. As an example, in EBCDIC there are more than 26 values in the range of 'a'...'z' and this test will incorrectly accept a number of nonletter characters. The required test for EBCDIC, because of the two "holes" in the sequence, is as follows:

    if ('a' <= ch && ch <= 'i') ||
       ('j' <= ch && ch <= 'r') ||
       ('s' <= ch && ch <= 'z'))

Fortunately, there is a portable alternative using the C++ standard library functions, which will always be correct for each implementation:

    if (islower(ch)) // Portable

The use of islower is also likely to be more efficient than either of the above tests, because most implementations use fast macros accessing precalculated tables for <ctype.h> functions. Hence, making good use of these macros will solve most character set portability problems.

Portable conversion between characters and integers

There are also a few common portability mistakes made by programmers involving conversions between characters and integers. For example, how does one convert the integers 0..25 into the letters 'A'...'Z'? Most programmers would use:

    ch = i + 'A'; // Nonportable

This will fail in the EBCDIC character set for any letters after I since the EBCDIC character set has 3 contiguous sequences of letters: 'A'..'I', 'J'..'R', and 'S'..'Z'. The obvious portable method is to use a big switch statement with 26 case labels, but a more elegant portable method is:

   ch = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[i]; // Portable

This uses the character string as an array of characters and chooses the ith character. A similar idea can be used to portably convert a number 0..15 into a hexadecimal digit:

    hex_dig = "0123456789ABCDEF"[i]; // Portable

A similar nonportable practice is to iterate through characters in a particular range using an increment or decrement. Consider the following code to iterate through all the letters of the alphabet:

    for (char ch = 'A'; ch <= 'Z'; ch++) // Nonportable
        ....

A portable method is:

    const char* letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    for (int i = 0; i <= 25; i++) {
        char ch = letters[i];   // Portably choose ith letter
        ....
    }

Portable Letter-to-Integer Conversions: Conversions from characters to integers are a little more difficult to perform neatly. For example, conversion of a letter in the range 'A'..'Z' into an integer 0..25 can be performed simply but nonportably in ASCII using:

    i = ch - 'A';   // Nonportable

The simplest portable method is to use a switch statement similar to the following:

    switch(ch) { /* Portable */
        case 'A': i = 0; break;
        case 'B': i = 1; break;
        // etc... all 26 cases
    }

One reasonably neat method is to use the strchr function. The code below uses strchr to search the letters for the given character, and then uses pointer arithmetic to return the index position of that character.

    const char* letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    i = (int)(strchr(letters, ch) - letters);
    printf("%c --> %d\n", ch, i);

Note that strchr will not return NULL, provided that ch is an uppercase letter, but a safety test should be added. This may also be less efficient than a switch statement because of the function call overhead. A faster method might be to create a lookup table version.

Numeric digits: Similar issues do not arise for digits, because '0'..'9' are sequential in both EBCDIC and ASCII.

    // Convert 0..9 to '0'..'9'
    char c = x + '0';  // Works on ASCII & EBCDIC

Preprocessor Macros: Tolerating EBCDIC in a widespread way will introduce extra inefficiency. An alternative is to use #if preprocessor tests to detect EBCDIC efficiently at compile-time:

    #if ('A' != 65)
    #define YAPI_IS_EBCDIC 1
    #else
    #define YAPI_IS_EBCDIC 0
    #endif

And if your code requires ASCII, you can put a #error statement into the #if test (i.e. to trigger a compilation error), or at runtime put an assertion as part of your portability check at startup:

   yassert(!YAPI_IS_EBCDIC);  // ASCII, please!

You can also use "static_assert" here, which catches the problem at compile-time rather than at runtime.
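
For example, a compile-time check that the execution character set is ASCII-compatible:

    static_assert('A' == 65, "This code assumes an ASCII character set");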

EBCDIC Sorting Order: Another problem that's much harder to fix is that ASCII sorts the upper-case letters before the lower-case letters, because 'A' is 65, and 'a' is 97. EBCDIC is the reverse, with lower-case letters having smaller integer values than upper-case letters.

String Data Structure Portability: Another instance where character set dependencies arise is in data structures that use string representations, notably tries and hash tables. A reasonable hash function will work for any character set (it will generate different numbers, but this doesn't matter since it does so consistently for a given implementation), but any hash function or trie indexing scheme that uses, say, the 26 letters to index an array of 26 "buckets" is error-prone. Also, if you've calculated a perfect hash function with one character set, it won't be perfect any more with a different character set. Care must be taken to portably convert the 26 letters into the 26 integer values. Unfortunately, efficiency is lost by writing portable code in this situation.

Modification of string literals

String literals should not be modified in C++ because they could potentially be stored in read-only memory. They should be thought of as having type const char*. Therefore, using char* string types without caution can lead to errors, such as applying strcpy to a pointer that currently points to a string literal, as below:

    char *result = "yes";
    if (...)
        strcpy(result, "no"); /* WRONG */

The effect of this code is to try to modify the memory containing the string literal "yes". If this is stored in read-only memory, the strcpy call has no effect, or more likely causes a run-time failure (a crash, on most modern systems). Even if string literals happen to be modifiable on the particular implementation, this form of modification can lead to strange errors. Overwriting "yes" with "no" means that the initialization of result will never again produce a pointer to the text "yes", because that memory now holds "no". The code can be thought of as equivalent to the following:

    char yes_addr[4] = { 'y', 'e', 's', '\0' };
    char *result = yes_addr;
    if (...)
        strcpy(result, "no"); /* WRONG */

Hence, the strcpy call changes yes_addr and the initialization will always set "result" to whatever yes_addr currently contains.

Worse still is the problem that many compilers merge identical string literals so as to save space. Hence, the above strcpy call will change all uses of the constant "yes" to be "no" throughout the program (all appearances of that constant use the same address, yes_addr). Therefore, one change to a string constant will affect all other instances of the same string constant — a very severe form of aliasing.

Avoiding the modification of string literals is not all that difficult, requiring only a better understanding of strings. One solution to the above problem is to use an array of characters instead of a pointer:

    char result[] = "yes";
    if (...)
        strcpy(result, "no"); /* RIGHT */

In this case the compiler allocates 4 bytes for result, rather than making it point at the 4 bytes for the string literal (which was the same address that all uses were given).

Portability to early C++ versions

I can remember coding with Turbo C++ using 16-bit int types, and there were plenty of overflow problems there (i.e., any number larger than about 32,000). There's also the issue that older compilers don't always support the newer C++ standardization improvements. If you need to support an old platform with only an old C++ compiler, then there are a number of portability problems when attempting to write C++ code that is compatible with older C++ compilers. Although many of the changes relate to language syntax and will cause compilation errors, there are a few "quiet" changes, where the program will compile, link, and run on different C++ versions, but will produce unexpected results.
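
As a defensive sketch (my own example), accumulating into a long avoids overflow where int is only 16 bits, because long is guaranteed to be at least 32 bits even on old compilers:

    // Sum an array without overflowing a 16-bit int.
    long sum_array(const int *values, int n)
    {
        long total = 0;   // long is at least 32 bits everywhere
        for (int i = 0; i < n; i++)
            total += values[i];
        return total;
    }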

Old Library Function Problems

Note that there are many additional portability problems with some early implementations. Some of these problems are:

  • ungetc(EOF) was illegal in some early implementations, but has no effect under the standard (it's still odd usage, though).
  • free(NULL) was illegal in some early implementations, but has no effect now (again, best avoided if old platforms matter).
  • tolower/toupper required an upper/lower case letter in some implementations; standard C++ allows any characters, leaving non-letter arguments unchanged.
  • Spaces in scanf format strings were ignored in some early implementations rather than causing whitespace skipping. The main problem appears with spaces preceding the %c format specification (see the example below).
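
For example, under the standard semantics the space in the format string below tells scanf to skip any pending whitespace (such as a leftover newline) before reading the character, whereas some ancient implementations ignored the space and read the whitespace character itself:

    char ch;
    scanf(" %c", &ch);   // Space portably skips leading whitespace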

Redefining library functions

Name clashes used to be more of a problem than they are now, thanks to better compilers and namespace management. The names of the standard library functions are reserved in the sense that the programmer should not use such names for external variables or for other functions. It is a common error to accidentally redefine one of the existing functions, such as the "remove" function.

Another example is trying to replace malloc by simply defining your own in the global namespace. Redefining an existing library function may cause some strange failures because some library functions may call each other. For example, some functions may call malloc and these functions may fail if malloc has been redefined by the program. On the other hand, redefining a function might also be harmless on a particular implementation, and a dangerous portability problem will remain undetected.

Trigraphs in string constants

Trigraphs are an ancient feature of C and C++ that has now been removed from the language. Hence, this error will only occur on very old platforms. Trigraphs are special three-character sequences, starting with two question marks. For example, the trigraph sequence "??!" stands for the vertical pipe (|) symbol. Trigraphs were intended to support machines with limited keyboards or character sets. For example, instead of typing #, the programmer can type the 3-character sequence "??=", and the compiler will automatically convert this trigraph into #.

For the most part a programmer can ignore the existence of trigraphs. The only danger is that of accidentally placing a trigraph in a string literal, such as:

    printf("What??!");    // Trigraph bug?

On an old trigraph-supporting compiler, this will become:

    printf("What|");

The solution is to use the \? escape instead of plain ? characters, so as to ensure that no trigraph sequence is seen by the compiler:

printf("What\?\?!");   // Correct

Overall, trigraph problems are rare and are only an obscure portability problem in early C++ compilers.

First call to clock

Although the standard states that clock should return the time elapsed since the program started, some old implementations return the time since the first call to clock, in which case the first call always returns zero. Thus, it is dangerous (nonportable) to measure execution time by a single call to clock at the end of the main function. The portable method is to call clock at both the start and the end, and report the difference:

    clock_t before = clock();
    /* ..... do something */
    printf("Time taken is %5.2f seconds\n",
        (clock() - before) / (double) CLOCKS_PER_SEC);

Errors with setjmp and longjmp

The setjmp and longjmp functions are an old standard library facility that was used for creating exception handlers or coroutines. C++ now has much better features for this (notably try/catch exception handling), so these errors will only occur in old code.

There are very few legal methods of calling setjmp. This allows implementors freedom in not supporting the most general usage, which may be difficult to achieve. Some of the most common legal usages are:

    if(setjmp(...))
    if(setjmp(...) == ...)
    switch(setjmp(...))
    if(!setjmp(...))

A common but strictly non-standard method is the assignment of the setjmp return value to a temporary variable, as below; this usage may fail on some implementations:

    temp = setjmp(...); // Bug

This restriction appears because some implementations have difficulty in implementing setjmp so that it can keep enough "context" so as to continue the evaluation of an expression after a return via longjmp. Therefore, standard usage specifies a minimal number of call methods that the compiler must support.

Use of local variables after setjmp call

When setjmp returns with a nonzero value (i.e., from a longjmp call), the values of some non-volatile automatic local variables are indeterminate: specifically, any local variable that was modified between the setjmp call and the later longjmp call. (A variable that has not been modified in this period will have its expected value.) In practice, such a variable usually contains either the value it had when setjmp was originally executed, or the updated value, although strictly speaking it might have any value because the standard merely says it is "indeterminate." The following code illustrates the problem:

    #include <stdio.h>
    #include <stdlib.h>
    #include <setjmp.h>

    jmp_buf env;   // Global

    int main()
    {
        int i = 1; /* Local automatic variable */
        if (setjmp(env) != 0) {
            /* longjmp has occurred */
            printf("i = %d\n", i); /* Value of i is 1 or 2 ? */
            exit(1);
        }
        i = 2; /* change i after setjmp */
        longjmp(env,1);
    }

Does the variable i contain the value 1 or 2 after the return from the longjmp call? If longjmp restores the values of automatic local variables it will have value 1; otherwise, it will have value 2.

If variables must be accessed after the return from a longjmp call, they should be declared volatile or static. The program above will report i as having value 2 if it is declared as either volatile or static. Note that the values of automatic local variables are not affected in the original call to setjmp, when it returns 0 (i.e., the initial call is just like any other function call).
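
For example, the fix for the program above is a one-line change to the declaration of i:

    volatile int i = 1;   /* Value is now determinate after longjmp */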

setjmp returns before longjmp

The setjmp function should be called in a function that will not finish before the longjmp call is executed. If the function calling setjmp has terminated when longjmp is executed, the behavior is undefined and the result is probably a fatal error. For example, it is erroneous to write a neat function to set up the exception handling facilities, such as the init_handler function in the test program below:

    #include <stdio.h>
    #include <stdlib.h>
    #include <setjmp.h>

    jmp_buf env;

    void init_handler(void) /* WRONG!! */
    {
        if (setjmp(env) != 0) {
            /* longjmp has occurred - exception handler here */
            fprintf(stderr, "\nFatal error\n");
            exit(1);
        }
    }

    int main()
    {
        init_handler();
        longjmp(env,1);
    }

This method of handling exceptions is totally flawed. In fact, the execution of this short program on one system produced the fatal error message "longjmp botch" and did not execute the fprintf statement. The usual method of avoiding this problem is to call setjmp in a function that never returns, typically the main function.

longjmp inside a signal handler

It is bad programming practice to call longjmp from within a signal handler that may have been invoked owing to an asynchronous hardware signal. For example, if the user presses ctrl-c to provoke the SIGINT signal, this can occur at any time. The program may have been executing any code, and prematurely leaving that code block may have left a data structure corrupted. Therefore, attempting to continue after a signal by using longjmp is dangerous. The only portable method of handling a signal is to report an error and then terminate.
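
As a sketch of that portable approach, assuming a POSIX platform where write and _exit are async-signal-safe, a handler might report and terminate like this:

    #include <signal.h>
    #include <stdlib.h>
    #include <unistd.h>

    void fatal_handler(int sig)
    {
        (void)sig;   // Unused; signal number not reported here
        // Only async-signal-safe calls here: no printf, no longjmp.
        const char msg[] = "Fatal error: interrupted by signal\n";
        write(2, msg, sizeof(msg) - 1);   // 2 is standard error
        _exit(EXIT_FAILURE);
    }

    // Installed with: signal(SIGINT, fatal_handler);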

Link Errors with Math Libraries

The first obstacle that novice programmers working under UNIX must overcome is how to get a program using mathematical functions to link. For example, in a simple program using sqrt, the linker may complain that _sqrt is undefined. The problem is probably that the math library is not linked. UNIX compilers have the annoying feature that they do not try to link in math functions unless this is requested by the -lm option:

    cc file.c -lm

Note that -lm must be at the end of the command. A common mistake in the use of -lm is to put it first, in which case the linker error messages persist:

    cc -lm file.c # WRONG

Once this obstacle has been overcome and the program is finally running, the programmer is free to discover other forms of run-time errors involving the math library functions.

Null pointer assignment on old Windows PC

A number of old compilers for PCs produce an error message such as "Null pointer assignment" after the program terminates. These compilers examine the value stored at NULL before program execution, and then reexamine it after execution, complaining if the value has changed. In this way any assignment through a NULL pointer is detected, albeit a long time after the actual error has occurred. Typically a number of bytes close to address zero are examined, so that assignments such as:

    p->key = 10;

where p is NULL will provoke the error. This statement will not change the value at address zero, but at the offset of the key field.

Unfortunately, this error can only be detected in the small and medium models; in other models an assignment via NULL will probably change the interrupt vectors at address 0000:0000, thereby provoking a fatal crash. Only the small and medium models have NULL addresses that lie safely within a (single) writable segment. It is also unfortunate that these compilers do not provide run-time testing of every pointer dereference. It would not be difficult to implement such a run-time check in a manner similar to the run-time stack overflow checking that is common. However, this "Null pointer assignment" message is the best available on these compilers, and we must examine the difficult task of tracing the cause of the error. There are a number of possible debugging methods:

  • Set a "watch" on address zero using a debugger (small/medium models only)
    — for Turbo Debugger, watch "*(char*)0,4m" and "(char*)4".
  • Use postmortem debugging under Windows.
  • Port the code to UNIX and use postmortem debugging to trace segmentation faults.
  • Find a compiler or debugger supporting run-time NULL checking.
  • Use roving exit calls.
  • Use calls to a routine to check for NULL assignments.

Only the last two methods require elaboration. A brute-force method is to put exit calls at different places in the program to determine how early in program execution the NULL assignment takes place. However, this is a very laborious method because it requires repeated compilation and the same input case must be supplied.

A more effective method is to sprinkle calls to a function that detects the error throughout the program. Microsoft C++ supports the non-standard function _nullcheck. However, it is not difficult to write your own testing function if using other compilers; let's call ours nullchk. In addition to detecting the error, the preprocessor can be used to report the line and filename of the nullchk call that first detects the error. This call will be the one "closest" in terms of execution time to the cause of the error.

The following version of nullchk has been tested using Turbo C/C++, but should work for all compilers (except possibly that the #if-#error-#endif test might need to be changed or deleted).

    /*-------------------------------------------------*/
    /* NULLCHK.C: Detect NULL pointer assignments */
    /*-------------------------------------------------*/
    #include <stdio.h>
    #include <stdlib.h>

    #define BYTES_TESTED 20 /* how close to NULL? */
    #if ! __SMALL__ && ! __MEDIUM__
    # error nullchk() will only work for small/medium models
    #endif

    long int nullchk_line = -1;
    char *nullchk_file = "???";

    void nullchk(void)
    {
        static unsigned char bytes[BYTES_TESTED];
        static bool first = true;
        bool ok;
        int i = 0;
        unsigned char *p;
        if (first) { /* if first nullchk call */
            first = false;
            /* store the bytes */
            for (p = NULL, i = 0; i < BYTES_TESTED; i++, p++) {
                bytes[i] = *p;
            }
        }
        else { /* Not the first call; check for changes */
            ok = true;
            for (p = NULL, i = 0; i < BYTES_TESTED; i++, p++) {
                if (bytes[i] != *p) {
                    if (ok) /* print header on first change only */
                        fprintf(stderr, "Line %ld, File '%s': nullchk fails\n",
                                nullchk_line, nullchk_file);
                    fprintf(stderr, "Byte %d changed from %d to %d\n",
                            i, bytes[i], *p);
                    ok = false;
                }
            }
            if (!ok) exit(EXIT_FAILURE); /* terminate */
        }
        nullchk_line = -1; /* Reset to dummy values ... */
        nullchk_file = "???"; /* .. in case next time doesn't use macro */
    }

If it is desirable to continue execution after an error has been identified, this can be achieved by deleting the line that calls exit. The header file "nullchk.h" is shown below. This allows nullchk to report the line and filename where the error was detected, in a similar manner to the reporting of the assert macro.

    // NULLCHK.H: Detect NULL pointer assignments
    void nullchk(void);
    extern long int nullchk_line;
    extern char *nullchk_file;

    #define nullchk() nullchk_line = __LINE__, \
        nullchk_file = __FILE__, \
        nullchk()

The line and filename are passed via global variables rather than as function arguments so that nullchk can be used both with and without the inclusion of the header file.

Floating-point run-time error messages

There are a few common run-time error messages produced by early C++ compilers:

    Floating-point format not linked

Typically, this error arises when a printf statement attempts to output a float or double value in a small program that doesn't use float or double for any other purpose. This error message is caused by what amounts to a bug in the compiler. The compiler attempts to deduce whether the code to convert floating-point values to characters (e.g., for %f) is needed, but occasionally gets it wrong and leaves the code out. When the printf statement is reached, the absence of this code is detected and the error occurs. One suggested workaround to force the linking of this code is to place the following function in one file (but don't call it!):

    static void forcefloat(float *p)
    {
        float f = *p;
        forcefloat(&f);
    }

Microsoft C++ compilers have been known to produce the run-time error message:

    Floating-point not loaded

This indicates that the compiler has assumed the presence of a coprocessor, and has not loaded instructions to emulate the operations. Execution terminates when it detects the absence of the coprocessor. The solution is to relink with the correct library; consult your manual for details.

rand function non-portable bits

An example of a nonportable practice is the extraction of high-order bits from the return value of rand. On some early implementations the low-order bits of the random number exhibited nonrandom patterns (e.g., rand()&01 produced the pattern 0,1,0,1...), and programmers solved this problem by extracting higher-order bits instead. A common non-portable method of producing random binary values in old code is:

   bit = rand() & (1 << 16); /* extract 17th bit */
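
This particular example is also unreliable because the standard only guarantees RAND_MAX to be at least 32767, so on an implementation with a 15-bit rand, bit 16 does not exist and the expression is always zero. A portable modern alternative is the C++ <random> library; here is a minimal sketch:

    #include <random>

    int random_bit()
    {
        static std::mt19937 rng(std::random_device{}());
        std::uniform_int_distribution<int> coin(0, 1);
        return coin(rng);   // Uniformly random 0 or 1 on any platform
    }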

Anachronistic tokens and ancient compilers

In a very early definition of the C language the extended assignment operators such as "-=" had the characters reversed (i.e., "=-" was an operator in ancient C). Fortunately, this usage quickly died out, and neither C nor C++ supports these tokens. However, for very old C compilers there is some ambiguity in statements such as:

    x=-1;

Some very old compilers could conceivably treat "=-" as the extended subtraction operator and subtract 1 from x. However, in my experience the worst that occurs is an annoying compiler warning about using an "anachronism", which is a nice word that I use every time I order coffee. This obscure warning can be fixed by simply adding a space to separate the "=" and "-" characters:

    x = -1;