Aussie AI
C++ Type Bugs
-
Bonus Material for "Generative AI in C++"
-
by David Spuler, Ph.D.
Type Bugs
Type Conversion Problems
The large number of type conversions performed implicitly by the compiler leaves room for error. In an AI engine, there are many cases of mixing float and int types. Be careful when mixing floating-point and integral types, because conversion from a floating-point value to an integer discards the fractional part (truncating toward zero) rather than rounding to the nearest integer. When mixing these types, it is important to use the correct form of constant. For example, the code below may not have the effect intended by the programmer:
int x = 7; float f = x / 2; // Bug?
This code performs integer division of x by 2, which truncates the quotient to 3, and then converts that integer to float, yielding the value 3.0 and not 3.5. The problem is that both x and 2 have type int, and therefore the / operator performs integer division. Corrected code uses the constant 2.0, which has type double:
float f = x / 2.0; // Working
Fully correct code uses a float constant 2.0f to match the variable type:
float f = x / 2.0f; // Correct
Even better for readability would be to also use an explicit cast to indicate a conversion of int to float:
float f = (float)x / 2.0f; // Better
Conversion from double to float, or from one integral type to a smaller integral type, may lose information. Most C++ compilers can issue a warning for such conversions, although the warning may need to be enabled. Generally speaking, such conversions should be avoided wherever possible through the use of consistent types for variables and constants.
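Where a narrowing conversion cannot be avoided, one option is to check whether the value survives it. The helpers below are a minimal sketch with hypothetical names, assuming a 16-bit target for the integer case.

```cpp
#include <cstdint>
#include <limits>

// Returns true if 'value' fits in a 16-bit signed integer without loss.
// (Hypothetical helper for illustration.)
bool fits_in_int16(int value) {
    return value >= std::numeric_limits<std::int16_t>::min()
        && value <= std::numeric_limits<std::int16_t>::max();
}

// Returns true if a double is unchanged by a round trip through float.
// (Hypothetical helper for illustration.)
bool survives_float(double d) {
    float f = static_cast<float>(d);
    return static_cast<double>(f) == d;
}
```

A value such as 0.5 survives the round trip through float, whereas 0.1 does not, because float holds a coarser approximation of 0.1 than double does.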
Mixing integer types has similar issues. As a rule, use int to represent an integer, even if its values would fit in a char or a short. Use a char or a short for an integer value only if you really need a performance improvement (in space or time) and you have measured the code to confirm that the change yields one. Use type "int" for most integral values; use type "char" only for characters or bytes; use type "long" when a value may exceed INT_MAX (about 2 billion for 32-bit ints), bearing in mind that long is itself only 32 bits on some platforms, where "long long" is needed instead; and use type "short" when you need a 16-bit data type.
Const Type defaults to int
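It is a dangerous legacy of C that if no type is supplied in a declaration, the type defaults to int. Standard C++ does not allow this, and current compilers generally reject it, but older compilers and permissive modes may still accept it. One example of the dangers is the constant declaration: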
const half = 0.5f; // Bug
The constant "half" is accidentally declared as type int, and given the (truncated) value of 0. A good compiler will warn about converting float to int. The correct declaration is simply:
const float half = 0.5f; // Correct
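The effect of the accidental int type can be sketched by writing it out explicitly. The names half_as_int and the two scaling functions are hypothetical, and the cast is there only to make the truncation visible.

```cpp
// What the untyped declaration amounts to when a compiler accepts it:
// an int initialized from 0.5f, which truncates to 0.
const int half_as_int = static_cast<int>(0.5f);

// Stating the type keeps the intended value.
constexpr float half = 0.5f;

// Multiplying by the truncated constant silently zeroes the result.
float scale_by_half_bug(float v) { return v * half_as_int; }

// Multiplying by the float constant halves the value as intended.
float scale_by_half(float v) { return v * half; }
```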
Unsigned Type Problems
The type qualifiers signed and unsigned can also lead to problems. In particular, whether plain char is signed or unsigned is implementation-defined, and on many platforms it is signed, so conversion of a character in the range 128..255 to an integer can yield a negative number. All common alphanumeric characters fall into the range 0..127 and present no problem. However, when accessing 8-bit bytes (i.e. characters in the range 0..255), such as the bytes of a UTF-8 encoding, the type "unsigned char" should be used.
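A minimal sketch of the unsigned char approach follows, assuming the bytes arrive in a plain char string. Both function names are hypothetical.

```cpp
// Convert a byte held in a plain char to its value in 0..255,
// regardless of whether plain char is signed on this platform.
int byte_value(char c) {
    return static_cast<unsigned char>(c);
}

// Count the bytes of a null-terminated string that lie in 128..255,
// as the bytes of non-ASCII UTF-8 sequences do.
int count_high_bytes(const char* s) {
    int count = 0;
    for (; *s != '\0'; ++s) {
        if (byte_value(*s) >= 128) {
            ++count;
        }
    }
    return count;
}
```

Testing *s >= 128 directly would never be true on a platform where plain char is signed, because the converted value would be negative.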
The unsigned qualifier can be a problem when dealing with negative integer values. Novices may wrongly assume that C++ prefers normal int over unsigned int when doing mixed arithmetic, but that is incorrect. For example, consider the code below:
unsigned int x = 0; int y = -1; if (x > y) printf("x > y \n");
The operands of ">" are converted to the "larger" type, which in this case is unsigned int. Hence, -1 is converted to an unsigned quantity, yielding a very large value (UINT_MAX), so the test x > y is false and nothing is printed.
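One way to compare the two values by their mathematical meaning is to handle the negative case before any conversion takes place. This is a sketch with hypothetical function names; the second function only reproduces the implicit conversion explicitly, for contrast.

```cpp
// Compare an unsigned and a signed value by mathematical value.
bool greater_than(unsigned int x, int y) {
    if (y < 0) {
        return true;   // every unsigned value exceeds a negative one
    }
    return x > static_cast<unsigned int>(y);   // y is non-negative here
}

// The naive comparison: the cast mirrors what the compiler does
// implicitly, turning -1 into a very large unsigned value.
bool naive_greater_than(unsigned int x, int y) {
    return x > static_cast<unsigned int>(y);
}
```

With x equal to 0 and y equal to -1, greater_than returns true while naive_greater_than returns false. In C++20, std::cmp_greater from the utility header performs the same kind of value-based comparison.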
size_t unsigned type problems
A similar problem with unsigned types can be hidden by the special type size_t. Since many of the library functions return or use this type, it is common style for programmers to declare variables of type size_t. However, in my opinion, this is a dangerous and unnecessary practice, since the type int will usually handle the same task adequately. One danger is illustrated by the following function to print a string in reverse:
#include <stdio.h>
#include <string.h>
#include <stddef.h> // declare size_t

void print_reverse(char *s)
{
    size_t len = strlen(s);
    for ( ; len >= 0; len--) {
        putchar(s[len]);
    }
}
The programmer has declared len as type size_t because strlen has this return type. This function never terminates normally, because size_t is required to be an unsigned type and so the comparison len >= 0 can never fail: decrementing len from 0 wraps around to the largest size_t value, after which s[len] reads far outside the string. With a signed index type such as int, the loop would terminate as intended. Fortunately, some C++ compilers will produce a warning about a comparison of an unsigned type with zero.
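For code that must keep the size_t index, one possible repair is to test the index before decrementing it. The sketch below is an assumption about how the function could be restructured, not the only fix; the helper reversed is a hypothetical addition that makes the result easy to check.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Build the reversed string. The loop tests i > 0 and then reads
// s[i - 1], so the unsigned index never wraps around below zero.
std::string reversed(const char* s) {
    std::size_t len = std::strlen(s);
    std::string out;
    out.reserve(len);
    for (std::size_t i = len; i > 0; i--) {
        out.push_back(s[i - 1]);
    }
    return out;
}

// Print the string in reverse using the helper above.
void print_reverse(const char* s) {
    std::fputs(reversed(s).c_str(), stdout);
}
```

The alternative preferred in the text above is to store the length in an int, in which case the original loop shape works once it starts at len - 1 rather than at the terminating null character.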