Aussie AI
Data Type Sizes
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Typical AI engines work with 32-bit floating-point numbers (the float type).
Note that for 32-bit integers you cannot assume that int is 32 bits, but must define a specific type, such as std::int32_t from <cstdint>.
Furthermore, if you assume that short is 16-bit, int is 32-bit, and long is 64-bit, well, you'd be incorrect.
Most current platforms do have a 32-bit int, but long is 64 bits on 64-bit Linux and macOS and only 32 bits on 64-bit Windows.
The C++ standard only requires minimum widths and relative sizes, such as that long is at least as big as int.
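Here's a minimal sketch of the "define a specific type" advice, assuming a C++11 or later compiler. The fixed-width aliases from <cstdint> have exact widths wherever they exist, and static_assert turns a wrong size assumption into a compile error rather than a runtime surprise. The bits_of helper is a hypothetical name for illustration.

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::int16_t, std::int32_t, std::int64_t

// Hypothetical helper: number of bits in a type's storage.
template <typename T>
constexpr int bits_of() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Exact-width types: if these aliases exist, their widths are fixed.
static_assert(bits_of<std::int16_t>() == 16, "int16_t is 16 bits");
static_assert(bits_of<std::int32_t>() == 32, "int32_t is 32 bits");
static_assert(bits_of<std::int64_t>() == 64, "int64_t is 64 bits");

// Plain types: the standard only sets minimums.
static_assert(bits_of<int>() >= 16, "int is at least 16 bits");
static_assert(bits_of<long>() >= 32, "long is at least 32 bits");
static_assert(bits_of<long long>() >= 64, "long long is at least 64 bits");

// Code that relies on a 32-bit int should say so explicitly.
static_assert(bits_of<int>() == 32, "this code assumes a 32-bit int");
```

Printing bits_of<long>() in a startup report would typically show 64 on 64-bit Linux and 32 on 64-bit Windows, which is exactly the trap described above.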
Your startup portability check should verify that the sizes are what you expect:
// Test basic numeric sizes
yassert(sizeof(int) == 4);
yassert(sizeof(float) == 4);
yassert(sizeof(short) == 2);
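The yassert macro isn't defined in this excerpt, so here's a hypothetical stand-in that makes the check runnable, assuming you want each failure reported and counted rather than aborting at the first one. The names g_yassert_failures and check_type_sizes are also illustrative, not from the book.

```cpp
#include <cstdio>   // std::fprintf, stderr

// Hypothetical stand-in for yassert: report a failed check with its
// source text and location, count it, and keep going.
static int g_yassert_failures = 0;

#define yassert(cond) \
    ((cond) ? (void)0 \
            : (std::fprintf(stderr, "yassert failed: %s (%s:%d)\n", \
                            #cond, __FILE__, __LINE__), \
               (void)++g_yassert_failures))

// Startup portability check; returns the number of failed size checks.
int check_type_sizes() {
    int before = g_yassert_failures;
    // Test basic numeric sizes
    yassert(sizeof(int) == 4);
    yassert(sizeof(float) == 4);
    yassert(sizeof(short) == 2);
    return g_yassert_failures - before;
}
```

Counting instead of aborting means one run lists every mismatch on a new platform; a stricter build could call abort() when check_type_sizes() returns nonzero.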
You should also print them out in a report or to a log file.
Here's a useful way to do that with a macro that uses the “#” stringize preprocessor operator and the standard adjacent string-literal concatenation feature of C++.
#include <cstdio>  // printf

// No trailing semicolon: the caller supplies it, like a function call.
#define PRINT_TYPE_SIZE(type) \
    printf("Config: sizeof " #type " = %d bytes (%d bits)\n", \
        (int)sizeof(type), 8*(int)sizeof(type))
You can print out whatever types you need:
PRINT_TYPE_SIZE(int);
PRINT_TYPE_SIZE(float);
PRINT_TYPE_SIZE(short);
Here's the output on my Windows laptop with MSVS:
Config: sizeof int = 4 bytes (32 bits)
Config: sizeof float = 4 bytes (32 bits)
Config: sizeof short = 2 bytes (16 bits)
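If the same lines should go to a log file as well as the console, one option is a variant that formats into a std::string. This is a sketch, not the book's macro: TYPE_SIZE_LINE, type_size_line, and log_type_sizes are hypothetical names. It uses %zu so the size_t value needs no cast, and CHAR_BIT from <climits> instead of a hard-coded 8.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cstdio>    // std::snprintf, std::fprintf, std::FILE
#include <string>

// Build one report line as a string, so it can go to any output.
inline std::string type_size_line(const char* name, std::size_t bytes) {
    char buf[128];
    std::snprintf(buf, sizeof buf, "Config: sizeof %s = %zu bytes (%zu bits)",
                  name, bytes, bytes * CHAR_BIT);
    return std::string(buf);
}

// The # operator still turns the type name into a string literal.
#define TYPE_SIZE_LINE(type) type_size_line(#type, sizeof(type))

// Write several sizes to an open FILE* (e.g. stdout or a log file).
inline void log_type_sizes(std::FILE* out) {
    std::fprintf(out, "%s\n", TYPE_SIZE_LINE(int).c_str());
    std::fprintf(out, "%s\n", TYPE_SIZE_LINE(long).c_str());
    std::fprintf(out, "%s\n", TYPE_SIZE_LINE(std::size_t).c_str());
    std::fprintf(out, "%s\n", TYPE_SIZE_LINE(void*).c_str());
}
```

Calling log_type_sizes(stdout) prints to the console, and passing a FILE* opened with fopen sends the same lines to a log file.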
16-Bit Integer Data: For quantization to 16 bits, you might use a 16-bit integer (“short”).
However, you should check its size with a static_assert in your C++ code, or use the exact-width std::int16_t type from <cstdint> instead.
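A sketch of that check, plus the exact-width alternative, assuming a simple symmetric scale factor for the quantization step. The names quantize16, dequantize16, and quant16_t are hypothetical, and real quantization schemes vary.

```cpp
#include <cmath>     // std::round
#include <cstdint>   // std::int16_t, INT16_MAX, INT16_MIN

// Compile-time check: the build fails if short is not 16 bits here.
static_assert(sizeof(short) == 2, "short is expected to be 16 bits");

// Alternatively, sidestep the question with the exact-width type.
using quant16_t = std::int16_t;

// Symmetric linear quantization of one float to 16 bits:
// round to nearest, then clamp to the int16 range.
// Assumes finite x and nonzero scale (NaN is not handled).
inline quant16_t quantize16(float x, float scale) {
    double q = std::round(static_cast<double>(x) / scale);
    if (q > INT16_MAX) q = INT16_MAX;
    if (q < INT16_MIN) q = INT16_MIN;
    return static_cast<quant16_t>(q);
}

// Map a quantized value back to an approximate float.
inline float dequantize16(quant16_t q, float scale) {
    return static_cast<float>(q) * scale;
}
```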
16-Bit Floating-Point: For 16-bit floats (FP16 or BF16), there are still issues.
The main C++ compilers at the time of writing (Oct 2023) do not have great builtin support for 16-bit floating-point types.
There's no “short float” type, for example, in GCC or Microsoft Visual Studio C++.
Maybe you can find a platform-specific way to do 16-bit float types in C++, since there's no widely supported standard way at the time of writing.
The C++23 standard adds new optional type names, such as std::float16_t and std::bfloat16_t in the <stdfloat> header, but not many compilers support them yet.
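Where std::bfloat16_t is available (a C++23 compiler defines __STDCPP_BFLOAT16_T__ when it is), that's the natural choice. On compilers without it, one common workaround is to store BF16 values as raw uint16_t bit patterns and convert to float for arithmetic. A minimal sketch, assuming float is 32-bit IEEE 754; bf16_bits, float_to_bf16, and bf16_to_float are hypothetical names:

```cpp
#include <cstdint>   // std::uint16_t, std::uint32_t
#include <cstring>   // std::memcpy

// BF16 keeps the sign, all 8 exponent bits, and the top 7 mantissa bits
// of an IEEE 754 float: the upper half of the 32-bit pattern.
using bf16_bits = std::uint16_t;

static_assert(sizeof(float) == 4, "float is expected to be 32 bits");

inline bf16_bits float_to_bf16(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);           // well-defined type pun
    if ((u & 0x7FFFFFFFu) > 0x7F800000u) {    // NaN: keep it a quiet NaN
        return static_cast<bf16_bits>((u >> 16) | 0x0040u);
    }
    // Round to nearest, ties to even, on the 16 bits being dropped.
    u += 0x7FFFu + ((u >> 16) & 1u);
    return static_cast<bf16_bits>(u >> 16);
}

inline float bf16_to_float(bf16_bits b) {
    std::uint32_t u = static_cast<std::uint32_t>(b) << 16;  // low bits zero
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

The BF16 to float direction is exact, so only the float to BF16 conversion needs a rounding choice; simple truncation (just u >> 16) is cheaper but biased toward zero.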
Standard Library Types: Other data types to consider are the ones defined by the standard library.
I'm looking at you, size_t and time_t, and a few others that belong on Santa's naughty list.
People often assume that size_t is the same as “unsigned int”, but on 64-bit platforms it's a 64-bit type: usually “unsigned long” on Linux and macOS, and “unsigned long long” on 64-bit Windows.
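A short sketch of why the size_t assumption bites: converting a size_t to unsigned int can silently truncate on 64-bit platforms, and printing it with %u or %lu is only correct on some of them. The helper names are hypothetical; %zu is the standard printf format for size_t.

```cpp
#include <climits>   // UINT_MAX
#include <cstddef>   // std::size_t
#include <cstdint>   // SIZE_MAX
#include <cstdio>    // std::fprintf, std::FILE
#include <ctime>     // std::time_t

// Would this size survive conversion to unsigned int?
// With a 64-bit size_t and a 32-bit unsigned int, large values would not.
inline bool fits_in_unsigned(std::size_t n) {
    return n <= UINT_MAX;
}

// Report the sizes; %zu matches size_t on every platform.
inline void print_size_info(std::FILE* out) {
    std::fprintf(out, "Config: sizeof size_t = %zu bytes\n", sizeof(std::size_t));
    std::fprintf(out, "Config: sizeof unsigned int = %zu bytes\n", sizeof(unsigned int));
    std::fprintf(out, "Config: sizeof time_t = %zu bytes\n", sizeof(std::time_t));
}
```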
The new AI programming book by Aussie AI co-founders:
Get your copy from Amazon: Generative AI in C++