Aussie AI

Data Type Sizes

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

The typical AI engines work with 32-bit floating-point (float type). For integers, you cannot assume that int is 32 bits; if you need exactly 32 bits, you must use or define a specific type. Furthermore, if you assume that short is 16-bit, int is 32-bit, and long is 64-bit, well, you'd be incorrect on some platforms. Most current platforms have 32-bit int types, but long is 32 bits with Microsoft's compilers on Windows and 64 bits on 64-bit Linux. The C++ standard only requires minimum and relative sizes, such as that long is at least as big as int.
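One way to get a specific type is to use the fixed-width integer names in the standard <cstdint> header. The following is a minimal sketch, not code from the book: the aliases i16, i32 and i64, the bits_in helper, and the WeightRef struct are all hypothetical names chosen for illustration.

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::int16_t, std::int32_t, std::int64_t

// Hypothetical short aliases (illustrative names, not from the book).
// The exact-width types are technically optional in the standard,
// but mainstream compilers provide them.
using i16 = std::int16_t;
using i32 = std::int32_t;
using i64 = std::int64_t;

// Number of bits in a type, computed at compile time.
template <typename T>
constexpr int bits_in() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Example: a pair of indices that are 32 bits wherever the code compiles,
// regardless of how wide plain "int" happens to be.
struct WeightRef {
    i32 row;
    i32 col;
};
```

With these aliases, code that needs exactly 32 bits says so in the type name, and bits_in<int>() can be logged next to bits_in<i32>() to see whether the two agree on a given platform.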

Your startup portability check should verify that the sizes are what you expect:

    // Test basic numeric sizes
    yassert(sizeof(int) == 4);
    yassert(sizeof(float) == 4);
    yassert(sizeof(short) == 2);

And you should print them out in a report, or to a log file. Here's a useful way to do that with a macro that uses the “#” stringize preprocessor operator and also the standard C++ feature that concatenates adjacent string literals.

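    // No trailing semicolon in the macro body: the caller supplies it,
    // which keeps the macro safe inside if/else statements.
    #define PRINT_TYPE_SIZE(type) \
        printf("Config: sizeof " #type " = %d bytes (%d bits)\n", \
            (int)sizeof(type), 8*(int)sizeof(type))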

You can print out whatever types you need:

    PRINT_TYPE_SIZE(int);
    PRINT_TYPE_SIZE(float);
    PRINT_TYPE_SIZE(short);

Here's the output on my Windows laptop with MSVS:

    Config: sizeof int = 4 bytes (32 bits)
    Config: sizeof float = 4 bytes (32 bits)
    Config: sizeof short = 2 bytes (16 bits)

16-Bit Integer Data: For quantization to 16 bits, you might use a 16-bit integer (“short”). However, you should check it with a static_assert in your C++ code.
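A compile-time check along these lines might look as follows. This is only a sketch: quant16_t and saturate_to_q16 are hypothetical names, and the saturating conversion is an illustrative assumption rather than the book's quantization code.

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::int16_t, INT16_MIN, INT16_MAX

// Fail the build, rather than the run, if short is not 16 bits here.
static_assert(sizeof(short) * CHAR_BIT == 16,
              "short must be 16 bits for 16-bit quantization");

// Hypothetical alias: naming the width explicitly sidesteps the question.
using quant16_t = std::int16_t;

// Illustrative helper: clamp a wider integer into the 16-bit range.
constexpr quant16_t saturate_to_q16(long v) {
    return v > INT16_MAX ? static_cast<quant16_t>(INT16_MAX)
         : v < INT16_MIN ? static_cast<quant16_t>(INT16_MIN)
                         : static_cast<quant16_t>(v);
}
```

Unlike the runtime yassert checks, the static_assert costs nothing at startup, and a port to an unusual platform fails loudly at compile time.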

16-Bit Floating-Point: For 16-bit floats (FP16 or BF16), there are still issues. The main C++ compilers at the time of writing (Oct 2023) do not have much built-in support for 16-bit floating-point types. There's no “short float” type, for example, in GCC or Microsoft Visual Studio C++. Maybe you can find a platform-specific way to do 16-bit float types in C++, since there's no widely available standard way at the time of writing. The C++23 standard adds optional type names such as std::float16_t and std::bfloat16_t in the <stdfloat> header, but not many compilers are there yet.
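Without a native type, one portable fallback is to store the raw BF16 bits in a 16-bit unsigned integer, since BF16 is simply the top 16 bits of a 32-bit IEEE 754 float (sign, 8 exponent bits, 7 mantissa bits). The sketch below assumes float is IEEE 754 binary32, uses plain truncation rather than rounding, and does not special-case NaN. All the names (bf16_bits_t and the four functions) are hypothetical, not from the book or any library.

```cpp
#include <cstdint>   // std::uint16_t, std::uint32_t
#include <cstring>   // std::memcpy

// Hypothetical storage type: raw BF16 bits held in an unsigned 16-bit integer.
using bf16_bits_t = std::uint16_t;

// Keep the top 16 bits of a float's bit pattern (truncation, no rounding).
constexpr bf16_bits_t bf16_from_float_bits(std::uint32_t bits) {
    return static_cast<bf16_bits_t>(bits >> 16);
}

// Widen BF16 bits back to a float bit pattern; the low 16 bits become zero.
constexpr std::uint32_t float_bits_from_bf16(bf16_bits_t h) {
    return static_cast<std::uint32_t>(h) << 16;
}

// Convert a float to BF16 bits. Assumes float is IEEE 754 binary32.
inline bf16_bits_t float_to_bf16(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // well-defined way to read the bits
    return bf16_from_float_bits(bits);
}

// Convert BF16 bits back to a float for arithmetic.
inline float bf16_to_float(bf16_bits_t h) {
    const std::uint32_t bits = float_bits_from_bf16(h);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

This only halves the storage; the arithmetic still happens in 32-bit float after converting back. Rounding to nearest would lose less precision than truncation, which is kept here for brevity.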

Standard Library Types: Other data types to consider are the builtin ones in the standards. I'm looking at you, size_t and time_t, and a few others that belong on Santa's naughty list. People often assume that size_t is the same as “unsigned int”, but its underlying type is implementation-defined: it's usually “unsigned long” on 64-bit Linux and macOS, and “unsigned long long” on 64-bit Windows.
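Rather than guessing, you can ask the compiler which type size_t really is and log the answer. Here is a minimal sketch using std::is_same from <type_traits>; the constant names and the print_std_type_sizes function are hypothetical names for illustration.

```cpp
#include <cstddef>       // std::size_t
#include <cstdio>        // std::printf
#include <ctime>         // std::time_t
#include <type_traits>   // std::is_same

// Compile-time answers to "which builtin type is size_t on this platform?"
constexpr bool size_t_is_unsigned_int =
    std::is_same<std::size_t, unsigned int>::value;
constexpr bool size_t_is_unsigned_long =
    std::is_same<std::size_t, unsigned long>::value;
constexpr bool size_t_is_unsigned_long_long =
    std::is_same<std::size_t, unsigned long long>::value;

// Report the sizes at startup. sizeof yields a size_t, so the matching
// printf format is %zu rather than %d.
inline void print_std_type_sizes() {
    std::printf("Config: sizeof size_t = %zu bytes\n", sizeof(std::size_t));
    std::printf("Config: sizeof time_t = %zu bytes\n", sizeof(std::time_t));
    std::printf("Config: size_t is unsigned int: %s\n",
                size_t_is_unsigned_int ? "yes" : "no");
    std::printf("Config: size_t is unsigned long: %s\n",
                size_t_is_unsigned_long ? "yes" : "no");
    std::printf("Config: size_t is unsigned long long: %s\n",
                size_t_is_unsigned_long_long ? "yes" : "no");
}
```

Calling print_std_type_sizes() once from the startup check puts the answer in the log for each platform you build on.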

 
