Float Family Loyalty
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Hidden, unnecessary C++ type conversions are a common source of inefficiency. The main type in a Transformer is usually “float” (32-bit), rather than “double” (64-bit).
Avoid unnecessary type conversion code in two ways:
- Don't mix float and double
- Don't mix float and int
Mixing float and int tends to be something professional C++ programmers are aware of, after having been burned a few times, and it doesn't occur that often by accident.
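To see the classic float-and-int burn, here's a small hypothetical snippet (my own illustration, not code from the book) where the division runs in integer arithmetic before any conversion to float ever happens:

#include <cstdio>

int main() {
    int total = 7, count = 2;
    float average = total / count;                // integer division first: 3.0f, not 3.5f
    float correct = (float)total / (float)count;  // convert first: 3.5f
    printf("%f %f\n", average, correct);          // prints 3.000000 3.500000
    return 0;
}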
However, inadvertently mixing float and double is difficult to avoid, and sneaks into your code all the time.
For example, here's some C++ code that looks perfectly correct:
float scalefactor = sqrt(2.0) * 3.14159;
You know this isn't real AI code because it doesn't have 27 decimal places for pi, which we've memorized by rote. AI engines don't really need anywhere near that much precision, but it looks good for the boss.
The above code is also a small slug, because it may be unnecessarily using “double” size arithmetic, although the compiler might fix it with constant folding (but emit a warning anyway).
Here's the corrected code:
float scalefactor = sqrtf(2.0f) * 3.14159f;
Note that this example shows there are two places where an “f” suffix is needed to signify that float arithmetic is required:
- Numeric constants (i.e. “2.0f” specifying a 32-bit float constant, rather than “2.0”, which is a 64-bit double constant).
- Standard C++ functions (i.e. “sqrtf” returns float, rather than “sqrt”, which returns double).
Without the suffix “f”, the default is double type constants and double arithmetic functions.
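If you want to convince yourself the suffixes matter, here's a compile-time sketch (my own illustration, assuming C++17 and a modern standard library, not code from the book):

#include <cmath>
#include <type_traits>

// Each assertion checks the static type of an expression at compile time.
static_assert(std::is_same_v<decltype(2.0f * 3.14159f), float>, "suffixed constants stay float");
static_assert(std::is_same_v<decltype(2.0f * 3.14159), double>, "one unsuffixed constant promotes to double");
static_assert(std::is_same_v<decltype(std::sqrtf(2.0f)), float>, "sqrtf returns float");
static_assert(std::is_same_v<decltype(std::sqrt(2.0)), double>, "sqrt returns double");

int main() { return 0; }  // nothing to run; the checks happen at compile time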
A lot of C++ compilers will warn about these type conversions losing precision,
so if you aim for warning-free compilation as a quality goal, you'll also fix most of these wasteful hidden type conversions.
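For example, GCC and Clang both let you opt into exactly these diagnostics (the flag names below are GCC/Clang-specific; this is an illustrative sketch, not code from the book):

#include <cmath>

// Compile with: g++ -c -Wall -Wextra -Wdouble-promotion -Wfloat-conversion slug.cpp
float area(float r) {
    return 3.14159 * r * r;      // -Wdouble-promotion: r is promoted to double, and
                                 // -Wfloat-conversion: the double result narrows to float
}
float scalefactor() {
    return sqrt(2.0) * 3.14159;  // -Wfloat-conversion: double result narrows to float
}

Rewriting both functions with “f” suffixes and “sqrtf” makes them compile cleanly under these flags.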