Book Excerpt from "Generative AI in C++" by David Spuler, Ph.D.
Negative Zero
Floating-point representations have two zeros: positive zero (the usual "0.0f" one) and negative zero ("-0.0f").
Note that there's no negative zero in integers, but only in floating-point types,
because integers use two's complement in C++.
Usually, you don't have to worry about negative zero float values, because all of the floating-point operations treat zero and negative zero as equal. Negative zero is not less than positive zero, but is equal instead. For example, the "==" and "!=" operators should correctly handle both zeros as the same, and testing "f==0.0f" will succeed for both zero and negative zero.
Normal C++ operations on float types will automatically handle negative zero for you; for example, "<" will treat the two zeros as equal, not less-than.
This happens at the cost of some inefficiency.
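For example, here is a minimal check (my own illustration, not from the book) confirming that the two zeros compare as equal with the ordinary operators:

    #include <cstdio>

    int main() {
        float pos = 0.0f;
        float neg = -0.0f;
        printf("%d\n", pos == neg);    // Prints "1": the two zeros compare equal
        printf("%d\n", neg < pos);     // Prints "0": -0.0f is not less than 0.0f
        printf("%d\n", neg == 0.0f);   // Prints "1": an f==0.0f test also matches -0.0f
        return 0;
    }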
Detecting Negative Zero.
Testing for negative zero is not easy.
Unfortunately, you cannot use the std::fpclassify function, because it returns FP_ZERO for both positive and negative zero.
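To illustrate (this snippet is my own, not from the book), std::fpclassify reports FP_ZERO for both zeros, whereas the standard std::signbit function does expose the sign bit, so one portable (if not necessarily fastest) check for negative zero is "f == 0.0f && std::signbit(f)":

    #include <cmath>
    #include <cstdio>

    int main() {
        float pos = 0.0f, neg = -0.0f;
        printf("%d %d\n", std::fpclassify(pos) == FP_ZERO,
                          std::fpclassify(neg) == FP_ZERO);  // Prints "1 1": indistinguishable
        printf("%d %d\n", (int)std::signbit(pos),
                          (int)std::signbit(neg));           // Prints "0 1": sign bit differs
        printf("%d\n", neg == 0.0f && std::signbit(neg));    // Prints "1": detects negative zero
        return 0;
    }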
Here are some fast macros for 32-bit floats that look at the bits by pretending the value is an unsigned 32-bit integer:
#define AUSSIE_FLOAT_TO_UINT(f)  (*(unsigned int*)&f)
#define AUSSIE_FLOAT_IS_POSITIVE_ZERO(f) \
    ((AUSSIE_FLOAT_TO_UINT(f)) == 0)           // All 0s
#define AUSSIE_FLOAT_IS_NEGATIVE_ZERO(f) \
    ((AUSSIE_FLOAT_TO_UINT(f)) == (1u << 31))  // Sign bit
Note that these macros only work for float variables, not constants, because the address-of "&" operator triggers a compilation error for floating-point constants (e.g. 0.0f or -0.0f). Also, these only work for 32-bit float types, and comparable macros are needed for 64-bit double or 128-bit long double types.
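As a sketch of what a comparable 64-bit version might look like (these AUSSIE_DOUBLE_* names are my own invention, and the code assumes an IEEE 754 64-bit double and a 64-bit unsigned long long):

    #define AUSSIE_DOUBLE_TO_ULL(d)  (*(unsigned long long*)&(d))
    #define AUSSIE_DOUBLE_IS_POSITIVE_ZERO(d) \
        ((AUSSIE_DOUBLE_TO_ULL(d)) == 0ull)           // All 0s
    #define AUSSIE_DOUBLE_IS_NEGATIVE_ZERO(d) \
        ((AUSSIE_DOUBLE_TO_ULL(d)) == (1ull << 63))   // Sign bit only

The same caveats apply: these work only on variables, and they rely on type-punning through a pointer cast.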
Pitfall: Bitwise tricks on negative zero.
There are some pitfalls with negative zero if you are trying to subvert the normal floating-point number representations and do bitwise operations on them (as I just did above!).
For example, if you're doing bitwise tests on a float, you may still need to test for two values of zero, such as by using one or both of the above zero-testing macros.
For magnitude comparisons of float types via their underlying bits, there's also a problem. Whereas positive zero is all-bits-zero and will equal integer zero or unsigned integer zero, negative zero has the uppermost bit set (the sign bit), so it looks like a negative integer or a very large unsigned number. Hence, negative zero will sort as less than positive zero if using signed integer tests, or will sort as massively greater than many numbers if using unsigned integers for testing.
The problem with negative zero also means that bitwise equality comparisons will fail. You cannot just compare the underlying integers for equality, nor can you use byte-wise testing. For example, using memcmp for equality testing of a float vector will occasionally fail where a positive zero is compared against a negative zero, leading to insidious bugs.
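Here is a minimal illustration of the memcmp pitfall (my own example, not from the book):

    #include <cstring>
    #include <cstdio>

    int main() {
        float a[3] = { 1.0f, 0.0f, 2.0f };
        float b[3] = { 1.0f, -0.0f, 2.0f };
        // Element-wise, the vectors are equal, because 0.0f == -0.0f is true.
        bool elementwise_equal = (a[1] == b[1]);                  // true
        // But byte-wise comparison sees different bit patterns for the two zeros.
        bool bytes_equal = (std::memcmp(a, b, sizeof(a)) == 0);   // false
        printf("%d %d\n", (int)elementwise_equal, (int)bytes_equal);  // Prints "1 0"
        return 0;
    }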
Optimization by Suppressing Negative Zero.
Since negative zero introduces an inefficiency into basic float operations (e.g. == or != with 0.0), can we block it for a speedup? Are there any settings that tell the CPU or the compiler to ignore negative zero?
The FTZ and DAZ modes are mainly for subnormal numbers, not negative zero. I'm not aware of any hardware CPU modes specifically for disallowing negative zeros, and I wonder whether they would actually be a de-optimization anyway, by forcing the FPU to explicitly check for negative zeros. Apparently, FTZ might help avoid negative zero in some computations, but I'm not sure it covers 100% of cases.
There is also a GCC flag "-ffast-math" (which includes the more specific "-fno-signed-zeros" option) that tells the compiler it may ignore the sign of zero in its optimizations, effectively suppressing negative zero in software.
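If relying on compiler flags is undesirable, a portable software-level alternative is to scrub negative zeros explicitly. This is my own sketch, not from the book, and the function name is hypothetical:

    // Convert a negative zero to positive zero, leaving all other values unchanged.
    // Since -0.0f compares equal to 0.0f, the test below catches both zeros and
    // always returns the positive one.
    inline float aussie_scrub_negative_zero(float x) {
        return (x == 0.0f) ? 0.0f : x;
    }

Incidentally, negative zero is exactly why a compiler normally cannot fold "x + 0.0f" into "x" (since (-0.0f) + 0.0f yields +0.0f); the "-fno-signed-zeros" option is what permits that kind of folding.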
AI Engines and Negative Zero.
Can we speed up the floating-point computations of our AI engine by blocking all floating-point negative zeros? Then the FPU or GPU can assume there's only one type of zero, and run faster.
One situation with a lot of zeros is the "sparsity" that results from unstructured pruning (i.e. magnitude pruning or movement pruning), although hopefully these weights would be pruned to the normal zero rather than negative zero. However, if a pruned model is doing any "zero skipping" routine that tests "f==0.0f" on a weight, there's an unnecessary hidden test for negative zero.
We could either run in a negative-zero-disabled mode, or use our own bitwise test for floating-point zero as all-bits-zero (i.e. using the unsigned integer trick).
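As a sketch of the second option (the loop and function name are my own illustration, reusing the AUSSIE_FLOAT_TO_UINT macro from above), a zero-skipping dot product can test for all-bits-zero directly:

    // Dot product that skips zero weights via an integer all-bits-zero test.
    // Note: this deliberately skips only positive zero; any negative-zero weights
    // that remain are simply not skipped.
    float aussie_sparse_dot_product(const float* weights, const float* activations, int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            float w = weights[i];
            if (AUSSIE_FLOAT_TO_UINT(w) == 0) continue;   // All-bits-zero: skip this weight
            sum += w * activations[i];
        }
        return sum;
    }

This pairs naturally with the pre-processing idea below, which removes negative-zero weights from the model entirely.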
Another point about negative zero in AI engines is that weights are static, so you can certainly pre-process the model file to ensure that none of the weights are negative zero. Then, during runtime inference, the code can assume that float weight values are never negative zero.
Alternatively, the training method can avoid negative zeros in its computations by running in such a mode,
or it could explicitly check for negative zero as a final safety check.
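Here is a minimal sketch of such a pre-processing pass over an in-memory weight array (the function name is my own, and real code would also need to handle the model file format and any double or quantized weight types):

    // Replace any negative-zero weights with positive zero, in place.
    // Run once when loading (or exporting) the model, so that inference code
    // can assume weights are never -0.0f.
    void aussie_scrub_negative_zero_weights(float* weights, int n)
    {
        for (int i = 0; i < n; i++) {
            if (AUSSIE_FLOAT_IS_NEGATIVE_ZERO(weights[i])) {
                weights[i] = 0.0f;
            }
        }
    }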
What about zero values at runtime?
The main float computation is the vector of logits, which conceptually represents the probabilities of words.
Can we guarantee that it never contains a negative zero, and thereby speed up analysis?
Can we ensure that vector dot products never compute negative zero?
Or is there a way to have the RELU activation function fix any negative zero values that are computed?
These optimizations are examined in later chapters.