Aussie AI

Bit Representations of Floating-Point Numbers

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Standardized bit patterns are used to represent floating-point numbers in a kind of scientific notation. There are three types of bits:

  • Sign bit
  • Exponent bits
  • Mantissa bits

Firstly, there's one bit for the sign, indicating whether the whole number is positive or negative. Then the remaining bits are split up between the “exponent” (i.e. the “power”) and the “mantissa” (also called the “digits” or the “significand” or the “fraction”). In the standard 32-bit “float” type used in AI, there are:

  • 1 sign bit
  • 8 exponent bits
  • 23 mantissa bits

How does that even make a number? Well, it's like scientific notation, if you are familiar with that. The exponent is the power and the mantissa is the digits.
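
To see those three fields in the raw bits, here's a minimal C++ sketch (assuming the usual IEEE 754 layout of a 32-bit “float”) that copies the float into an integer and masks out each field:

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    int main() {
        float f = 3.14f;
        uint32_t bits = 0;
        std::memcpy(&bits, &f, sizeof(bits));      // grab the raw 32 bits of the float

        uint32_t sign     = bits >> 31;            // 1 sign bit (bit 31)
        uint32_t exponent = (bits >> 23) & 0xFFu;  // 8 exponent bits (bits 30..23)
        uint32_t mantissa = bits & 0x7FFFFFu;      // 23 mantissa bits (bits 22..0)

        printf("sign=%u exponent=%u mantissa=0x%06X\n", sign, exponent, mantissa);
        return 0;
    }

The memcpy is just the portable way to look at the bits without type-punning trouble; C++20 code could use std::bit_cast instead.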

Let's pretend computers use decimal digits. If storage were in base 10, the decimal number 1234 would be stored as:

  • “0” for the sign bit — because non-negative.
  • “3” in the exponent — the power is 10^3=1000.
  • “1234” as the mantissa — the digits make the fraction “1.234”.

This would represent +1.234x10^3 (which hopefully equals 1234). That's how it would work for a decimal version.

But, as you know, silicon beasts are not decimal. A floating-point number is actually stored in binary, in a kind of base-two “binary scientific notation” numbering scheme. So, conceptually, 1234 would be stored with an exponent for the largest power-of-two that fits into it, which is 1024, because 2^10=1024, so the exponent has to store the power “10” (ten), which is 1010 in binary. And the 1234 would be converted to whatever the heck 1234/1024 is when you represent that in binary 0's and 1's, with the binary point removed (it's implicitly “floating,” you see?).
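
Here's a quick C++ sanity check of that story (same assumptions as the earlier sketch): for 1234.0f, the exponent field holds the power 10 plus an offset that is explained next, and the mantissa field holds the binary digits of 1234/1024 with the leading 1 dropped:

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    int main() {
        float f = 1234.0f;   // 1234 = 1.205078125 x 2^10
        uint32_t bits = 0;
        std::memcpy(&bits, &f, sizeof(bits));
        printf("raw bits = 0x%08X\n", bits);              // 0x449A4000
        printf("sign     = %u\n", bits >> 31);            // 0 (positive)
        printf("exponent = %u\n", (bits >> 23) & 0xFFu);  // 137 = 10 + offset 127
        printf("mantissa = 0x%06X\n", bits & 0x7FFFFFu);  // 0x1A4000 = 0.205078125 * 2^23
        return 0;
    }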

It's more complicated than this, of course. That's what standards are for! The exponent bits are actually stored with an “offset” number (also called a “bias”), which depends on the number of exponent bits. And there are also some special bit patterns for particular numbers, such as zero or “NaN” (not-a-number).
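
For the standard 32-bit float the offset is 127, which is why the exponent field in the 1234.0f example above reads 137 rather than 10. The special patterns are easy to see by dumping raw bits; here's a short C++ sketch (the exact NaN mantissa payload can vary between platforms and compilers):

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <limits>

    // Print the raw bit pattern of a float (assumes IEEE 754 single precision).
    static void show(const char* name, float f) {
        uint32_t bits = 0;
        std::memcpy(&bits, &f, sizeof(bits));
        printf("%-9s = 0x%08X\n", name, bits);
    }

    int main() {
        show("zero",     0.0f);                                    // 0x00000000: all bits zero
        show("neg zero", -0.0f);                                   // 0x80000000: only the sign bit set
        show("one",      1.0f);                                    // 0x3F800000: exponent 127 (power 0)
        show("infinity", std::numeric_limits<float>::infinity());  // 0x7F800000: exponent all ones, mantissa zero
        show("NaN",      std::numeric_limits<float>::quiet_NaN()); // exponent all ones, mantissa nonzero
        return 0;
    }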

Clear as mud? Don't you wish someone could go back in time and invent a base-10 computer?

 
