Aussie AI
8. Bitwise Operations
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
“Tryna find my way back to you 'cause I'm needin'
A little bit of love.”
— Little Bit of Love, Tom Grennan, 2021.
C++ Bitwise Operators
Here's a refresher on the C++ bitwise operators:
- x & y — binary bitwise-AND
- x | y — binary bitwise-OR
- x ^ y — binary bitwise-XOR
- x << y — binary left bitshift
- x >> y — binary right bitshift
- ~x — unary bitwise-complement
Binary literals. A reminder that C++ also supports binary literal constants with a “0b” prefix, similar to the hexadecimal “0x” prefix. For example, to represent the constant 10 (ten), your C++ code can use:
const int ten = 10;      // decimal
const int ten = 0xA;     // hexadecimal
const int ten = 012;     // octal
const int ten = 0b1010;  // binary
Bitwise badness: A few pitfalls in coding C++ bitwise operators should be mentioned:
- Integer-only: the C++ bitwise operators do not work on floating-point data types.
- Quiet overflow: if you do anything to overflow an integer type, nobody's going to tell you. For example, shifting the sign bit too far left with “1<<32” instead of “1<<31” will simply lose it. You might get a compiler warning, though.
- Two is not better than one: The “&” operator is bitwise, but “&&” is logical. Similarly, “|” and “||”. It's the reverse for “<” and “<<”, or “>” and “>>”. Choose the wrong one and you might get a compiler warning, if the stars are aligned and the wind is blowing easterly.
- Operator precedence is tricky and not what you'd expect (it's arguably broken, but rather too late to fix), so use lots of parentheses in bitwise expressions, and don't ignore C++ compilation warnings (see the precedence example after this list).
- Bitwise operators are not always well-defined on negative values (e.g. shifting a negative value is variously undefined or implementation-defined behavior, depending on the operator and the language standard), so it's best to use “unsigned” types as operands to bitwise operators. Note also that it's often useful to add the suffix letter “u” to integer constants (e.g. “10u”, “0xAu” or “0b1010u”) when dealing with bitwise operations. This makes the constant of type “unsigned” and avoids various bitwise operator problems with signed numbers.
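To make the precedence pitfall concrete, here's a minimal sketch of the classic trap where “==” binds more tightly than “&” (the variable is just illustrative):
unsigned x = 6;
if (x & 1 == 0) {     // Parses as x & (1 == 0), i.e. x & 0 — always false
    // Never reached, even though 6 is even
}
if ((x & 1) == 0) {   // Correct: parenthesize the bitwise sub-expression
    // Reached for all even values of x
}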
Bitwise operation algebraic properties: The interaction with zero is an important difference between the main operations:
- Bitwise-AND with zero equals zero: a & 0 == 0
- Bitwise-OR with zero equals the other value: a | 0 == a
The following inequalities for bitwise operators on non-negative integers can also be useful to know:
- Bitwise-AND only clears bits and is <= each operand: a & b <= a
- Bitwise-OR only sets bits and is >= each operand: a | b >= a
- Bitwise-AND equals the larger value only for equal numbers.
- Bitwise-OR equals the larger value only for subset bit patterns.
Addition versus bitwise operations: The relationship between the bitwise operators and the integer “+” operator can be useful to understand:
- Bitwise-AND is <= the sum of its operands: a & b <= a + b
- Bitwise-AND equals addition only if both numbers are zero.
- Bitwise-OR is <= the sum of its operands: a | b <= a + b
- Bitwise-OR equals addition only for disjoint bit sets or zeros.
Note that these relationships are for positive integer values. Bitwise operators need positivity in their daily lives, whereas addition is fine with lots of negativity.
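As a quick sanity check, here's a small test sketch (the function name is just illustrative) that asserts these identities for arbitrary non-negative values, assuming the sum doesn't wrap around:
#include <cassert>
#include <climits>

void aussie_test_bitwise_identities(unsigned a, unsigned b)
{
    assert(a <= UINT_MAX - b);              // Precondition: a + b must not wrap around
    assert((a & 0u) == 0u);                 // AND with zero is zero
    assert((a | 0u) == a);                  // OR with zero is the other value
    assert((a & b) <= a && (a & b) <= b);   // AND never exceeds either operand
    assert((a | b) >= a && (a | b) >= b);   // OR is at least as big as either operand
    assert((a & b) <= a + b);               // AND never exceeds the sum
    assert((a | b) <= a + b);               // OR never exceeds the sum (equal only for disjoint bits)
}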
Bit Flag Basics
The main use of C++ bitwise operators is to use bit flags in integer variables,
which is very efficient in both storage space and execution time.
A vanilla “int” can store 32 bit flags, and a 64-bit type such as “long long” can store 64.
The basic bit operations in C++ use these bitwise operators:
- Check a bit — bitwise-AND (&)
- Set a bit — bitwise-OR (|)
- Toggle a bit — bitwise-XOR (^)
- Clear a bit — bitwise-AND with complement (& with ~)
Here are some example macros for examining the bits in a 32-bit integer, which should be of “unsigned int” type:
// Bit Flags in Integers
#define AUSSIE_ONE_BIT_SET(x, b) \
    (( ((unsigned)(x)) & ((unsigned)(b))) != 0 )
#define AUSSIE_ANY_BITS_SET(x, b) \
    (( ((unsigned)(x)) & ((unsigned)(b))) != 0 )
#define AUSSIE_ALL_BITS_SET(x, b) \
    ((((unsigned)(x)) & ((unsigned)(b))) == ((unsigned)(b)))
#define AUSSIE_NO_BITS_SET(x, b) \
    (( ((unsigned)(x)) & ((unsigned)(b))) == 0 )
The corresponding macros to set and clear these bit flags are:
#define AUSSIE_SET_BITS(x, b) \
    (( ((unsigned)(x)) | ((unsigned)(b))))
#define AUSSIE_CLEAR_BITS(x, b) \
    (( ((unsigned)(x)) & (~((unsigned)(b)))))
#define AUSSIE_TOGGLE_BITS(x, b) \
    (( ((unsigned)(x)) ^ ((unsigned)(b))))
Yikes! What a mess! But all those parentheses are necessary to avoid precedence issues with preprocessor macros.
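Here's a hedged usage sketch of these macros with some illustrative flag constants (the flag names and values are invented for the example):
#define AUSSIE_FLAG_DEBUG   0b0001u  // Illustrative flag values
#define AUSSIE_FLAG_VERBOSE 0b0010u
#define AUSSIE_FLAG_TRACE   0b0100u

unsigned int flags = 0;                                   // No flags set
flags = AUSSIE_SET_BITS(flags, AUSSIE_FLAG_DEBUG);        // Set one flag
if (AUSSIE_ONE_BIT_SET(flags, AUSSIE_FLAG_DEBUG)) {       // Test a flag
    flags = AUSSIE_CLEAR_BITS(flags, AUSSIE_FLAG_DEBUG);  // Clear it again
}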
Bit Sets
You can consider a 32-bit integer to be a “bit set” of 32 distinct bit flags, where all 1s represent a bit flag that is in the set. A bit set is an inherently parallel architecture, even in ordinary sequential C++ code. The basic idea is that a 32-bit unsigned int stores 32 bit flags. Certain actions on the integer as a whole effectively process 32 bits in parallel. For example, it is very fast to check if any bits are set at all by testing whether the whole integer is zero.
In regards to bit sets stored in an integer, the basic set operations can be implemented very efficiently with C++ bitwise operators:
- Bitwise-AND (&) — intersection
- Bitwise-OR (|) — union
- Bitwise-complement (~) — set complement (negated set)
- Bitwise-and-complement (“A&~B”) — set difference (set minus)
In addition, there are a number of fast operations that can be useful for bit sets:
- Integer zero — null set of bits.
- Integer negative-one — full set of all 1s.
- Bitwise “popcount” — set cardinality or number of elements.
Example code with these ideas for 32-bit sets implemented as unsigned integers:
u != 0         // Test if any bit is set
u3 = u2 & u1;  // Intersection of sets (Bitwise-AND)
u3 = u2 | u1;  // Union of sets (Bitwise-OR)
u3 = u2 ^ u1;  // Toggle bits in sets (Bitwise-XOR)
u3 = ~u1;      // Set complement or inverse
The total number of bits set out of 32 can be computed fast as a “popcount” operation using intrinsic functions, such as “__popcnt” in Microsoft Visual Studio and “__builtin_popcount” for GCC (there are also versions for 64-bit longs). On x86 architectures, popcount is a single CPU instruction (POPCNT) implemented in hardware, and is therefore very fast.
Note that these C++ macros assume type “unsigned int” with 32 bits, and therefore 32 distinct bit flags in a single integer variable. For more bits, a 64-bit unsigned type can be used: “unsigned long long” is at least 64 bits on all platforms, whereas “unsigned long” is 64-bit on most platforms but only 32-bit on Windows. The above macros would need to be changed to use type casts to the 64-bit unsigned type rather than just “unsigned” for a 64-bit version.
For even more bits, a data structure called a “bit vector” can be implemented as an array of unsigned integers,
which generalizes the bit set idea.
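Here's a minimal sketch of such a bit vector, packing N bit flags into an array of 32-bit unsigned integers (the class name and interface are just illustrative):
#include <vector>

// Minimal bit vector sketch: N bit flags packed into 32-bit unsigned ints
class AussieBitVector {
    static const int BITS_PER_WORD = 32;
    std::vector<unsigned int> m_words;
public:
    explicit AussieBitVector(int nbits)
        : m_words((nbits + BITS_PER_WORD - 1) / BITS_PER_WORD, 0u) {}
    void set(int i)   { m_words[i / BITS_PER_WORD] |=  (1u << (i % BITS_PER_WORD)); }
    void clear(int i) { m_words[i / BITS_PER_WORD] &= ~(1u << (i % BITS_PER_WORD)); }
    bool test(int i) const {
        return (m_words[i / BITS_PER_WORD] & (1u << (i % BITS_PER_WORD))) != 0;
    }
};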
Bitwise Intrinsic Functions
Intrinsic functions, or “builtin” functions, are special C++ functions that are specific to the compiler environment. For example, Microsoft Visual Studio and GCC have different builtins. Intrinsics are usually implemented in very efficient ways, often directly mapping to CPU instructions, so they can be very powerful optimizations.
Some of the useful builtin functions for integer bitwise arithmetic are listed below.
Most of these functions are for “int” or “unsigned int” (32-bit), but have other versions for 64-bit types such as “long long” or “unsigned long long”. There isn't usually a version for “short” 16-bit integers.
Count Leading Zeros (CLZ): Various functions count the leading zeros, or similarly, the offset of the first set bit. This is scanning the bits from left-to-right and finding the most significant bit. One application of the CLZ intrinsic is a fast way to compute a truncated log2 of an integer, or similarly, computing the highest power-of-two in a number.
- _BitScanReverse (Microsoft intrinsic, <intrin.h>): Finds the most-significant bit in a 32-bit integer. There's also _BitScanReverse64.
- clz: Count leading zeros (various versions); also sometimes called “nlz” for “number of leading zeros”.
- __lzcnt: Leading zeros count in Microsoft Windows intrinsics; use <intrin.h> for Microsoft Visual Studio C++.
- __builtin_clz (count leading zeros): GCC function to count the number of leading prefix zeros in an unsigned integer.
- _CountLeadingZeros: Microsoft <intrin.h> ARM intrinsics.
For all you silicon addicts, here are the CPU hardware instructions that underpin these intrinsics:
- BSR: Bit Scan Reverse x86 assembler instruction.
- LZCNT: x86 instruction for leading-zero count, similar to BSR.
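For example, here's a hedged sketch of a portable CLZ wrapper that selects an intrinsic per compiler (the wrapper name is illustrative; both paths assume a nonzero input):
#if defined(_MSC_VER)
#include <intrin.h>
#endif

static inline int aussie_clz32(unsigned int x)  // Assumes x != 0
{
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanReverse(&index, x);   // Index of the most-significant set bit
    return 31 - (int)index;       // Convert MSB index to a leading-zero count
#else
    return __builtin_clz(x);      // GCC/Clang builtin (undefined for x == 0)
#endif
}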
Count Trailing Zeros (CTZ): In contrast to the leading zero functions, these functions find the zeros on the right-hand side of an integer, locating the least-significant set bit.
- _BitScanForward (Microsoft intrinsic, <intrin.h>): Finds the least-significant bit set. The 64-bit version is _BitScanForward64.
- __builtin_ctz (count trailing zeros): GCC function that counts zero bits on the right (least-significant bits).
- ffs/ffsl: Find first set (least-significant bit).
- __builtin_ffs (find first set): GCC function to find the first set bit, counting from the least-significant (rightmost) bits.
The related x86 CPU hardware instructions are:
- BSF: Bit Scan Forward x86 assembler instruction.
- TZCNT: x86 instruction for trailing-zero count, similar to BSF.
If you'd rather code it yourself, there's Brian Kernighan's bit trick: the bitwise-AND of n and n-1 (i.e., n & (n-1) in C++) clears the lowest set bit, while n & -n isolates it.
But using the intrinsics should be faster.
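To see both tricks side by side, here's a small snippet on an unsigned value (binary literals used for clarity):
unsigned int n        = 0b101100u;     // 44
unsigned int cleared  = n & (n - 1u);  // 0b101000 — lowest set bit removed
unsigned int isolated = n & (0u - n);  // 0b000100 — lowest set bit isolated (two's complement trick)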
Popcount (Set Bits Count): The count of 1s in a number is known as the “popcount” (which is short for population count) and there are various intrinsics:
- __builtin_popcount: GCC function to count the number of 1s in an unsigned integer.
- BitOperations.PopCount: Microsoft .NET method for bitwise popcount.
- __popcnt: AMD x86 popcount intrinsic using the POPCNT x86 instruction (Microsoft platform).
- _mm_popcnt_u32: Intel x86 popcount intrinsic using the POPCNT x86 instruction (Microsoft platform); use <intrin.h> on MSVS C++.
- __builtin_parity: GCC function for bitwise binary parity (whether the number of 1s is odd or even).
The x86 CPU hardware instruction is POPCNT, which computes the popcount faster than a hummingbird's wings.
Example: Integer Popcount
The “popcount” is short for “population count” of a binary number, and is the number of binary 1s in an integer number. This has applications such as quickly counting the number of elements in a bit set or bit vector.
Bitwise arithmetic can be used to check for a '1' value in each bit of an integer. Usually an unsigned type is used (as below), but bit twiddling of signed integers is also possible. This is the slow version in C++ that simply loops through each bit, checking if it is set:
int aussie_popcount_basic(unsigned int x) {  // Count number of 1s
    const int bitcount = 8 * sizeof(x);
    int ct = 0;
    for (int i = 0; i < bitcount; i++) {
        if (AUSSIE_ONE_BIT_SET(x, 1u << i)) ct++;
    }
    return ct;
}
Kernighan Popcount Algorithm:
A faster version is to use a bit trick found by Brian Kernighan, author of The C Programming Language.
For any nonzero n, subtracting 1 flips the rightmost 1 bit to zero and sets all the lower bits to one. So, if you do bitwise-AND of n and n-1, it removes the rightmost bit that is 1 (i.e., the least significant set bit).
Hence, you can use this to optimize popcount by only looping as many times as there are 1s in the number
(rather than always doing 32 iterations).
Here's the new C++ code:
int aussie_popcount_kernighan_algorithm(unsigned int x) {
    // Count number of 1s with Kernighan bit trick
    int ct = 0;
    while (x != 0) {
        x = x & (x - 1);  // Remove rightmost 1 bit
        ct++;
    }
    return ct;
}
Intrinsic Popcount Functions:
The Kernighan method is faster, but far from optimal.
To do it super-fast, we have to look at existing builtin function primitives.
For example, Microsoft intrinsics include “__popcnt” or “_mm_popcnt_u32” (in header file <intrin.h>), and the GCC library has a “__builtin_popcount” function; all of these count the number of 1s in an unsigned integer. On x86 CPUs, the underlying intrinsics should be using the x86 assembler instruction named POPCNT.
Here is some example C++ code that works for Microsoft Visual Studio:
int aussie_popcount_intrinsics2(unsigned int x) {
    return __popcnt(x);  // Microsoft intrinsics
}
Obviously, a faster version is to declare this one-line function as “inline” in a header file, or to convert it to a C++ preprocessor macro, such as:
#define AUSSIE_POPCOUNT(x) (__popcnt((unsigned)(x)))
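If you need to support more than one compiler, a hedged sketch like the following can pick the intrinsic at compile time (the wrapper name is illustrative):
#if defined(_MSC_VER)
#include <intrin.h>
#endif

static inline int aussie_popcount32(unsigned int x)
{
#if defined(_MSC_VER)
    return (int)__popcnt(x);       // MSVC intrinsic
#else
    return __builtin_popcount(x);  // GCC/Clang builtin
#endif
}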
Example: Bitwise Log2 on Integers
Calculating the base-two logarithm of integers can be quite useful. There are various algorithms that use logarithms in AI.
Let's calculate the integer logarithm of an integer. This means we aren't doing the proper fractional logarithm of a number, but we are truncating it down to the nearest integer. For example, log2(7) will be truncated to 2, rather than 2.807.
Note that we're assuming the input is an unsigned number, since logarithms of negatives are undefined. Also, we have to decide how to handle zero, because log2(0) is undefined (or negative infinity if you prefer).
A simple way to implement a truncated integer log2 function is to use floating-point functions and a type cast back to int:
int aussie_log2_integer_slow(unsigned int u) {  // Slow float-to-int version
    return (int)log2f(u);
}
This works, but it's inefficient to use floating-point arithmetic on integers. Surely there's a faster way?
After some thought about binary bits, we notice that log2 of an integer is just the index position of the highest bit in the number. The log2 of 1 is 0, because the '1' is in position 0. The log2 of 2 (binary 10) is 1, because the leftmost 1 is in position 1. The log2 of 4 (binary 100) is 2, where the 1 is in index 2. The number 7 is binary 111, so log2 is the position of the leftmost 1, which is position 2. So, log2(7) is the same as log2(4), but log2(8) is 3.
There are numerous builtin bitwise functions that can find the leftmost set bit.
With sudden insight, we note that we can use “CLZ” (count leading zeros) to compute how many prefix zeros there are before the leftmost 1 bit (i.e., it counts the zeros down to the most-significant bit from the left). We can then compute the bit index position from the right in a 32-bit integer as “32-CLZ”. That's on the right track, and a bit of testing shows that the formula to use is “32-CLZ-1”.
Here's some example code that uses this CLZ method to compute log2 of an integer. This works on Microsoft Visual Studio using the <intrin.h> header file to declare intrinsics.
int aussie_log2_integer_clz_intrinsic(unsigned int u) {  // LOG2 using CLZ
    int clz = __lzcnt(u);  // Count leading zeros
    const int bits = 8 * sizeof(u);
    return bits - clz - 1;
}
And here's the macro version for those who don't trust compilers to inline properly:
#define AUSSIE_LOG2_LZCNT(u) \
    ((8 * sizeof(unsigned)) - (__lzcnt((unsigned)(u))) - 1)
And this is actually not optimal. We really should help the C++ optimizer by reordering this to move the “-1” subtraction operation next to the other constant, noting that “sizeof” is a compile-time constant expression in C++. Putting them together would make sure that the compiler correctly merges these operations using constant folding.
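For example, the reordered version might look like the following (same arithmetic, but grouped so that “32 - 1” is a single compile-time constant; the “_V2” name is just for the example):
#define AUSSIE_LOG2_LZCNT_V2(u) \
    (((8 * sizeof(unsigned)) - 1) - (__lzcnt((unsigned)(u))))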
On x86 implementations, the CLZ builtin functions are presumably using the x86 LZCNT or BSR assembler instructions, which are both similar and fast.
Bug alert! Note that you can't use “ffs” (find first set bit) for this log2 method, because it gives you the offset of the least-significant set bit (i.e., the rightmost set bit rather than the leftmost). The other x86 instructions, TZCNT (Trailing Zeros Count) and BSF (Bit Scan Forward), are also incorrect for this purpose.
Example: Highest Integer Power-of-Two
Another simple trick related to the log2 calculation is to truncate a number down to its largest power-of-two. This is equivalent to the value of its leftmost bit in the binary representation. For example, 8 (binary 1000) stays as 8, because it's 2^3, but 7 (binary 111) reduces down to 4 (binary 100), which is 2^2.
As with the truncated integer log2 calculation, this method focuses on computing the leftmost 1 bit, which is known as the Most-Significant Bit (MSB). Whereas the log2 calculation found the index position of that MSB, this power-of-two calculation requires the value of the MSB. In other words, we need to find the bit that is the MSB, and then keep only that bit.
A simple way to do this is to compute the log2 of the integer efficiently, and then left-shift a 1 by that many places (using an unsigned type). The basic idea is:
int bitoffset = log2_integer_fast(i);
int highestpowerof2 = 1u << bitoffset;
Note that this doesn't handle cases like zero, so it still needs a bit of extra code polishing work.
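For example, here's a hedged sketch that puts these steps together and special-cases zero, reusing the CLZ-based log2 function from the previous section:
unsigned int aussie_highest_power_of_two(unsigned int u)
{
    if (u == 0) return 0;  // log2(0) is undefined, so handle zero explicitly
    int bitoffset = aussie_log2_integer_clz_intrinsic(u);  // Index of the MSB
    return 1u << bitoffset;  // Value of the MSB: largest power-of-two <= u
}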
Integer Overflow and Underflow
Integer arithmetic overflow and underflow have traditionally been ignored in C++ programs, mostly by assuming that operations won't exceed the range of 32-bit integers. Most platforms don't fail on integer overflow, and quietly continue, without even triggering a signal like SIGFPE (floating-point error).
The absence of runtime warnings can potentially leave insidious bugs in your code, and is also an undefended attack vector for security. Also, perhaps ignoring overflow isn't the best strategy if you're using integer operations for your AI model, such as with integer quantization. On the other hand, there's this weird stochastic feature of AI models, which is that they often get better when errors occur occasionally, because some randomness can be helpful. Even so, it's better to know what's going on.
Integers have a fixed range of numbers that they can represent. For example, a signed 16-bit integer represents the relatively small range of -32,768 to +32,767, and an unsigned 16-bit number can be from 0 to 65,535. A 32-bit signed integer has a much bigger range, from about negative 2 billion (-2,147,483,648) to about positive 2 billion (+2,147,483,647). For an unsigned 32-bit integer, there are no negatives, and the range is from zero up to about 4 billion (+4,294,967,295).
Feel free to memorize those numbers, as you'll be needing them at least once a decade.
The ranges for 64-bit integers are massive numbers around 2^64, which is approximately decimal 10^19.
If integer arithmetic on a data type falls outside the range supported by that integer type, then an overflow or underflow occurs. There are symbolic constants for the minimum and maximum numbers for many types declared in the <limits.h> standard header file:
- int — INT_MAX and INT_MIN
- unsigned int — UINT_MAX (the minimum of an unsigned type is simply zero; there is no UINT_MIN)
The effect of integer overflow or underflow is platform-specific, but on most platforms, it is usually: nothing! It's a silent insidious bug in many cases. For a signed integer, overflow quietly wraps around from positive to negative, and underflow does the reverse.
Here's an example of overflow of an int type:
int x = INT_MAX;
assert(x >= 0);
++x;              // Overflow!
assert(x < 0);
And this is underflow of int:
int x = INT_MIN;
assert(x < 0);
--x;              // Underflow!
assert(x > 0);
Floating-point types can represent much larger magnitude numbers than integers. Hence, another way for an integer to overflow is in a conversion from floating-point numbers.
float f = (float)INT_MAX * (float)INT_MAX;  // Fine!
int x = (int)f;                             // Overflow!
For an unsigned integer, the results are a little different, since negatives are not possible. Instead, overflow wraps around from a large number to zero, and underflow (going below zero) wraps around to the largest unsigned number.
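Here's a sketch of that unsigned behavior (unlike signed overflow, unsigned wraparound is well-defined modular arithmetic in C++):
unsigned int u = UINT_MAX;  // Needs <climits> (and <cassert> for assert)
++u;                        // Wraps around to zero (well-defined for unsigned)
assert(u == 0u);
--u;                        // Going below zero wraps back to the maximum
assert(u == UINT_MAX);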
Preventing Integer Arithmetic Overflow. There's not really a good way to detect arithmetic overflow or underflow before it happens. Post-testing is easier.
For example, GCC and Clang have some intrinsics, such as “__builtin_add_overflow” for addition, which use post-testing of the x86 CPU overflow or carry flags for detecting integer overflow, and return a Boolean flag which you can use. The GCC documentation says it uses “conditional jump on overflow after addition” and “conditional jump on carry” for unsigned overflow.
Here's an example:
if (__builtin_add_overflow(x, y, &z)) {
    // Overflow!
}
The mainstream prevention strategy is simply to choose a big integer type (at least 32-bit) and then hope that no outliers occur in your input data. Most programmers let the overflow occur and then check. Or rather, just between you and me, most programmers simply don't even check at all!
Technically, signed integer overflow is “undefined behavior” in C++, and it's certainly non-portable, so you really should check. But most platforms handle it the same way, by quietly wrapping the integers around in two's complement form.
Increment overflow. For incrementing integers, you can do a pre-test like:
if (INT_MAX == x) {
    // Overflow!
}
else {
    x++;  // Safe increment
}
Addition overflow. And here's a version to pre-test addition of two positive integers for overflow:
if (x > INT_MAX - y) {  // x + y > INT_MAX
    // Overflow!
}
else {
    x += y;  // Add safely
}
Multiplication overflow. The test for multiplication overflow is even worse because it uses division:
if (x > INT_MAX / y) {  // x * y > INT_MAX
    // Overflow!
}
else {
    x *= y;  // Multiply safely
}
Head in the sand approach. Unfortunately, pre-testing for overflow is massively inefficient, as shown above. Do you really want to do this for every addition or increment? Even post-testing for overflow isn't much better. Overall, there's good reason why most C++ programmers just skip it, and hope for the best.
Overflow management. The alternative to ignoring the problem is to consider various different risk mitigation strategies for integer overflow:
- Larger data types (e.g. 64-bit long long) for a larger range.
- Use floating-point types instead.
- Use unsigned types for non-negative variables (e.g. sizes, counts).
- Use size_t for the unsigned variable type (it's standardized).
- Enable compiler runtime checks (when debugging/testing).
- Range checking input numbers (e.g. model weights).
- Post-testing the sign of arithmetic results.
- GCC and Clang intrinsic functions with overflow testing.
- The <stdckdint.h> header file in C23 (that's the C standard, not C++23).
- Safe integer class wrappers (see the sketch after this list).
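To illustrate the last idea, here's a very minimal hedged sketch of a safe integer wrapper that pre-tests addition (the class name is illustrative; a real library would cover every operator and a configurable error policy):
#include <climits>
#include <stdexcept>

class SafeInt32 {  // Minimal sketch, not production-ready
    int m_val;
public:
    explicit SafeInt32(int v) : m_val(v) {}
    int value() const { return m_val; }
    SafeInt32 operator+(SafeInt32 rhs) const {
        // Pre-test for signed overflow/underflow before adding
        if (rhs.m_val > 0 && m_val > INT_MAX - rhs.m_val)
            throw std::overflow_error("integer overflow");
        if (rhs.m_val < 0 && m_val < INT_MIN - rhs.m_val)
            throw std::overflow_error("integer underflow");
        return SafeInt32(m_val + rhs.m_val);
    }
};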
Runtime overflow detection. Some C++ compilers provide limited support for runtime error checking of arithmetic. The x86 CPU has builtin overflow detection, with a quietly-set overflow flag and a carry flag, which some C++ compiler-writers have made use of.
GCC has an “-ftrapv” option which traps signed overflow errors at runtime (presumably by using post-checking). GCC has also defined a number of C++ intrinsic functions which you can use to perform overflow-safe integer arithmetic, such as:
- __builtin_add_overflow — addition
- __builtin_mul_overflow — multiplication
Microsoft Visual Studio C++ provides the “/RTC” option, which stands for “Run-Time Checks”, or there's “Basic Runtime Checks” in the MSVS IDE Project Settings. However, these MSVS features don't check much for arithmetic overflow, with a focus on stack frame checking and uninitialized variables. The closest is “/RTCc”, which detects data type truncations at runtime.
There's also a runtime debugging tool that focuses on integer overflow and other oddities: the “Undefined Behavior Sanitizer”, or UBSAN for short. Like Valgrind, it reports problems as the program runs, but it works by adding instrumentation code at compile time.
Safe integer classes. Currently there are no standard safe integer types in C++, although adding them was unsuccessfully proposed in 2016. If you like a busy CPU, and what AI programmer doesn't, you can replace all int variables with “safe integer” class objects, with many examples of such classes available on the Internet. They're probably not as bad as I've implied, since C++ inlining should make the critical path quite short.
Missing Bitwise Operators: NAND, NOR, XNOR
Note that there's no simple operator for NOR, NAND or XNOR in C++. And you might need them, since neural networks use these uncommon bitwise operations more than normal C++ code does. For example, XNOR is needed as the vector dot product operator for binarized bit vectors, such as in binary quantization and also XNOR neural networks.
These missing operators can be easily simulated using two C++ bitwise operations: a binary bitwise operation followed by the unary “~” bitwise complement operator.
NAND(x,y) = ~(x & y)
NOR(x,y)  = ~(x | y)
XNOR(x,y) = ~(x ^ y)
So, you can just code this as fast C++ macros, right?
#define NAND(x,y) ~(x & y)  // Bug alert!
#define NOR(x,y)  ~(x | y)
#define XNOR(x,y) ~(x ^ y)
No, this is broken in about half a dozen ways. To write macros correctly, you need to ensure there's parentheses around the whole expression, and also around each parameter name, to avoid getting bitten by C++ macro expansion operator precedence problems. And these macros also don't work correctly if you pass in a non-unsigned integer.
Here's some example C++ macros that work for 32-bits:
#define AUSSIE_BITWISE_NAND(x,y) \
    (~(((unsigned)(x)) & ((unsigned)(y))))
#define AUSSIE_BITWISE_NOR(x,y) \
    (~(((unsigned)(x)) | ((unsigned)(y))))
#define AUSSIE_BITWISE_XNOR(x,y) \
    (~(((unsigned)(x)) ^ ((unsigned)(y))))
You could also write these as “inline” functions if you prefer. Note that these macros have a lot of parentheses to avoid various insidious precedence errors, and they are also limited to 32-bit operations. For 64-bit, you'd need to create alternative versions using a 64-bit unsigned type (e.g. “unsigned long long”).
These NAND/NOR/XNOR macros are convenient, but not very efficient since they perform two arithmetic operations. Single-operation versions are available in assembler if you really need them, accessible via C++ builtin intrinsic functions such as:
- _kxnor — x86 intrinsic for the XNOR bitwise operation.
- KXNORW/KXNORB/KXNORQ/KXNORD — x86 assembler bitwise XNOR operations.
- VPTESTNMB/VPTESTNMW/VPTESTNMD/VPTESTNMQ — x86 assembler bitwise NAND operations.
For the sake of completeness, note that there are more weird bitwise operators that do different things on a pair of bits. There are four input combinations and therefore 16 possible binary operator functions: the three C++ bitwise operators (AND/OR/XOR), the three extra ones coded above (NAND/NOR/XNOR), two trivial always-zero and always-one operations, two copy-operand functions, and six others that are equivalent to variations with negated operands (e.g. “x&~y” is one). I'm not sure why you needed to know that.
Bitwise AI Applications
Bitwise operations are a well-known coding trick that has been applied to neural network optimization. Bitwise-shifts can be equivalent to multiplication and division, but faster. Other bitwise operators can also be used in various ways in inference algorithms. Some of the common uses of bitwise operators in AI engines include:
- Arithmetic computation speedups: Bit tricks are used in optimizations of multiplication operations with bitshifts, and also faster approximate arithmetic methods.
- Sign bit manipulation: Various optimizations are possible by direct bitwise operations on the sign bit of integers or floating-point numbers. For example, the RELU activation function tests for negatives, which are changed to zero, but positive values are unchanged. This can be implemented efficiently as a sign bit test.
- Floating-point bit operations: The bits of the numeric representations of IEEE 754 floating-point numbers, or the Google bfloat16 type, include a sign bit, an exponent, and a mantissa. Normal bitwise arithmetic operators cannot be applied to floating-point numbers, because the C++ bitwise and bitshift operators only work on integer types. However, floating-point numbers are really just integers underneath, so there are various tricky ways that bitwise operators can be used on the underlying IEEE standard bit representations used by floating-point numbers. This is discussed in the next chapter.
- Look-up Tables: Algorithms that use table lookups for speed improvement typically involve bitwise shifts in computing the table offset.
- Data structures: Some data structures used in optimization of neural networks that involve bits include hashing and Bloom filters.
Bits of AI Research: Some of the advanced areas where bitwise optimizations have been used in neural network research include:
- Power-of-two quantization (bitshift quantization): By quantizing weights to the nearest integer power-of-two, bitwise shifts can replace multiplication.
- Bitserial Operations: Bitserial operations are bitwise operations on all of the bits of an integer or bit vector. For example, the “popcount” operation counts how many 1s are set in the bits of an unsigned integer. The bitserial operations can be useful in neural network inference for computing the vector dot products in binary quantization or 2-bit quantization.
- Advanced number system division: See dyadic numbers and dyadic quantization for an obscure number system involving power-of-two division, which can be implemented as bitwise right-shifting.
- Low-bit integer quantization: When quantized to only a few bits, inference can use bitwise arithmetic and bitserial operations to replace multiply-accumulate. The main examples are binary quantization and ternary quantization, both of which avoid multiplications in favor of bitwise operations (or addition) and sign bit handling.
- Shift-add networks: Multiply-and-add (or “multiply-accumulate”) can be replaced with bitshift-and-add.
- Bit arithmetic neural networks. These are neural networks where the neurons operate as bitwise operations. For example, see Weightless Neural Networks (WNNs).
- XNOR Networks: XNOR neural networks are similar to binarized networks. Their internal operations rely on the bitwise XNOR operation. The idea is that XNOR is actually an implementation of the multiplication operation on binary values. XNOR is an uncommonly used bitwise operation, and there's no builtin C++ operator for binary XNOR. However, there is always hardware XNOR support, such as a 64-bit XNOR instruction in the x86 CPU instruction set.
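To make the XNOR dot product idea concrete, here's a hedged sketch for 32 binarized values packed into a single unsigned int each, encoding +1 as a 1 bit and -1 as a 0 bit (it reuses the portable popcount wrapper sketched earlier; the function name is illustrative):
// Binarized dot product of two 32-element {-1,+1} vectors packed as bits.
// Positions where the bits agree contribute +1; mismatches contribute -1.
int aussie_binary_dot_product_32(unsigned int w, unsigned int a)
{
    unsigned int agree = ~(w ^ a);           // XNOR: 1 wherever the bits match
    int matches = aussie_popcount32(agree);  // Count of matching positions
    return 2 * matches - 32;                 // matches - (32 - matches)
}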
References on Bitwise Operations
If I've whetted your appetite for bit fiddling magic, there's plenty more:
- Sean Eron Anderson (2005), Bit Twiddling Hacks, Stanford University, https://graphics.stanford.edu/~seander/bithacks.html
- Ian Brayoni (2020), https://github.com/ianbrayoni/bithacks (Python code inspired by Sean Eron Anderson's Bit Twiddling Hacks.)
- Henry S Warren (2012), Hacker's Delight, 2nd Edition, Addison-Wesley Professional, https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0321842685 Code: https://github.com/hcs0/Hackers-Delight
- Antonio Gulli (2014), A Collection of Bit Programming Interview Questions solved in C++ Kindle Edition, https://www.amazon.com.au/Collection-Programming-Interview-Questions-solved-ebook/dp/B00KIIDPUG/
- Jörg Arndt (2010), Matters Computational: Ideas, Algorithms, Source Code, https://dl.acm.org/doi/10.5555/1941953, https://www.jjj.de/fxt/fxtpage.html#fxtbook, Code: https://www.jjj.de/bitwizardry/bitwizardrypage.html
- Sigrid/Jasper Neuman (2023), Programming pages, http://programming.sirrida.de/
- Harold (2023), Bits, Math and Performance, Sep 2023, http://bitmath.blogspot.com/
- Stephan Brumme (2023), The bit twiddler, https://bits.stephan-brumme.com/
- Gurmeet Manku (2008), Fast Bit Counting, 5 Aug 2008, https://gurmeet.net/puzzles/fast-bit-counting-routines/