Aussie AI

Bitwise Intrinsic Functions

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Bitwise Intrinsic Functions

Intrinsic functions, or “builtin” functions, are special C++ functions that are specific to the compiler environment. For example, Microsoft Visual Studio and GCC have different builtins. Intrinsics are usually implemented in very efficient ways, often directly mapping to CPU instructions, so they can be very powerful optimizations.

Some of the useful builtin functions for integer bitwise arithmetic are listed below. Most of these functions are for “int” or “unsigned int” (32-bit), but have other versions for long 64-bit or unsigned long 128-bit types. There isn't usually a version for “short” 16-bit integers.

Count Leading Zeros (CLZ): Various functions count the leading zeros, or similarly, the offset of the first set bit. This is scanning the bits from left-to-right and finding the most significant bit. One application of the CLZ intrinsic is a fast way to compute a truncated log2 of an integer, or similarly, computing the highest power-of-two in a number.

  • _BitScanReverse (Microsoft intrinsic <intrin.h>): Finds the most-significant bit in a 32-bit integer. There's also _BitScanReverse64.
  • clz: Count leading zeros (various versions); also sometimes called “nlz” for “number leading zeros”.
  • __lzcnt: Leading zeros count in Microsoft Windows intrinsics, use <intrin.h> for Microsoft Visual Studio C++.
  • __builtin_clz (count leading zeros): GCC function to count the number of leading prefix zeros in an unsigned integer.
  • _CountLeadingZeros: Microsoft <intrin.h> ARM intrinsics.

For all you silicon addicts, here's the CPU hardware instructions are underpin these intrinsics:

  • BSR: Bit Scan Reverse x86 assembler instruction.
  • LZCNT: x86 instruction for leading-zero count, similar to BSR.

Count Trailing Zeros (CTZ): Contrasting to the leading zero functions, these functions find the zeros on the right-hand-side of an integer. This is the least-significant bit.

  • _BitScanForward (Microsoft intrinsic <intrin.h>): Finds the least-significant bit set. Long int version is _BitScanForward64.
  • __builtin_ctz (count trailing zeros): GCC function counts zero bits on the right (least-significant bits).
  • ffs/ffsl: Find first set (least-significant bit).
  • __builtin_ffs (find first set): GCC function: find first set bit from the least significant bits (from the right bits).

The related x86 CPU hardware instructions are:

  • BSF: Bit Scan Forward x86 assembler instruction.
  • TZCNT: x86 instruction for trailing-zero count, similar to BSF.

If you'd rather code it yourself, there's Brian Kernighan's bit trick for LSB: bitwise-and of n and n-1 (i.e. in C++ n&(n-1) finds the lowest set bit). But using the intrinsics should be faster.

Popcount (Set Bits Count): The count of 1s in a number is known as the “popcount” (which is short for population count) and there are various intrinsics:

  • __builtin_popcount: GCC function to count the number of 1s in an unsigned integer.
  • BitOperations.PopCount: Microsoft intrinsic function for bitwise popcount.
  • __popcnt: AMD x86 popcount intrinsic using POPCNT x86 instruction (Microsoft platform)
  • _mm_popcnt_u32: Intel x86 popcount intrinsic using POPCNT x86 instruction (Microsoft platform); use <intrin.h> on MSVS C++.
  • __builtin_parity: GCC function tracking bitwise binary parity (whether the number of 1s is odd or even).

The x86 CPU hardware instruction is POPCNT, which computes the popcount faster than a hummingbird's wings.

 

Next:

Up: Table of Contents

Buy: Generative AI in C++: Coding Transformers and LLMs

Generative AI in C++ The new AI programming book by Aussie AI co-founders:
  • AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++