Aussie AI

12. Pointer Arithmetic

Book Excerpt from "Generative AI in C++"

by David Spuler, Ph.D.

“I feel the need—the need for speed!”

— Top Gun, 1986.

What is Pointer Arithmetic?

Pointer arithmetic is a tricky C++ optimization that can be used to get rid of incremented variables in loops. Instead, a pointer can be incremented each loop iteration. This changes an array access “arr[i]” into a pointer access “*ptr” and is usually faster.

What is pointer arithmetic? Arrays and pointers are buddies in C++ and there's a way that mathematical arithmetic operators can work on both. Consider the declarations:

    int arr[10];
    int *ptr;

To start with, we can set the pointer at the array, and C++ allows us to use index notation on a pointer:

    ptr = arr;
    x = ptr[3];

Here, x will get the value of arr[3] via ptr[3]. The pointer and array are equivalent. Note that the “&” address-of operator can be optionally used here. We could have written “ptr=&arr” to copy the address, but it's optional.

C++ allows array index accesses on pointers with “ptr[3]” as above. We can also do this using “pointer arithmetic” with the “+” operator and the “*” pointer de-reference operator:

    x = *(ptr + 3);  // Same as ptr[3]

The expression “ptr+3” is the address of the third element in the array (i.e., &arr[3]), and the “*” dereference operator gets the value pointed to by the pointer (i.e. arr[3]).

Why does this work? If ptr is pointing to the start of an integer, shouldn't “ptr+3” be a weird address in the middle of an integer?

No, because C++ does “pointer arithmetic” on pointers. Because “ptr” is an “int*” type pointer, the compiler knows to work on “int” data. With pointer arithmetic, the “+” operation adds a multiple of the bytes of the size of int types. So “ptr+1” is not the address 1 more than ptr, it's actually 4 more than ptr for a 4-byte int (assuming 32-bit integers). And “ptr+3” is actually the address “ptr+12” in terms of bytes.

Which Operators Do Pointer Arithmetic? Pointer arithmetic works with a number of arithmetic operators:

Increment — ptr++ adds 1*size bytes to ptr.
Decrement — ptr-- subtracts 1*size bytes from ptr.
Addition — ptr + n adds n*size bytes.
Subtraction — ptr-n subtracts n*size bytes.
Assign-Add — ptr += n adds n*size bytes to ptr.
Assign-Subtract — ptr -=n subtracts n*size bytes from ptr.

Note that there's no pointer arithmetic multiplication or division. Actually, I was told that C++37 was going to have a C++ pointer multiplication operator that scanned down an array doing paired multiplications, adding them up as it went, and all in one CPU cycle, but then someone woke me up.

Pointer Comparisons: You can also compare pointers, which isn't really doing any special pointer arithmetic, but works as normal comparisons on their addresses:

Equality tests — ptr1 == ptr2 or ptr1 != ptr2
Less than — ptr1 < ptr2 or ptr1 <= ptr2
Greater than — ptr2 > ptr2 or ptr1 >= ptr2

Segmented Memory Model Pointer Comparisons: Note that there's a weird portability gotcha in relative pointer comparisons (i.e. less-than or greater-than). They're only guaranteed to work in very limited scenarios by the C++ standard, such as when the pointers are both operating over the same array data. Programmers tend to think of the address space as one huge contiguous range of addresses, where you can compare all of the pointers in the program against each other, and make some coding assumptions based on that. However, there are architectures where pointer addressing is more complicated, such as where pointers are a multi-part number pointing into different memory banks with a more convoluted segmented addressing scheme. For example, pointers to allocated heap memory might be separate from the pointers to global static data, and not easily comparable.

Pointer Differences: You can subtract two pointers using the normal “-” subtraction operator. The result is not the number of bytes between them, but the number of objects. Hence, the two pointers must be of the same type (i.e., pointing to the same type of object). Consider this code:

    int arr[10];
    int *ptr1 = &arr[1];
    int *ptr2 = &arr[2];
    int diff = ptr2 - ptr1;

The value of “diff” should be 1 in C++ (rather than 4 bytes), because the two pointers are one element apart (i.e. 1 integer difference). Note that “diff” is a signed integer here, and the value of subtracting two pointers can be negative (e.g. “ptr1-ptr2” above would be “-1” instead). Technically, the official type of the difference between two pointers is “std::ptrdiff_t” which is an implementation-specific integral signed type that you can use if you are the sort of person who alphabetizes their pantry.

Adding Pointers Fails: Note that adding two pointers with “ptr1 + ptr2” is meaningless and usually a compilation error. Also invalid are weird things like the “+=” or “-=” operators on two pointers. Even though “-” is valid on two pointers, “ptr1-=ptr2” fails to compile because the result of “ptr1-ptr2” is a non-pointer type.

Char Star Pointers (Size 1 Byte): Note that if you want to avoid pointer arithmetic, and see the actual numeric value of addresses, you can use a “char*” type pointer (or “unsigned char*”). Since sizeof(char) is 1 byte, then all of the pointer arithmetic will just add the expected number of bytes (e.g. ptr++ on a char* pointer adds 1 to the address). If you want to know the actual number of bytes between two pointers, then cast them to “char*” type before doing the pointer subtraction.

    int diffbytes = (char*)ptr2 - (char*)ptr1;

Stride of an Array. A useful piece of terminology when processing lots of AI model data in memory is the “stride” of an array. This means the number of bytes between adjacent array elements. We can try to compute it as follows:

    int arr[100];
    int stride = &arr[2] - &arr[1];  // Wrong

Nope, that's a fail. This isn't the stride, because it did pointer arithmetic. The addresses of array elements are really pointers, so the stride variable above is always 1 (the adjacent elements are 1 apart in pointer arithmetic). We need to convert to char pointers to get the stride in bytes.

    int arr[100];
    int stride = (char*)&arr[2] - (char*)&arr[1];

Can't we just use sizeof to get the stride? Isn't the stride above going to equal 4, which is sizeof(int)? Yes, in the example above the use of sizeof is correct, but no, that is not true in general. The stride will often equal the element size, but may be larger. For a simply packed array of integers or other simple types, the stride is almost certainly the size of the array element type. But this is not always true, such as if it's an array of a larger object with an awkward size that requires padding bytes for address alignment considerations.

Loop Unrolling Stride. The term “stride” also has a secondary meaning when talking about array processing with loop unrolling. The stride of an unrolled loop is how long of a segment is being processed in each section of loop unrolling code. For example, if a loop is unrolled with AVX-2's 256-bit registers (equals 8 32-bit floats), then the stride when discussed in the literature is either 8 floats or 8x4=32 bytes.

Void Pointer Arithmetic Fails: Note also that pointer arithmetic on a generic “void*” pointer should be a compile error, because it points to unknown size objects. Some C++ compilers will allow pointer arithmetic on void pointers with a warning, and pretend it's a “char*” pointer instead.

Finally, I don't think you can increment a “function pointer” in valid pointer arithmetic, but you're welcome to try.

Pointers and Arrays

There is a close relationship in C++ between arrays and pointers. Array names are, in many ways, just pointers to the first element in the array. The array indexing operation is identical to a pointer expression involving address arithmetic. The following algebraic identities hold:

    array[exp] == *(array + exp)
    &array[exp] == array + exp

These relationships have a number of consequences. First, the commutativity of + means that exp1[exp2] is equivalent to exp2[exp1], which leads to weird syntax tricks like “n[ptr]” instead of “ptr[n]”.

Another consequence is that, in many situations, pointers can be used instead of arrays. For example, it is legal to apply the array indexing operator (i.e. square brackets) to a pointer. For example:

    x = ptr[3];

Just like arr[3], this sets x to equal the third element away from ptr, where ptr is pointing into an array.

Array Function Parameters: The array and function relationship is complicated when an array is a function parameter. When an array is passed to a function, the address of the first element of the array is passed. An array formal parameter is implemented as a pointer variable (i.e. a pointer pointing to the start of the array).

This explains why arrays are passed by reference, not by value. A local copy of the array is not used inside the function. Instead, a pointer to the original array is used. Hence, any change to an element of the local array variable is actually changing the original array (i.e. pass-by-reference instead of pass-by-value).

The differences between pointers and arrays are few. The main one is that an array name is not a variable, whereas a pointer is. Hence, an ordinary array name declared as a local variable cannot be assigned to, or incremented, whereas a local pointer variable can be. An array is similar to a constant pointer (e.g. int *const ptr). Note that this is untrue when the array is a function parameter, when it can be incremented or modified.

There are also the differences between pointers and arrays in relation to initializations. Consider the two initializations:

   char *p = "hello";
   char arr[100] = "hello";

For the pointer p, the string "hello" is stored in separate memory. Only the required number of bytes are allocated (six, because of the extra character zero added by the compiler to terminate the string). For the character array “arr”, 100 bytes are allocated, but only the first six are filled.

Pointer Arithmetic Loop Optimizations

The main way that we use pointer arithmetic for optimization is to change a loop over an array into loop pointer arithmetic. Note that this is primarily a sequential code optimization, and does not change anything in terms of vectorization for parallel execution.

Pointer arithmetic is mainly used to get rid of an incrementer variable in sequential code. Here's a vector dot product with basic incremented loop variable i++ and array index syntax v1[i] used inside the loop:

    float aussie_vecdot_basic(float v1[], float v2[], int n)
    {
        // Basic vector dot product
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            sum += v1[i] * v2[i];
        }
        return sum;
    }

And here's the same code when converted to pointer arithmetic:

    float aussie_vecdot_ptr(float v1[], float v2[], int n)
    {
        // Pointer arithmetic vector dot product
        float sum = 0.0f;
        float* endv1 = v1 + n;  // v1 plus n*4 bytes
        for (; v1 < endv1; v1++,v2++) {
                sum += (*v1) * (*v2);
        }
        return sum;
    }

How does this work? We got rid of the temporary variable “i” by using pointer arithmetic “*v1” instead of array indices “v1[i]”. We are also using the function parameters “v1” and “v2” as temporary local variables, as permitted in C++, so we don't need an extra temporary pointer variable.

The way this works with pointer arithmetic is v1 and v2 are treated as pointers, which works due to the near-equivalence of pointers and arrays in C++. Rather than using an array index “i” we increment both these pointer-array variables:

    v1++,v2++

These for loop incrementers “v1++” and “v2++” are both adding 4 bytes (the size of a 32-bit float) to the pointers. Also note these two increment statements are separated by the C++ comma operator, not by a semicolon.

The “endv1” end marker is calculated as the address of “v1[0]” plus “n*4” bytes, because the “+” operator in “v1+n” is pointer arithmetic addition, which is auto-scaled by the size of the pointed-to object (i.e., 4 bytes for 32-bit float here), rather than normal integer addition.

Note that a further micro-optimization is possible. We can change the less-than test (“v1 < endv1”) to an inequality test (“v1 != endv1”), because equality tests are slightly faster than less-than tests. Since this test is effectively inside the loop and done every iteration, this might be worth doing.

The trade-off is safety: it'll become an infinite loop if you get the pointer math slightly wrong, but hey, your code has no bugs, right?

Smart Pointers

Smart pointers are a programming idiom to make C++ pointers safer. They are not a speed optimization, and in fact, they are a wrapper that adds extra logic around the use of a raw pointer, and will be marginally slower. However, they avoid many C++ pointer pitfalls, thereby improving reliability, and will reduce total allocated memory usage by avoiding memory leaks. There may even be an indirect benefit to execution speed if overall memory management is improved.

Programmers have been defining their own smart pointer wrapper classes for decades, but there is now standard support for the idea in the C++ library. In the typical idiom, a smart pointer tracks the creation and destruction of the object it points to, which ensures that the destructor is called. This helps avoid “memory leaks” in standard C++ pointers where an object is allocated with “new”, but is never deallocated by “delete”.

The C++ standard libraries have various templates to support smart pointers, mostly since C++11, so they are longstanding features.

std::shared_ptr
std::unique_ptr
std::weak_ptr

std::shared_ptr is a reference-counted shared pointer implementation. The idea is that it tracks the total number of pointers to an object, and then automatically destroys the object whenever there's no more pointers to it. This occurs when the last of the “shared_ptr” objects is itself destroyed, and then the reference count for the underlying object is zero.

std::unique_ptr is a one-to-one mapping of a smart pointer to an object. Whenever the unique_ptr object is destroyed (e.g. goes out of scope as a local variable), then both the smart pointer and its underlying object are destroyed or otherwise cleaned up. The unique_ptr object can refer to a single object allocated by “new” or a single array-of-objects allocated by the “new[]” operator.

std::weak_ptr is a less commonly used type that has relevance to std::shared_ptr in some complicated scenarios. Usually, you should choose either of std::unique_ptr or std::shared_ptr, depending on how many pointers will point to the underlying object.

Pointers vs References

Overall, pointers are a good and bad feature of C++. They are low-level variables that allow efficient processing of memory addresses, so we can code some very fast methods with pointers. They allow us to get very close to the machine.

On the downside, there are pointer pitfalls. Pointers trip up novices and experienced programmers alike. There is an immense list of common faults with pointer manipulation, and coding problems with pointers and memory management are probably half of the causes of bugs in C++ (at least). There are some tools that mitigate against pointer problems (e.g. Linux Valgrind) but it is a never-ending battle against them.

Pointers and arrays were implemented very similarly, and came from the earliest designs of the original C language. Basically, arrays are treated as a specific type of pointer, with various differences depending on whether they are variables or function parameters.

Then came C++ to the rescue. References arrived with the new-fangled programming language (cleverly named as “C++”) and were thoughtfully designed as a type of safe pointer that cannot be null, but is just as efficient as a pointer because the constraints on references are enforced at compile-time.

C++ allows two ways to indirectly refer to an object without needing to create a whole new copy: pointers and references. The syntax is either “*” or “&” for their declarations.

    MyVector *myptr = &mv;  // Pointer to mv object
    MyVector &myref = mv;   // Reference to mv object

Pointers and references are more efficient than spinning up a whole new copy of the object, especially when the underlying object is a complicated object. And when you have a function call, you should definitely avoid sending in a whole object.

    void processit(MyVector v)  // Slow
    {
        // ....
    }

This is inefficient because the whole MyVector object will get copied, via whatever copy constructor you have defined, which is slow. And if you haven't defined a copy constructor, then the compiler uses default bitwise copy of a structure, which is not only slow, but also rarely what you want, and often a bug.

The faster reference version is to use a “const” reference (or non-const if you're modifying it inside the function):

    void processit(const MyVector & v) // Reference argument
    {
        // ....
    }

The pointer version is:

    void processit(MyVector * v)  // Pointer argument
    {
        // ....
    }

Which is faster in C++ — pointers or references? The short answer of “not any difference” is the general view, because references are implemented as pointers by the compiler behind the scenes. The two functions above are not going to be significantly different in terms of speed.

The slightly longer answer is that references can be faster because there's no null case. A reference must always be referring to an object for the duration of its scope. The C++ compiler ensures that references cannot occur without an object:

    MyVector &v;          // Cannot do this
    MyVector &v = NULL;   // Nor this
    MyVector &v = 0;      // Nor this

A reference must be initialized from an object, and you cannot set references equal to pointers, because you actually have to de-reference the pointer with the “*” operator, which crashes if it's a null pointer:

    MyVector &v = myptr;  // Disallowed
    MyVector &v = *myptr; // Works if non-null

There's no way in C++ to get a zero value into a reference variable (we hope). For example, the address-of operator (&) applied to a reference variable returns the address of the referenced object, not the memory location of the reference itself. Hence, references are always referring to something and they cannot be equivalent to the null pointer.

References are slightly faster: The guarantee of an object for a reference fixes all those null pointer core dumps, and also relieves the programmer of the burden of testing for null pointers. The compiler does this guarantee for references at compile-time, so there's no hidden null check being done by the compiler at run-time, making it efficient. So, there's a minor speed improvement from using references, by not having to add safety checks for “ptr!=NULL” throughout the function call hierarchy.

Pointers can be better than references if you need a “null” situation to occur. For example, you're processing an object that may or may not exist, and you need the pointer to be allowed to be “NULL” if there's no object. This should occur rarely, and references should be preferred in many cases.

And finally, references aren't very useful when you're trying to scan through the data in vectors, matrices, or tensors in an AI engine. You can't do pointer arithmetic on a reference in C++.

• Next: Chapter 13. Algorithm Speedups

• Up: Table of Contents

• Buy: Generative AI in C++: Coding Transformers and LLMs

The new AI programming book by Aussie AI co-founders:

AI coding in C++
Transformer engine speedups
LLM models
Phone and desktop AI
Code examples
Research citations

Get your copy from Amazon: Generative AI in C++

Aussie AI

12. Pointer Arithmetic

What is Pointer Arithmetic?

Pointers and Arrays

Pointer Arithmetic Loop Optimizations

Smart Pointers

Pointers vs References

Quick Links

Product

New to Writing?

Writing Styles