Aussie AI
12. Pointer Arithmetic
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
“I feel the need—the need for speed!”
— Top Gun, 1986.
What is Pointer Arithmetic?
Pointer arithmetic is a tricky C++ optimization
that can be used to get rid of incremented variables in loops.
Instead, a pointer can be incremented each loop iteration.
This changes an array access “arr[i]
” into a pointer access “*ptr
” and is usually faster.
What is pointer arithmetic? Arrays and pointers are buddies in C++ and there's a way that mathematical arithmetic operators can work on both. Consider the declarations:
int arr[10]; int *ptr;
To start with, we can set the pointer at the array, and C++ allows us to use index notation on a pointer:
ptr = arr; x = ptr[3];
Here, x
will get the value of arr[3]
via ptr[3]
.
The pointer and array are equivalent.
Note that the “&
” address-of operator can be optionally used here.
We could have written “ptr=&arr
” to copy the address, but it's optional.
C++ allows array index accesses on pointers with “ptr[3]
” as above. We can also do this using “pointer arithmetic” with the “+
” operator
and the “*
” pointer de-reference operator:
x = *(ptr + 3); // Same as ptr[3]
The expression “ptr+3
” is the address of the third element in the array (i.e., &arr[3]
), and the “*
” dereference operator gets the value
pointed to by the pointer (i.e. arr[3]
).
Why does this work? If ptr
is pointing to the start of an integer, shouldn't “ptr+3
” be a weird address in the middle of an integer?
No, because C++ does “pointer arithmetic” on pointers.
Because “ptr
” is an “int*
” type pointer, the compiler knows to work on “int
” data.
With pointer arithmetic, the “+
” operation adds a multiple of the bytes of the size of int
types.
So “ptr+1
” is not the address 1 more than ptr
, it's actually 4 more than ptr
for a 4-byte int
(assuming 32-bit integers).
And “ptr+3
” is actually the address “ptr+12
” in terms of bytes.
Which Operators Do Pointer Arithmetic? Pointer arithmetic works with a number of arithmetic operators:
- Increment —
ptr++
adds1*size
bytes toptr
. - Decrement —
ptr--
subtracts1*size
bytes fromptr
. - Addition —
ptr + n
addsn*size
bytes. - Subtraction —
ptr-n
subtractsn*size
bytes. - Assign-Add —
ptr += n
addsn*size
bytes toptr
. - Assign-Subtract —
ptr -=n
subtractsn*size
bytes fromptr
.
Note that there's no pointer arithmetic multiplication or division. Actually, I was told that C++37 was going to have a C++ pointer multiplication operator that scanned down an array doing paired multiplications, adding them up as it went, and all in one CPU cycle, but then someone woke me up.
Pointer Comparisons: You can also compare pointers, which isn't really doing any special pointer arithmetic, but works as normal comparisons on their addresses:
- Equality tests —
ptr1 == ptr2
orptr1 != ptr2
- Less than —
ptr1 < ptr2
orptr1 <= ptr2
- Greater than —
ptr2 > ptr2
orptr1 >= ptr2
Segmented Memory Model Pointer Comparisons: Note that there's a weird portability gotcha in relative pointer comparisons (i.e. less-than or greater-than). They're only guaranteed to work in very limited scenarios by the C++ standard, such as when the pointers are both operating over the same array data. Programmers tend to think of the address space as one huge contiguous range of addresses, where you can compare all of the pointers in the program against each other, and make some coding assumptions based on that. However, there are architectures where pointer addressing is more complicated, such as where pointers are a multi-part number pointing into different memory banks with a more convoluted segmented addressing scheme. For example, pointers to allocated heap memory might be separate from the pointers to global static data, and not easily comparable.
Pointer Differences:
You can subtract two pointers using the normal “-
” subtraction operator.
The result is not the number of bytes between them, but the number of objects.
Hence, the two pointers must be of the same type (i.e., pointing to the same type of object).
Consider this code:
int arr[10]; int *ptr1 = &arr[1]; int *ptr2 = &arr[2]; int diff = ptr2 - ptr1;
The value of “diff
” should be 1
in C++ (rather than 4 bytes), because the two pointers are one element apart (i.e. 1 integer difference).
Note that “diff
” is a signed integer here, and the value of subtracting two pointers can be negative (e.g. “ptr1-ptr2
” above would be “-1
” instead).
Technically, the official type of the difference between two pointers is “std::ptrdiff_t
” which is an implementation-specific
integral signed type that you can use if you
are the sort of person who alphabetizes their pantry.
Adding Pointers Fails:
Note that adding two pointers with “ptr1 + ptr2
” is meaningless and usually a compilation error.
Also invalid are weird things like the “+=
” or “-=
” operators on two pointers.
Even though “-
” is valid on two pointers, “ptr1-=ptr2
” fails to compile
because the result of “ptr1-ptr2
” is a non-pointer type.
Char Star Pointers (Size 1 Byte):
Note that if you want to avoid pointer arithmetic, and see the actual numeric value of addresses,
you can use a “char*
” type pointer (or “unsigned char*
”).
Since sizeof(char)
is 1 byte, then all of the pointer arithmetic will just add the expected number of bytes
(e.g. ptr++
on a char*
pointer adds 1 to the address).
If you want to know the actual number of bytes between two pointers, then cast them to “char*
” type before
doing the pointer subtraction.
int diffbytes = (char*)ptr2 - (char*)ptr1;
Stride of an Array. A useful piece of terminology when processing lots of AI model data in memory is the “stride” of an array. This means the number of bytes between adjacent array elements. We can try to compute it as follows:
int arr[100]; int stride = &arr[2] - &arr[1]; // Wrong
Nope, that's a fail. This isn't the stride, because it did pointer arithmetic. The addresses of array elements
are really pointers, so the stride variable above is always 1 (the adjacent elements are 1 apart in pointer arithmetic).
We need to convert to char
pointers to get the stride in bytes.
int arr[100]; int stride = (char*)&arr[2] - (char*)&arr[1];
Can't we just use sizeof
to get the stride?
Isn't the stride above going to equal 4
, which is sizeof(int)
?
Yes, in the example above the use of sizeof
is correct, but no, that is not true in general.
The stride will often equal the element size, but may be larger.
For a simply packed array of integers or other simple types, the stride is almost certainly the size of the array element type.
But this is not always true, such as if it's an array of a larger object with an awkward size that requires padding bytes for address alignment considerations.
Loop Unrolling Stride. The term “stride” also has a secondary meaning
when talking about array processing with loop unrolling.
The stride of an unrolled loop is how long of a segment is being processed
in each section of loop unrolling code.
For example, if a loop is unrolled with AVX-2's 256-bit registers (equals 8 32-bit float
s),
then the stride when discussed in the literature is either 8 float
s or 8x4=32 bytes.
Void Pointer Arithmetic Fails:
Note also that pointer arithmetic on a generic “void*
” pointer should be a compile error,
because it points to unknown size objects.
Some C++ compilers will allow pointer arithmetic on void pointers with a warning, and pretend it's a “char*
” pointer instead.
Finally, I don't think you can increment a “function pointer” in valid pointer arithmetic, but you're welcome to try.
Pointers and Arrays
There is a close relationship in C++ between arrays and pointers. Array names are, in many ways, just pointers to the first element in the array. The array indexing operation is identical to a pointer expression involving address arithmetic. The following algebraic identities hold:
array[exp] == *(array + exp) &array[exp] == array + exp
These relationships have a number of consequences. First, the commutativity of +
means
that exp1[exp2]
is equivalent to exp2[exp1]
, which leads to weird syntax tricks like “n[ptr]
” instead of “ptr[n]
”.
Another consequence is that, in many situations, pointers can be used instead of arrays. For example, it is legal to apply the array indexing operator (i.e. square brackets) to a pointer. For example:
x = ptr[3];
Just like arr[3]
, this sets x
to equal the third element away from ptr
, where ptr
is pointing into an array.
Array Function Parameters: The array and function relationship is complicated when an array is a function parameter. When an array is passed to a function, the address of the first element of the array is passed. An array formal parameter is implemented as a pointer variable (i.e. a pointer pointing to the start of the array).
This explains why arrays are passed by reference, not by value. A local copy of the array is not used inside the function. Instead, a pointer to the original array is used. Hence, any change to an element of the local array variable is actually changing the original array (i.e. pass-by-reference instead of pass-by-value).
The differences between pointers and arrays are few. The main one is that an array
name is not a variable, whereas a pointer is.
Hence, an ordinary array name declared as a local variable
cannot be assigned to, or incremented, whereas a local pointer variable can be.
An array is similar to a constant pointer (e.g. int *const ptr
).
Note that this is untrue when the array is a function parameter, when it can be incremented or modified.
There are also the differences between pointers and arrays in relation to initializations. Consider the two initializations:
char *p = "hello"; char arr[100] = "hello";
For the pointer p
, the string "hello"
is stored in separate memory. Only the required
number of bytes are allocated (six, because of the extra character zero added by the
compiler to terminate the string). For the character array “arr
”, 100 bytes are allocated, but
only the first six are filled.
Pointer Arithmetic Loop Optimizations
The main way that we use pointer arithmetic for optimization is to change a loop over an array into loop pointer arithmetic. Note that this is primarily a sequential code optimization, and does not change anything in terms of vectorization for parallel execution.
Pointer arithmetic is mainly used to get rid of an incrementer variable
in sequential code.
Here's a vector dot product with basic incremented loop variable i++
and array index syntax v1[i]
used inside the loop:
float aussie_vecdot_basic(float v1[], float v2[], int n) { // Basic vector dot product float sum = 0.0f; for (int i = 0; i < n; i++) { sum += v1[i] * v2[i]; } return sum; }
And here's the same code when converted to pointer arithmetic:
float aussie_vecdot_ptr(float v1[], float v2[], int n) { // Pointer arithmetic vector dot product float sum = 0.0f; float* endv1 = v1 + n; // v1 plus n*4 bytes for (; v1 < endv1; v1++,v2++) { sum += (*v1) * (*v2); } return sum; }
How does this work?
We got rid of the temporary variable “i
” by using pointer arithmetic “*v1
” instead of array indices “v1[i]
”.
We are also using the function parameters “v1
” and “v2
” as temporary local variables, as permitted in C++,
so we don't need an extra temporary pointer variable.
The way this works with pointer arithmetic is v1
and v2
are treated as pointers,
which works due to the near-equivalence of pointers and arrays in C++.
Rather than using an array index “i
” we increment both these pointer-array variables:
v1++,v2++
These for loop incrementers “v1++
” and “v2++
” are both adding 4 bytes (the size of a 32-bit float
)
to the pointers.
Also note these two increment statements are separated by the C++ comma operator, not by a semicolon.
The “endv1
” end marker is calculated as the address of “v1[0]
” plus “n*4
” bytes,
because the “+
” operator in “v1+n
” is pointer arithmetic addition,
which is auto-scaled by the size of the pointed-to object (i.e., 4 bytes for 32-bit float here),
rather than normal integer addition.
Note that a further micro-optimization is possible. We can change the less-than test (“v1 < endv1
”) to
an inequality test (“v1 != endv1
”), because equality tests are slightly faster than less-than tests.
Since this test is effectively inside the loop and done every iteration,
this might be worth doing.
The trade-off is safety: it'll become an infinite loop if you get the pointer math slightly wrong, but hey, your code has no bugs, right?
Smart Pointers
Smart pointers are a programming idiom to make C++ pointers safer. They are not a speed optimization, and in fact, they are a wrapper that adds extra logic around the use of a raw pointer, and will be marginally slower. However, they avoid many C++ pointer pitfalls, thereby improving reliability, and will reduce total allocated memory usage by avoiding memory leaks. There may even be an indirect benefit to execution speed if overall memory management is improved.
Programmers have been defining their own smart pointer wrapper classes for decades,
but there is now standard support for the idea in the C++ library.
In the typical idiom, a smart pointer tracks the creation and destruction of the object
it points to, which ensures that the destructor is called.
This helps avoid “memory leaks” in standard C++ pointers where an object is allocated
with “new
”, but is never deallocated by “delete
”.
The C++ standard libraries have various templates to support smart pointers, mostly since C++11, so they are longstanding features.
std::shared_ptr
std::unique_ptr
std::weak_ptr
std::shared_ptr
is a reference-counted shared pointer implementation.
The idea is that it tracks the total number of pointers to an object,
and then automatically destroys the object whenever there's no more pointers to it.
This occurs when the last of the “shared_ptr
” objects is itself destroyed,
and then the reference count for the underlying object is zero.
std::unique_ptr
is a one-to-one mapping of a smart pointer to an object.
Whenever the unique_ptr
object is destroyed (e.g. goes out of scope
as a local variable), then both the smart pointer and its underlying object
are destroyed or otherwise cleaned up.
The unique_ptr
object can refer to a single object allocated by “new
”
or a single array-of-objects allocated by the “new[]
” operator.
std::weak_ptr
is a less commonly used type that has relevance to std::shared_ptr
in some complicated scenarios.
Usually, you should choose either of std::unique_ptr
or std::shared_ptr
,
depending on how many pointers will point to the underlying object.
Pointers vs References
Overall, pointers are a good and bad feature of C++. They are low-level variables that allow efficient processing of memory addresses, so we can code some very fast methods with pointers. They allow us to get very close to the machine.
On the downside, there are pointer pitfalls. Pointers trip up novices and experienced programmers alike. There is an immense list of common faults with pointer manipulation, and coding problems with pointers and memory management are probably half of the causes of bugs in C++ (at least). There are some tools that mitigate against pointer problems (e.g. Linux Valgrind) but it is a never-ending battle against them.
Pointers and arrays were implemented very similarly, and came from the earliest designs of the original C language. Basically, arrays are treated as a specific type of pointer, with various differences depending on whether they are variables or function parameters.
Then came C++ to the rescue. References arrived with the new-fangled programming language (cleverly named as “C++”) and were thoughtfully designed as a type of safe pointer that cannot be null, but is just as efficient as a pointer because the constraints on references are enforced at compile-time.
C++ allows two ways to indirectly refer to an object without needing to create a whole new copy: pointers and references.
The syntax is either “*
” or “&
” for their declarations.
MyVector *myptr = &mv; // Pointer to mv object MyVector &myref = mv; // Reference to mv object
Pointers and references are more efficient than spinning up a whole new copy of the object, especially when the underlying object is a complicated object. And when you have a function call, you should definitely avoid sending in a whole object.
void processit(MyVector v) // Slow { // .... }
This is inefficient because the whole MyVector object will get copied, via whatever copy constructor you have defined, which is slow. And if you haven't defined a copy constructor, then the compiler uses default bitwise copy of a structure, which is not only slow, but also rarely what you want, and often a bug.
The faster reference version is to use a “const
” reference (or non-const
if you're modifying it inside the function):
void processit(const MyVector & v) // Reference argument { // .... }
The pointer version is:
void processit(MyVector * v) // Pointer argument { // .... }
Which is faster in C++ — pointers or references? The short answer of “not any difference” is the general view, because references are implemented as pointers by the compiler behind the scenes. The two functions above are not going to be significantly different in terms of speed.
The slightly longer answer is that references can be faster because there's no null case. A reference must always be referring to an object for the duration of its scope. The C++ compiler ensures that references cannot occur without an object:
MyVector &v; // Cannot do this MyVector &v = NULL; // Nor this MyVector &v = 0; // Nor this
A reference must be initialized from an object, and you cannot
set references equal to pointers, because you actually have to
de-reference the pointer with the “*
” operator,
which crashes if it's a null pointer:
MyVector &v = myptr; // Disallowed MyVector &v = *myptr; // Works if non-null
There's no way in C++ to get a zero value into a reference variable (we hope).
For example, the address-of operator (&
) applied to a reference variable
returns the address of the referenced object, not the memory location
of the reference itself.
Hence, references are always referring to something
and they cannot be equivalent to the null pointer.
References are slightly faster:
The guarantee of an object for a reference
fixes all those null pointer core dumps,
and also relieves the programmer of the burden of testing for null pointers.
The compiler does this guarantee for references at compile-time,
so there's no hidden null check being done by the compiler
at run-time, making it efficient.
So, there's a minor speed improvement from using references,
by not having
to add safety checks for “ptr!=NULL
” throughout the function call hierarchy.
Pointers can be better than references if you need a “null” situation to occur. For example, you're processing an object that may or may not exist, and you need the pointer to be allowed to be “NULL” if there's no object. This should occur rarely, and references should be preferred in many cases.
And finally, references aren't very useful when you're trying to scan through the data in vectors, matrices, or tensors in an AI engine. You can't do pointer arithmetic on a reference in C++.
• Next: Chapter 13. Algorithm Speedups • Up: Table of Contents |
The new AI programming book by Aussie AI co-founders:
Get your copy from Amazon: Generative AI in C++ |