Aussie AI
Pointer Arithmetic Loop Optimizations
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
The main way that we use pointer arithmetic for optimization is to change a loop that indexes into an array into a loop that steps a pointer through it. Note that this is primarily a sequential code optimization, and does not change anything in terms of vectorization for parallel execution.
Pointer arithmetic is mainly used to get rid of an incrementer variable in sequential code.
Here's a vector dot product with a basic incremented loop variable “i++” and array index syntax “v1[i]” used inside the loop:
    float aussie_vecdot_basic(float v1[], float v2[], int n)
    {
        // Basic vector dot product
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            sum += v1[i] * v2[i];
        }
        return sum;
    }
And here's the same code when converted to pointer arithmetic:
    float aussie_vecdot_ptr(float v1[], float v2[], int n)
    {
        // Pointer arithmetic vector dot product
        float sum = 0.0f;
        float* endv1 = v1 + n;  // v1 plus n*4 bytes
        for (; v1 < endv1; v1++,v2++) {
            sum += (*v1) * (*v2);
        }
        return sum;
    }
How does this work?
We got rid of the temporary variable “i” by using pointer arithmetic “*v1” instead of array indices “v1[i]”.
We are also using the function parameters “v1” and “v2” as temporary local variables, as permitted in C++, so we don't need an extra temporary pointer variable.
The way this works with pointer arithmetic is that “v1” and “v2” are treated as pointers, which works due to the near-equivalence of pointers and arrays in C++: an array parameter such as “float v1[]” is really a pointer to the first element.
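As a minimal sketch of these two points (the function names first_after_skip and caller_pointer_unchanged are hypothetical and not from the book), the following checks at compile time that an array-style parameter has pointer type, and at run time that incrementing it inside the function leaves the caller's pointer alone:

```cpp
#include <type_traits>

// Hypothetical demo function (not from the book): an array-style parameter.
float first_after_skip(float v[], int skip)
{
    // A parameter declared as "float v[]" really has type "float*".
    static_assert(std::is_same<decltype(v), float*>::value,
                  "array parameters are adjusted to pointers");
    v += skip;   // modifies only this function's local copy of the pointer
    return *v;   // element at index 'skip' of the caller's array
}

// Hypothetical check: returns true if the caller's pointer is unchanged.
bool caller_pointer_unchanged()
{
    float data[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
    float* p = data;
    float x = first_after_skip(p, 2);   // reads data[2]
    return p == data && x == 30.0f;
}
```

Because the parameter is passed by value, the function can freely use it as a loop cursor without any effect on the calling code.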
Rather than using an array index “i”, we increment both these pointer-array variables:
    v1++,v2++
These for loop incrementers “v1++” and “v2++” are both adding 4 bytes (the size of a 32-bit float) to the pointers.
Also note these two increment statements are separated by the C++ comma operator, not by a semicolon.
The “endv1” end marker is calculated as the address of “v1[0]” plus “n*4” bytes, because the “+” operator in “v1+n” is pointer arithmetic addition, which is auto-scaled by the size of the pointed-to object (i.e., 4 bytes for 32-bit float here), rather than normal integer addition.
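The scaling can be checked directly. This is a small sketch, not code from the book: the helper names element_distance and byte_distance are hypothetical, and the byte count is written in terms of sizeof(float) rather than assuming it is 4.

```cpp
#include <cstddef>

// Hypothetical helper: distance between two float pointers, in elements.
// Pointer subtraction is scaled by sizeof(float), just like "v1 + n".
std::ptrdiff_t element_distance(const float* from, const float* to)
{
    return to - from;
}

// Hypothetical helper: the same distance measured in bytes,
// obtained by viewing the addresses as pointers to char.
std::ptrdiff_t byte_distance(const float* from, const float* to)
{
    return reinterpret_cast<const char*>(to) -
           reinterpret_cast<const char*>(from);
}
```

For an array “v” of n floats with “endv = v + n”, element_distance(v, endv) is n, while byte_distance(v, endv) is n times sizeof(float), which is n*4 on platforms with a 32-bit float.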
Note that a further micro-optimization is possible. We can change the less-than test (“v1 < endv1”) to an inequality test (“v1 != endv1”), because equality tests can be slightly faster than less-than tests on some processors, although on many modern CPUs the two cost the same.
Since this test is effectively inside the loop and done every iteration, this might be worth doing, but measure before relying on it.
The trade-off is safety: it'll become an infinite loop if you get the pointer math slightly wrong, but hey, your code has no bugs, right?
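Here is a sketch of that variant. The name vecdot_ptr_ne, the const qualifiers, and the assert guard are additions for illustration and are not from the book; the guard is one possible way to catch the bad-size case in debug builds.

```cpp
#include <cassert>

// Sketch of the "!=" variant (hypothetical name, not from the book).
float vecdot_ptr_ne(const float* v1, const float* v2, int n)
{
    // With "!=", the loop stops only when v1 lands exactly on endv1,
    // so a negative n would run off the end instead of exiting.
    assert(n >= 0);
    float sum = 0.0f;
    const float* endv1 = v1 + n;   // one past the last element
    for (; v1 != endv1; v1++, v2++) {
        sum += (*v1) * (*v2);
    }
    return sum;
}
```

With n equal to zero, “v1” already equals “endv1”, so the loop body never runs and the result is 0.0f.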