Aussie AI

Pointer Arithmetic Loop Optimizations

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

The main way that we use pointer arithmetic for optimization is to change a loop that indexes into an array into a loop that steps a pointer through the array. Note that this is primarily a sequential code optimization, and does not change anything in terms of vectorization for parallel execution.

Pointer arithmetic is mainly used to get rid of an incrementer variable in sequential code. Here's a vector dot product with basic incremented loop variable i++ and array index syntax v1[i] used inside the loop:

    float aussie_vecdot_basic(float v1[], float v2[], int n)
    {
        // Basic vector dot product
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            sum += v1[i] * v2[i];
        }
        return sum;
    }

And here's the same code when converted to pointer arithmetic:

    float aussie_vecdot_ptr(float v1[], float v2[], int n)
    {
        // Pointer arithmetic vector dot product
        float sum = 0.0f;
        float* endv1 = v1 + n;  // v1 plus n*4 bytes
        for (; v1 < endv1; v1++,v2++) {
            sum += (*v1) * (*v2);
        }
        return sum;
    }

How does this work? We got rid of the temporary variable “i” by using pointer arithmetic “*v1” instead of array indices “v1[i]”. We are also using the function parameters “v1” and “v2” as temporary local variables, as permitted in C++, so we don't need an extra temporary pointer variable.
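Reusing a pointer parameter as the loop variable is safe for the caller because the pointer itself is passed by value. The following is a minimal sketch of that point, not code from the book: the names sum_by_walking and caller_pointer_unchanged are hypothetical, and the const qualifiers and the assert are added assumptions.

```cpp
#include <cassert>

// Hypothetical helper: sums n floats by advancing its own copy of the pointer.
float sum_by_walking(const float* p, int n)
{
    assert(n >= 0);  // assumption: callers never pass a negative length
    float total = 0.0f;
    for (const float* end = p + n; p < end; p++) {
        total += *p;  // p here is the function's local copy
    }
    return total;
}

// Hypothetical check: the caller's pointer still refers to the first element
// after the call, even though the callee incremented its parameter.
bool caller_pointer_unchanged()
{
    float data[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float* start = data;
    float total = sum_by_walking(start, 4);
    return start == data && total == 10.0f;
}
```

Only the callee's copy of the pointer moves, so the caller can pass the same array again without resetting anything.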

The way this works with pointer arithmetic is that v1 and v2 are treated as pointers, which works due to the near-equivalence of pointers and arrays in C++ (a function parameter declared as “float v1[]” is really a “float*” pointer). Rather than using an array index “i” we increment both these pointer-array variables:

    v1++,v2++

These for loop incrementers “v1++” and “v2++” are both adding 4 bytes (the size of a 32-bit float) to the pointers. Also note these two increment expressions are separated by the C++ comma operator, not by a semicolon.

The “endv1” end marker is calculated as the address of “v1[0]” plus “n*4” bytes, because the “+” operator in “v1+n” is pointer arithmetic addition, which is auto-scaled by the size of the pointed-to object (i.e., 4 bytes for 32-bit float here), rather than normal integer addition.
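The auto-scaling can be checked directly by measuring the same distance in elements and in bytes. This is a small sketch for illustration only: element_distance and byte_distance are hypothetical helper names, and the byte count is written with sizeof(float) rather than a hard-coded 4.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: distance from v to v + n, counted in float elements.
std::ptrdiff_t element_distance(const float* v, int n)
{
    assert(n >= 0);  // assumption: non-negative length
    const float* end = v + n;  // pointer addition, scaled by sizeof(float)
    return end - v;            // pointer subtraction is scaled the same way
}

// Hypothetical helper: the same distance, counted in bytes.
std::ptrdiff_t byte_distance(const float* v, int n)
{
    assert(n >= 0);
    const float* end = v + n;
    // char pointers step one byte at a time, so this difference is in bytes
    return reinterpret_cast<const char*>(end) - reinterpret_cast<const char*>(v);
}
```

For an array of n floats, the first helper gives n and the second gives n * sizeof(float), which is n*4 on platforms with a 32-bit float.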

Note that a further micro-optimization is possible. We can change the less-than test (“v1 < endv1”) to an inequality test (“v1 != endv1”), because equality tests can be slightly faster than less-than tests on some hardware. On most modern CPUs both are a single compare-and-branch, so any gain is likely to be small and should be measured. Since this test is effectively inside the loop and done every iteration, this might be worth doing.

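The trade-off is safety: if you get the pointer math slightly wrong (for example, a negative n), the pointer never equals the end marker, and the loop runs off through memory, effectively an infinite loop until it crashes. But hey, your code has no bugs, right?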
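Here is one way the inequality-test version might look. It is a sketch rather than the book's code: the name vecdot_ptr_ne is hypothetical, and the const qualifiers and the assert on n are added assumptions meant to catch the bad-length case in debug builds.

```cpp
#include <cassert>

// Hypothetical variant of the pointer arithmetic dot product using "!=".
float vecdot_ptr_ne(const float* v1, const float* v2, int n)
{
    // Assumption: guard the length, because with n < 0 the end marker
    // would sit before v1 and the "!=" test would never stop the loop.
    assert(n >= 0);
    float sum = 0.0f;
    const float* endv1 = v1 + n;
    for (; v1 != endv1; v1++, v2++) {
        sum += (*v1) * (*v2);
    }
    return sum;
}
```

The assert compiles away when NDEBUG is defined, so it adds no cost to a release build, but it also offers no protection there.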

 
