Aussie AI

Loop Normalization

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Loop Normalization

Loop normalization is not directly an optimization, but is a preliminary loop transformation that can make further loop optimizations easier. Followup optimizations might be to fuse the two loops with loop fusion, or to parallelize each loop, such as with loop fission or vectorization.

The goal of loop normalization is to make the loop iteration variables act across the same range. This applies to two sequential loops, rather than nested loops. Hence, loop normalization is needed when two loops in sequence are starting at different offsets (e.g. one is i=1 and another starts at i=0), or are finished at different endpoints (e.g. n versus n-1).

If two loops have the same number of computations, but with different ranges, then one loop can be changed with an offset. For example, these loops differ with ranges 0..n-1 and 1..n:

    for (int i = 0; i < n; i++) a[i] = 0;
    for (int j = 1; j <= n; j++) b[j] = 0;

These can be adjusted to the same ranges with a “j+1” index offset, as follows:

    for (int i = 0; i < n; i++) a[i] = 0;
    for (int j = 0; j < n; j++) b[j+1] = 0;

If the two loops have a different number of iterations, typically off by 1 or 2, then “loop peeling” can be used to unroll and split off one or two iterations and shorten the longer loop, so that both loops have the same number of iterations over the same range. For example, in this example, one loop is 0..n-1 and another is 0..n:

    for (int i = 0; i < n; i++) a[i] = 0;
    for (int j = 0; j <= n; j++) b[j] = 0;

The way to normalize the loop ranges is to “peel” off the last iteration of the “j” loop:

    for (int i = 0; i < n; i++) a[i] = 0;
    for (int j = 0; j < n; j++) b[j] = 0;
    b[n] = 0;  // Peeled

This example has peeled the longer loop to make it shorter. An alternative would be “loop spreading” to lengthen the shorter loop, such as by adding an extra padding element into the array.

Normalizing two loops doesn't change the number of arithmetic computations. However, once two loops have normalized ranges, it becomes easier to see opportunities for further optimizations such as loop fusion or loop fission.

 

Next:

Up: Table of Contents

Buy: Generative AI in C++: Coding Transformers and LLMs

Generative AI in C++ The new AI programming book by Aussie AI co-founders:
  • AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++