Aussie AI

Loop Distribution

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Loop Distribution

Loop distribution is type of loop code motion that creates two loops from a single loop that contain an “if” statement. The hoisted code is a conditional test. Some early papers in the 1990s called it “loop unswitching.” Some papers use the term “loop distribution” with the different meaning of splitting a loop into two loops, which we call “loop fission.”

The goal of loop distribution is to move an “if” test out of the loop body, by creating two loops, and ends up creating two separate loops on two pathways. This sounds similar to loop fission, but loop distribution is a more general optimization that doesn't require parallelization to get a speed improvement (whereas loop fission does). Instead, loop distribution gets a benefit in ordinary sequential execution because it moves the if-test computation out of the loop body to a once-only pre-initialization test (i.e. “hoisted”). Note that only one of the two loops is executed each time, and these two loops are never executed in parallel, so this technique is not really a type of loop fission.

Example: Loop Distribution: Here's a dummy example of implementing an “add-or-subtract” function using a passed-in Boolean flag.

    void aussie_vector_addition_slow(
        float v[], int n, 
        bool do_add, float scalar)
    {
        for (int i = 0; i < n; i++) {
            if (do_add) 
                v[i] += scalar; // Add
            else
                v[i] -= scalar; // Subtract
        }
    }

The problem is that the test “if(do_add)” is computed for every loop iteration, and yet “do_add” is a loop-invariant flag variable. The faster version is to use loop distribution to move the if-test into the loop initialization, and then split the two pathways inside the loop to instead have two separate loops. Here's the faster version:

    void aussie_vector_addition_loop_distribution(
        float v[], int n, 
        bool do_add, float scalar)
    {
        if (do_add) { // Add scalar
            for (int i = 0; i < n; i++) {
                v[i] += scalar;  // Add
            }
        }
        else {  // Subtract scalar
            for (int i = 0; i < n; i++) {
                v[i] -= scalar; // Subtract
            }
        }
    }

This example is still far from optimal. For starters, it should be using pointer arithmetic rather than array indices.

 

Next:

Up: Table of Contents

Buy: Generative AI in C++: Coding Transformers and LLMs

Generative AI in C++ The new AI programming book by Aussie AI co-founders:
  • AI coding in C++
  • Transformer engine speedups
  • LLM models
  • Phone and desktop AI
  • Code examples
  • Research citations

Get your copy from Amazon: Generative AI in C++