Aussie AI

Loop Fusion

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Loop fusion is a well-known code optimization in which two separate loops are merged into a single loop. This does not change the amount of computation in either loop body, but it halves the loop overhead, because the loop condition test and the increment are executed once per element rather than once in each of the two loops. There is also often a benefit from data locality, which reduces data movement and temporary data storage and can further improve overall speed.
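
To illustrate the data locality point, here is a minimal sketch that assumes a simple "scale then ReLU" computation over a vector. The functions scale_relu_unfused and scale_relu_fused are hypothetical names chosen for illustration and do not come from the book. The unfused version writes an entire temporary vector and then reads it back. The fused version finishes each element before moving to the next:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: two passes over the data plus a temporary vector.
// Pass 1 scales x into tmp; pass 2 applies ReLU from tmp into y.
void scale_relu_unfused(const std::vector<float>& x, float w,
                        std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> tmp(x.size());  // temporary storage for the whole vector
    for (std::size_t i = 0; i < x.size(); i++) tmp[i] = x[i] * w;
    for (std::size_t i = 0; i < x.size(); i++) y[i] = tmp[i] > 0.0f ? tmp[i] : 0.0f;
}

// Fused: a single pass with no temporary vector. Each scaled value
// lives in a local variable (often kept in a register) until it is used.
void scale_relu_fused(const std::vector<float>& x, float w,
                      std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); i++) {
        float t = x[i] * w;
        y[i] = t > 0.0f ? t : 0.0f;
    }
}
```

Both functions produce the same output. The fused version avoids allocating tmp, and it reads x and writes y in one pass. The unfused version also has to write and then re-read the whole temporary vector.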

Note that loop fusion does little to help vectorization, because more complicated loop bodies are actually harder to parallelize. Most of its benefits arise in traditional sequential code execution, which is why its theory dates back many decades. For modern parallel execution on GPUs, loop fusion is often a poor choice, and more benefit may come from loop fission (the opposite of fusion) and loop vectorization.
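
As a rough illustration of fission, here is a minimal sketch that assumes a loop mixing a sequential running sum with independent element-wise work. The function names are hypothetical. Whether a given compiler actually vectorizes either version depends on the compiler and its optimization flags:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fused: the running sum creates a dependence between iterations,
// which can prevent the compiler from vectorizing the whole loop,
// including the independent scaling of b.
void prefix_and_scale_fused(const std::vector<float>& a,
                            std::vector<float>& prefix,
                            std::vector<float>& b, float s) {
    assert(a.size() == prefix.size() && a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); i++) {
        sum += a[i];
        prefix[i] = sum;
        b[i] *= s;
    }
}

// Fissioned: the independent scaling gets its own loop with a simple,
// uniform body that is easy to vectorize; only the prefix-sum loop
// remains sequential.
void prefix_and_scale_fissioned(const std::vector<float>& a,
                                std::vector<float>& prefix,
                                std::vector<float>& b, float s) {
    assert(a.size() == prefix.size() && a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); i++) {
        sum += a[i];
        prefix[i] = sum;
    }
    for (std::size_t i = 0; i < b.size(); i++) b[i] *= s;
}
```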

Example: Loop Fusion. The general idea is to combine the bodies of two loops into a single loop. Here is a simplistic example with two sequential (non-fused) loops that initialize two vectors:

   for (int i = 0; i < n; i++) v1[i] = 0;
   for (int i = 0; i < n; i++) v2[i] = 0;

And here is the version with loop fusion:

   for (int i = 0; i < n; i++) {
       v1[i] = 0;
       v2[i] = 0;
   }

Note that the fused version performs the same number of assignments for initialization, but incurs only half of the loop overhead cost (i.e., half of the “i < n” and “i++” operations have been optimized away). And for the sake of argument, let's pretend we don't know a better way to initialize a vector in C++, such as memset, calloc, or load-time static variable initialization.
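
To make the "half the overhead" arithmetic concrete, here is a minimal sketch that counts loop-control operations rather than timing anything. The OverheadCount struct and the init_unfused and init_fused functions are hypothetical, and the comma-operator instrumentation exists only for counting. For n elements, the two unfused loops evaluate the condition 2(n + 1) times and the increment 2n times. The fused loop needs only n + 1 tests and n increments:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counters for loop-control work.
struct OverheadCount {
    long tests = 0;       // evaluations of "i < n"
    long increments = 0;  // executions of "i++"
};

// Two separate loops, as in the non-fused example.
void init_unfused(std::vector<int>& v1, std::vector<int>& v2, OverheadCount& c) {
    assert(v1.size() == v2.size());
    const std::size_t n = v1.size();
    // The comma operator bumps a counter, then yields the real condition.
    for (std::size_t i = 0; (c.tests++, i < n); i++, c.increments++) v1[i] = 0;
    for (std::size_t i = 0; (c.tests++, i < n); i++, c.increments++) v2[i] = 0;
}

// One fused loop doing both assignments per iteration.
void init_fused(std::vector<int>& v1, std::vector<int>& v2, OverheadCount& c) {
    assert(v1.size() == v2.size());
    const std::size_t n = v1.size();
    for (std::size_t i = 0; (c.tests++, i < n); i++, c.increments++) {
        v1[i] = 0;
        v2[i] = 0;
    }
}
```

With n = 1000, the unfused version records 2002 tests and 2000 increments, and the fused version records 1001 and 1000. Real timings will vary, since optimizing compilers may unroll, fuse, or replace such loops on their own (for example, with a memset call).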
