Aussie AI
Loop Splitting
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
Loop Splitting
Loop splitting refers to splitting the sequential iterations of a loop into two loops, which each perform part of the original loop's iterations. Loop splitting is closely related to “loop sectioning” (“strip mining”), but often relates to more complex arithmetic in the loop body. Note that “loop peeling” is a special case of loop splitting where the first section is a small range of a few initial iterations, but these few iterations are unrolled rather than looped.
Loop splitting takes a single loop and transforms it into at least two “split-out” loops, one for the early iterations, and one for the remainder. However, loops can also be split out into more than two loops.
In loop splitting, each split-out loop is shorter than the original loop. Unlike loop fission, the two loops operate over different subportions of the iterator variable range, executing the same number of total iterations, rather than double iterations as in loop fission.
Example: Loop Splitting: Here's some example code to “sqrtize” a vector, using a cached optimization for the numbers up to 100.
void aussie_vector_do_sqrt(float v[], int n) { for (int i = 0; i < n; i++) { if (i < 100) { // Fast cases v[i] = aussie_sqrt_optimized(v[i]); } else { // General case v[i] = sqrtf(v[i]); } } }
However, we can use loop splitting to split this big loop into two shorter disjoint ranges.
Instead of 0..n-1, we do 0..99, and then 100..n-1.
Each loop is over part of the range, and has a simpler loop body.
Note that this code fails with an array bounds violation
for small values of n
less than 100.
void aussie_vector_do_sqrt_loop_splitting(float v[], int n) { for (int i = 0; i < 100; i++) { // Fast cases v[i] = aussie_sqrt_optimized(v[i]); } for (int i = 100; i < n; i++) { // General cases v[i] = sqrtf(v[i]); } }
The loop splitting optimization is beneficial if the loop body has different sections of code that only relate to a subset of the iterator range. Hence, the loop bodies of the two loops can be reduced to execute less code. Overall, there is still the same number of iterations performed in the two loops combined, but each loop performs only a proportion of the original iterations on a simpler loop body. This optimizes sequential execution and the simpler code in each loop body may make vectorization of one or both subloops easier. Furthermore, both subloops could run in parallel.
• Next: • Up: Table of Contents |
The new AI programming book by Aussie AI co-founders:
Get your copy from Amazon: Generative AI in C++ |