Aussie AI

Kernel Fission

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Kernel fission is the opposite of kernel fusion. Whereas kernel fusion merges two operators into one kernel, kernel fission splits a single kernel into two simpler kernels. Kernel fission is analogous to the loop transformation called loop fission, which splits one loop into two, just as kernel fusion is analogous to loop fusion, which merges two loops into one. Kernel fission is used less often than kernel fusion, but it can be effective.
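As a minimal sketch of the idea, assume a combined kernel that adds a bias vector and applies ReLU in a single loop. The function names here (fused_bias_relu, bias_add, relu_inplace) are hypothetical and chosen only for illustration; they are not taken from the book's code. Fission replaces the one combined loop with two simpler loops that produce the same output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Combined kernel (hypothetical): bias-add and ReLU in one pass over the data.
void fused_bias_relu(const std::vector<float>& x,
                     const std::vector<float>& bias,
                     std::vector<float>& out) {
    assert(x.size() == bias.size() && x.size() == out.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        float t = x[i] + bias[i];
        out[i] = (t > 0.0f) ? t : 0.0f;
    }
}

// After fission, kernel 1: element-wise bias addition only.
void bias_add(const std::vector<float>& x,
              const std::vector<float>& bias,
              std::vector<float>& out) {
    assert(x.size() == bias.size() && x.size() == out.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = x[i] + bias[i];
    }
}

// After fission, kernel 2: ReLU applied in place to the result of kernel 1.
void relu_inplace(std::vector<float>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] < 0.0f) v[i] = 0.0f;
    }
}
```

Calling bias_add followed by relu_inplace gives the same values as fused_bias_relu, at the cost of a second pass over the output vector.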

Like kernel fusion, kernel fission changes the C++ kernel code that runs the model rather than the model's data. Kernel fission can apply to inference, training, or both, depending on which operation is split in two.

Faster Apart. The optimization goal of kernel fission is usually to create two simpler kernels that can each be vectorized more efficiently. The performance improvement comes from separating two combined operations, so that one or both of the smaller kernels run faster through vectorization or through streamlined scheduling for pipelining. Another goal may be to run the two simpler kernels in parallel, rather than as one merged kernel.

There can sometimes be an advantage in terms of data locality and cache access speed, but locality is more often worsened, because the two kernel loops pass over the same data twice. At least one of the two split-out kernels must run much faster on its own, usually by making use of hardware acceleration, or else we've simply added extra loop overhead and worsened the overall performance.
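The trade-off above can be sketched with a combined loop that mixes a cheap element-wise multiply with a more expensive exponential sum. This is an illustrative assumption rather than an example from the book, and the names (fused_scale_expsum, scale_kernel, expsum_kernel) are hypothetical. After fission, the first kernel is a plain multiply loop of the kind compilers can typically auto-vectorize, while the second isolates the reduction. Whether the split is actually faster depends on the compiler, the flags, and the hardware, so it needs to be measured.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Combined kernel (hypothetical): scales x into y and also accumulates
// the sum of exp(x[i]) in the same loop.
float fused_scale_expsum(const std::vector<float>& x, float scale,
                         std::vector<float>& y) {
    assert(x.size() == y.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = scale * x[i];
        sum += std::exp(x[i]);
    }
    return sum;
}

// After fission, kernel 1: a simple multiply loop with no function calls
// and no loop-carried dependency, which is a good vectorization candidate.
void scale_kernel(const std::vector<float>& x, float scale,
                  std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = scale * x[i];
    }
}

// After fission, kernel 2: the exponential sum on its own.
// Note that this loop reads x a second time, which is the extra
// memory traffic that fission has to pay for.
float expsum_kernel(const std::vector<float>& x) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += std::exp(x[i]);
    }
    return sum;
}
```

The two split kernels also have no dependency on each other's output, so they are candidates for running in parallel, which is the other goal mentioned above.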

Exact (Not Approximate). Kernel fission is also an exact optimization, like kernel fusion. The same computations are performed, but in a different order, split out into two vectorized loops. One of the split-out loops could subsequently be changed to an approximate version, but that is a separate optimization, and is not usually a goal of kernel fission.
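A small check of this exactness, under stated assumptions: the combined loop below accumulates both a sum and a sum of squares (as might be needed for normalization statistics), and fission splits it into two loops. The names (Stats, fused_stats, sum_kernel, sum_squares_kernel, check_fission_is_exact) are hypothetical. Each accumulator still sees its terms in the same order as before, so the results should match exactly, assuming the compiler is not permitted to reorder floating-point arithmetic (for example, no fast-math style options).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Stats {
    float sum;
    float sum_sq;
};

// Combined kernel (hypothetical): both statistics in a single pass.
Stats fused_stats(const std::vector<float>& x) {
    Stats s{0.0f, 0.0f};
    for (std::size_t i = 0; i < x.size(); ++i) {
        s.sum += x[i];
        s.sum_sq += x[i] * x[i];
    }
    return s;
}

// After fission, kernel 1: plain sum, accumulated in the same order.
float sum_kernel(const std::vector<float>& x) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i];
    }
    return sum;
}

// After fission, kernel 2: sum of squares, accumulated in the same order.
float sum_squares_kernel(const std::vector<float>& x) {
    float sum_sq = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum_sq += x[i] * x[i];
    }
    return sum_sq;
}

// Compares the split kernels against the combined kernel with exact
// equality rather than a tolerance, since no approximation is involved.
void check_fission_is_exact(const std::vector<float>& x) {
    Stats fused = fused_stats(x);
    assert(fused.sum == sum_kernel(x));
    assert(fused.sum_sq == sum_squares_kernel(x));
}
```

If one of the split kernels were later replaced with an approximate version, this equality check would need to become a tolerance check, which is a sign that a different optimization has been applied.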

 

Next:

Up: Table of Contents
