Aussie AI

Sparse Tensors

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Sparse tensors are tensors in which most of the values are zero. They are a generalization of sparse vectors and sparse matrices, and offer the same advantages: compressed storage and faster arithmetic operations (by skipping operations involving zero).
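
As a minimal sketch of the basic idea, here is a dot product that tests for and skips zero weights (plain C++; the function name is illustrative, not from the book). The per-element test only pays off when zeros dominate:

    // Minimal sketch: dot product that skips zero weights.
    // The extra test per element is only worthwhile at high sparsity.
    float dot_product_skip_zeros(const float* w, const float* x, int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            if (w[i] != 0.0f) {   // skip the multiply-add for zero weights
                sum += w[i] * x[i];
            }
        }
        return sum;
    }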

The level of sparsity required for a worthwhile optimization usually means that 80-90% of the weights are zero. With so few non-zero values, tensor arithmetic involves far fewer operations, and the memory requirements are low (i.e., only the non-zero weights need to be stored). Such sparsity is often the result of a “pruning” optimization (see Chapter 33), but there are also obscure theoretical ways to get sparse tensors using tensor algebra (let's not even go there!).

When there is a high degree of sparsity, such as 80-90% zeros, it becomes more efficient to use alternative algorithms and storage formats. Sparse tensors can be stored in a permutation index format, where only the locations of the non-zero items are stored (e.g., for a 3-D tensor, a four-tuple holding the non-zero value and the three indices at which it sits). Kernels that operate on this alternative storage format can be much more efficient, because they avoid the computations involving the zero values, which aren't even stored.
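
For example, a 3-D sparse tensor in this tuple format might be stored as follows (a sketch only; the type and function names are illustrative, not from the book):

    #include <vector>

    // Sketch of the tuple storage format for a 3-D sparse tensor:
    // each non-zero value is stored as a four-tuple of its three
    // indices plus the value itself.
    struct SparseEntry3D {
        int i, j, k;    // indices of the non-zero element
        float value;    // the non-zero value
    };

    struct SparseTensor3D {
        int dim_i, dim_j, dim_k;             // logical (dense) dimensions
        std::vector<SparseEntry3D> entries;  // only the non-zeros are stored
    };

    // Example kernel: scale every element, touching only the non-zeros
    // rather than all dim_i * dim_j * dim_k slots of the dense layout.
    void scale_sparse(SparseTensor3D& t, float c)
    {
        for (auto& e : t.entries) {
            e.value *= c;
        }
    }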

Parallelization of sparse tensor operations is a double optimization: there are fewer operations (only the non-zero weights), and those operations can be parallelized as well. Although a permuted index format does not give the contiguous memory layout that vectorization usually relies on, there are other ways to vectorize permutation indices, such as the “gather” and “scatter” SIMD operations.
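
As a sketch of the gather approach, here is a sparse-times-dense dot product using the AVX-2 gather intrinsic, assuming the sparse vector is stored as parallel arrays of indices and non-zero values (requires AVX-2 and FMA support, e.g. compiling with -mavx2 -mfma; the layout and names are illustrative, not from the book):

    #include <immintrin.h>   // AVX-2 gather and FMA intrinsics
    #include <cstddef>

    // Sketch: sparse-times-dense dot product using AVX-2 "gather".
    // idx[] holds the positions of the non-zeros, val[] their values.
    float sparse_dot_gather(const int* idx, const float* val, size_t nnz,
                            const float* dense)
    {
        __m256 acc = _mm256_setzero_ps();
        size_t i = 0;
        for (; i + 8 <= nnz; i += 8) {
            // Load 8 indices, then gather the 8 dense values at those positions
            __m256i vidx = _mm256_loadu_si256((const __m256i*)(idx + i));
            __m256 vdense = _mm256_i32gather_ps(dense, vidx, 4); // scale = sizeof(float)
            __m256 vval = _mm256_loadu_ps(val + i);
            acc = _mm256_fmadd_ps(vval, vdense, acc);  // acc += val * dense[idx]
        }
        float tmp[8];
        _mm256_storeu_ps(tmp, acc);   // spill lanes for a horizontal sum
        float sum = tmp[0] + tmp[1] + tmp[2] + tmp[3]
                  + tmp[4] + tmp[5] + tmp[6] + tmp[7];
        for (; i < nnz; ++i) {        // scalar tail for the leftover elements
            sum += val[i] * dense[idx[i]];
        }
        return sum;
    }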

A sparse tensor product kernel parallelized on a GPU goes at the speed of light. Which reminds me of those Physicists again. They've stopped reading by now, so we can talk freely. I mean, they've been arguing about whether light is a blob or a wiggle for 100 years. Any computer programmer could solve this in ten seconds. Here's the algorithm:

    if (always_bounces_off_stuff(light)) {
        // Particle
    }
    else if (always_goes_around_things(light)) {
        // Wave
    }
    else {
        // Neither (obviously)
    }
