Aussie AI
Data Structures in AI Engines
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
Data Structures in AI Engines
The main data structures used in AI engines are vectors, matrices, and tensors. Examples of how these are used:
- Vectors (1-D arrays): The input sequence of words is converted to a vector of tokens. Each token is processed to create an embedding vector.
- Matrices (2-D arrays): The weights and activations are stored in matrices. Applying a set of weights to an embedding vector (which is a vector of probabilities) is a matrix multiplication of the weight matrix over the vector, creating a new vector that has updated probabilities (with amazing intelligence added).
- Tensors (3-D arrays): Every “slice” of a 3-D tensor is a 2-D matrix. Because there are so many 2-dimensional matrix multiplications happening, it can be efficient to generalize this procedure into 3-dimensional tensors. It is very mind-bending to try to understand what's happening, but at its core, it's basically just doing a lot of 2-D matrix multiplications, where the 3-D structure of tensors allows for some fancy parallelizations.
Okay, we're done. There's more on vectors and matrices in Chapters 23 and 23, and that's all you need to know about data structures for AI. You can stop reading this chapter.
I'm only half kidding, because AI inference and training does a whole lot of vector and matrix operations (using tensors), and not a whole lot of anything else. In fact, I'm struggling to think of where in an AI engine there's even one hash table. Ah, yes, there's probably a small hash table in the tokenizer that maps 50,000 words to token numbers, but there doesn't need to be, because you could implement your tokenizer as an automaton, and that's more of an algorithm than a data structure. So, I'm going to say it out loud:
You don't need classic data structures in AI.
I think it's fair to say that a plain vanilla Transformer needs a lot of fancy coding of algorithms, but doesn't need all those obscure data structures you learned in Computer Science 101. You need maybe one hash table in the tokenizer, but then its vectors, vectors, vectors (e.g. embeddings, dot product, probabilities) and matrices, matrices, matrices (e.g. FFNs, attention heads, GEMM/MatMul kernels), some weird statistical math functions (e.g. activation functions, normalization, Softmax), and some AI-specific algorithms (e.g. decoding algorithms, parallelization, vectorization, tiling).
I'm not seeing any binary trees.
Where the data structures come out to play is when you try to optimize any of that tensor stuff to go faster. Then the roster of data structures looks like:
- Lookup tables. Precomputed arrays are used to optimize activation functions, Softmax, and other mathematical methods. If you work in AI research for long enough, you'll call it a LUT, and it's your go-to data structure for speedups (and not in the Edsger Dijkstra sense).
- Permutation arrays. Used to sort data without losing track of the indices (e.g. for mappings between word tokens and their probabilities) and also important for sparse matrices.
- Bit vectors. Can be a fast way to do masks, or to mark some items as pruned.
- Locality-sensitive hashing (LSH). This is “vector hashing.” Can be useful for optimizing weights and tracking previously seen inputs.
- KV Caching. This is a widely used optimization that needs a specific hand-coded data structure.
- Inference caching. This overall cache of user input strings can potentially be done using many data structures. Probably not a binary tree, though.
- Bloom filters. These are a probabilistic combination of hashing and bit vectors. I've only seen these in research papers, although they look fast to me, and deserve more consideration.
The reason that classic data structures are missing from AI engines seems simple: parallelization. It's much easier to do parallel arithmetic on the contiguous memory blocks that underly vectors, matrices, and tensors. Similarly, lookup tables, permutation arrays, bit vectors, and vector hashing also have good vectorization characteristics.
• Next: • Up: Table of Contents |
The new AI programming book by Aussie AI co-founders:
Get your copy from Amazon: Generative AI in C++ |