Aussie AI
Neural Network Theory and Tensors
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
I'm not going to take you in detail through the theory of how neural networks function. But in broad strokes, there are “neurons” in layers, where each neuron has a “signal,” and there are also connections between neurons that forward the strength of a signal on to the next layer of neurons. So, each neuron is connected to every neuron in the previous layer by an “arc,” and on that arc is a “weight” that says how strongly or weakly to consider the incoming neuron's signal.
But how do we get to tensors from that? Not obvious.
Let's step back a little and be one with the neuron. So, we are just one neuron in a layer of 100 neurons. And the previous layer has 100 neurons, and we are “fully connected” with arcs from every one of those 100 prior neurons. With 100 neurons in the previous layer, our little lonely neuron has to consider the signals from all of the 100 neurons in the prior layer, with 100 weights on the arcs to help decide how much attention to pay to each of the 100 prior neurons.
Let's consider the previous layer of 100 neurons as a “vector” of each neuron's computed values. What this means is that every one of the 100 prior neurons has a number for its computed signal, so we have a vector of 100 signal numbers from the prior layer (i.e. a vector full of 100 neuron computed values).
Again, our little neuron has to receive a computed signal value from every one of the 100 prior-layer neurons, so we have 100 arcs coming into our little neuron, each with its own number, which is the “weight” of that arc. The computed value of a prior neuron is multiplied by the weight that's on its arc (i.e. there are 100 weights, one for each arc). So, every one of the arcs from the 100 neurons in the prior layer has a weight, and what does that sound like? A vector of weights.
So, we have 100 prior-layer neurons' computed values in a vector, where each one of those 100 signal values is multiplied by a weight that's in a vector of 100 weights. Hence, we've got a pairwise multiplication, where we multiply 100 neuron values times 100 associated weights. In other words, we've got a bunch of element-wise multiplications of two vectors (100 values times 100 weights), which creates a vector of 100 multiplication results.
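Here's what that step looks like as a rough C++ sketch of my own (illustrative names, a layer size of 100, and float signals assumed; this isn't the book's actual code):

    #include <vector>

    // Element-wise multiply: 100 prior-layer signals times 100 arc weights,
    // producing an interim vector of 100 products.
    std::vector<float> elementwise_multiply(
            const std::vector<float>& signals,   // prior layer's 100 computed values
            const std::vector<float>& weights)   // 100 weights on the incoming arcs
    {
        std::vector<float> products(signals.size());
        for (size_t i = 0; i < signals.size(); ++i) {
            products[i] = signals[i] * weights[i];
        }
        return products;
    }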
But our little neuron cannot have 100 computed values; it can really only have one number, the total computed signal for our current neuron. There are various things we could do to “reduce” our interim vector of 100 multiplications, but the simplest is to add them all up, and this gives us one number. Now we have one number, and it's the computed signal value for our current neuron.
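And the “add them all up” reduction is just a summation over that interim vector; another minimal sketch with made-up names:

    #include <vector>

    // Reduce the interim vector of 100 products down to one number:
    // the computed signal for our current neuron.
    float sum_reduce(const std::vector<float>& products)
    {
        float total = 0.0f;
        for (float p : products) {
            total += p;
        }
        return total;
    }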
Umm, I remember that from High School. If we multiply two vectors together with the numbers in pairs, and then add it all up: vector dot product.
In summary, we have a vector dot product for our single neuron in the current layer, based on two vectors from the prior layer (the vector of 100 calculated neuron values, and the vector of 100 weights).
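Putting the multiply and the add together gives the classic dot product loop. A minimal C++ sketch (function and parameter names are mine, not the book's):

    #include <vector>
    #include <cassert>

    // Vector dot product: pairwise multiply, then sum.
    // For one neuron: 100 prior-layer signals times 100 weights -> one signal.
    float dot_product(const std::vector<float>& signals,
                      const std::vector<float>& weights)
    {
        assert(signals.size() == weights.size());
        float total = 0.0f;
        for (size_t i = 0; i < signals.size(); ++i) {
            total += signals[i] * weights[i];
        }
        return total;
    }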
But this is just for our one lonely neuron. Except, it's not lonely, since it has 99 friends, because it's in a layer of 100 neurons itself. So, our neuron and its 99 friends in the current layer all have to do a different dot product computation, because the weights are different for each set of arcs into each neuron. We have a whole vector of 100 neurons in the current layer, for which we have to compute dot products of 100 values times 100 weights (i.e. using the prior layer). So, we have to do 100 vector dot products to calculate the results for our neuron and its 99 friends. If we do 100 repetitions of vector dot products, this sounds like...matrix multiplication (strictly, a matrix-vector multiplication, with a 100x100 weight matrix multiplying the prior layer's vector of 100 values).
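In code, those 100 dot products amount to a matrix-vector multiplication, where each row of a 100x100 weight matrix holds the weights on the arcs into one current-layer neuron. A rough sketch, reusing the dot_product function above (row-major weights assumed):

    #include <vector>

    // One layer: multiply the 100x100 weight matrix by the prior layer's
    // vector of 100 signals, giving 100 new signals (one per current neuron).
    std::vector<float> layer_forward(
            const std::vector<std::vector<float>>& weights,  // [100][100], one row per neuron
            const std::vector<float>& prior_signals)         // [100]
    {
        std::vector<float> out(weights.size());
        for (size_t n = 0; n < weights.size(); ++n) {
            out[n] = dot_product(weights[n], prior_signals); // one dot product per neuron
        }
        return out;
    }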
But that's not all. There's a third dimension based on the “tokens” in the prompt, where each token is represented by an “embeddings” vector. And with this third dimension thrown in, well, then it's a whole vector's worth of matrix multiplications, and we get to a 3-D operation called a “tensor product.” Tensors, in this usage, are three-dimensional blocks full of numbers (i.e. cubes or rectangular prisms), which generalize two-dimensional matrices, which generalize one-dimensional vectors, which generalize zero-dimensional scalars. And if you have any common sense, you've stopped reading this section by now, so I'm not going to try explaining this mind-bending tensor stuff any further.
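If you're still here: one way to picture that third dimension in code is simply to repeat the layer computation once per token, with one embedding vector per token. A rough sketch reusing layer_forward above (the shapes and names are my own assumptions for illustration):

    #include <vector>

    // Third dimension: one embedding vector per token in the prompt.
    // Applying the layer to every token is a stack of matrix-vector products,
    // which is a slice-by-slice view of the 3-D tensor operation.
    std::vector<std::vector<float>> layer_forward_all_tokens(
            const std::vector<std::vector<float>>& weights,        // [100][100]
            const std::vector<std::vector<float>>& token_vectors)  // [num_tokens][100]
    {
        std::vector<std::vector<float>> out;
        out.reserve(token_vectors.size());
        for (const auto& tok : token_vectors) {
            out.push_back(layer_forward(weights, tok));
        }
        return out;
    }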