Aussie AI

Vector Norms

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Vector norms are measurements of vectors, and are used in various AI algorithms. For example, we can measure if two vectors are “close” to each other.

Vector norms map vectors to a single number. Note that vector norms are not the same thing as the “normalization” layer in a Transformer (i.e. LayerNorm or BatchNorm). Note also that a vector “norm” is not at all related to the similarly-named “normal vector” (a vector perpendicular to a surface). The norm is a number, whereas the normal is a vector, and they're not on speaking terms since that incident last summer.

L2 Norm: The basic norm of a vector is the level-2 (L2) norm, and you probably already know it. This is the length of the vector in physical space, also called the vector's “modulus” or “magnitude” in Mathematics 101. If you treat a vector as a “point” in space, the L2 norm is its straight-line distance from the origin.

The calculation of the L2 norm of a vector is a generalization of Pythagoras's Theorem: sum the squares of all the vector elements, and then take the square root. The code looks like:

    #include <math.h>   // sqrtf (also fabsf and powf, used further on)

    float aussie_vector_L2_norm(float v[], int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            sum += (v[i] * v[i]);   // Square
        }
        return sqrtf(sum);   // Root of the sum of squares
    }

Because we square every element, negative values become positive, and zero squared is still zero, so every term in the sum is non-negative. Once we've summed all the squares, we take the square root to bring the total back to the same scale as the original elements. Hence, the L2 norm compresses a whole vector down to a single non-negative floating-point number.
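As a quick sanity check, here is a minimal sketch of the same calculation with a couple of hand-checkable inputs in mind. The function name is hypothetical (not part of the book's code), and it uses "std::sqrt" from the standard header rather than "sqrtf":

```cpp
#include <cmath>

// Hypothetical stand-alone copy of the L2 norm calculation:
// sum the squares of the elements, then take the square root.
float l2_norm_example(const float v[], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += v[i] * v[i];   // Squaring removes the sign
    }
    return std::sqrt(sum);
}
```

For the vector (3, -4), the squares are 9 and 16, which sum to 25, so the norm is 5: the familiar 3-4-5 triangle, with the minus sign making no difference. For a vector with small elements such as (0.3, 0.4), the sum of squares is about 0.25 and the norm is about 0.5, so the square root does not always make the number smaller.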

The properties of the L2 norm are:

  • Floating-point number (e.g. 0.567 or 5.6789 or 3.0 or whatever)
  • Non-negative number (never negative)
  • Zero only if the whole vector is zero.
  • Represents the “length” (or “modulus” or “magnitude”) of a vector, also called the “Euclidean norm” (its Euclidean distance from the origin).
  • Usually a non-integer, even if the vector was all integers.

For a simple 2-D or 3-D vector in Senior Math, the L2 norm is the physical length of the vector in 2-D or 3-D space (or the length of the line from the origin to the equivalent point). For AI, which has vectors with 1024 dimensions, or N-dimensional vectors for whatever N is being used, there's not really a physical explanation of the L2 norm that's easy to visualize, but it's still a measure of the length of the vector in N-dimensional space. The value of the L2 norm can be zero, but only if all the vector's elements are zero.
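The idea of two vectors being “close” can be sketched with the same calculation: take the L2 norm of the element-wise difference between the two vectors. This is only an illustrative sketch; both function names and the threshold parameter are assumptions, not part of the book's code.

```cpp
#include <cmath>

// Hypothetical helper: Euclidean distance between two vectors of length n,
// computed as the L2 norm of their element-wise difference.
float vector_l2_distance(const float a[], const float b[], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        float diff = a[i] - b[i];
        sum += diff * diff;   // Square of the difference
    }
    return std::sqrt(sum);
}

// Hypothetical helper: treat two vectors as "close" if their distance
// is within a threshold chosen by the caller.
bool vectors_are_close(const float a[], const float b[], int n, float threshold)
{
    return vector_l2_distance(a, b, n) <= threshold;
}
```

What counts as a sensible threshold depends entirely on the scale of the vectors being compared, so it is left to the caller here.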

Note that the value of the L2 norm is not unique. Two different vectors can have the same value for the L2 norm. In fact, for any non-zero value, an infinite number of vectors have that same L2 norm (in two or more dimensions). Those vectors are the set of vectors with the same length (magnitude), which together form the surface of a sphere in N-dimensional space.

L2-squared norm: A minor modification of the L2 norm is the “squared L2 norm”, which is, as you may have guessed, the square of the L2 norm. To put it another way, it's just the L2 norm without the square-root at the end. The code looks like:

    float aussie_vector_L2_squared_norm(float v[], int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            sum += (v[i] * v[i]);  // Square
        }
        return sum;  // NOT sqrtf(sum);
    }

The value of the L2-squared norm is a non-negative number. It is larger than the L2 norm whenever the L2 norm is above 1, and smaller when the L2 norm is between 0 and 1. The physical meaning is the square of the physical/Euclidean length of the vector. The L2-squared norm also equals the vector's dot product with itself.
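The dot product relationship can be sketched directly. Both function names below are hypothetical, chosen only for this illustration:

```cpp
// Hypothetical helper: dot product of two vectors of length n.
float vector_dot_product_sketch(const float a[], const float b[], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

// The L2-squared norm written as the dot product of a vector with itself.
float l2_squared_via_dot(const float v[], int n)
{
    return vector_dot_product_sketch(v, v, n);
}
```

For the vector (1, 2, 3), both routes give 1 + 4 + 9 = 14.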

Why use the L2-squared norm? Because it's faster to skip the square-root operation, of course. Also, if the vector contains integers, then the L2-squared norm is also an integer, which can make it even faster to compute for an AI engine running in integer-only mode. The L2-squared norm is just as good as basic L2 for some uses. The properties of L2 and L2-squared norms are very similar except for the scale of the result. Both are non-negative and related to Euclidean distance, and both increase monotonically the further the vector is from the origin, so comparing two L2-squared norms gives the same ordering as comparing the two L2 norms.
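Here is a rough sketch of the integer-only idea, along with a length comparison that never needs a square root. The function names are hypothetical, and the wider "long long" accumulator is an assumption made here to reduce the risk of overflow when summing many squares:

```cpp
// Hypothetical integer-only variant of the L2-squared norm.
// The result is an exact integer, with no floating-point arithmetic.
long long vector_l2_squared_norm_int(const int v[], int n)
{
    long long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += (long long)v[i] * v[i];   // Widen before multiplying
    }
    return sum;
}

// Hypothetical helper: is vector a shorter than vector b?
// Comparing squared lengths gives the same answer as comparing lengths.
bool vector_is_shorter_int(const int a[], const int b[], int n)
{
    return vector_l2_squared_norm_int(a, n) < vector_l2_squared_norm_int(b, n);
}
```

For example, (3, 4) has an L2-squared norm of 25 and (1, 1) has 2, so (1, 1) is the shorter vector, and no square root was needed to find that out.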

Level 1 Norm: As you can guess from my calling it the L2 norm, there's also an L1 norm, and there are L3 norms, and more. Let's look at the L1 norm, because it's even simpler, although it's not usually something that's covered when studying vectors in Math class.

The L1 norm is simply the sum of the absolute values of all the vector elements. We don't square them. We don't take the square root. We just make them positive and add them up. The code looks like:

    float aussie_vector_L1_norm(float v[], int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            sum += fabsf(v[i]);   // Absolute value
        }
        return sum;
    }

Taking the absolute value flips any negative vector elements to positive, so the total can't go negative, and negative elements still add to the total. A zero element is fine in the vector, but contributes nothing. The result of the L1 norm is a single non-negative float number, which can be fractional or whole, ranging from zero to as high as it goes (i.e. if you have big numbers in the vector elements, then the L1 norm will also be large).

The properties of the L1 norm are:

  • Floating-point number (fractional or whole).
  • Non-negative number (never negative).
  • Zero only if all vector elements are zero.
  • Physical meaning is a grid-style distance measure (the “Manhattan distance”, also called the “taxicab distance”).
  • Will be an integer if the vector elements are integers.

What does an L1 norm mean? It's like the distance you'd travel if you walked along each element/dimension of the vector, one at a time, like walking city blocks on a grid, with every step counting as positive distance (no negatives). So, the L2 norm is the shortest, direct diagonal way to get to a point, but the L1 norm is going the scenic route. The L1 norm is never smaller than the L2 norm, and the two are equal only when at most one element is non-zero.
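The scenic route versus the direct route is easy to check with small numbers. The following is a minimal sketch, with hypothetical function names, that computes both norms so they can be compared on the same vector:

```cpp
#include <cmath>

// Hypothetical helper: L1 norm (sum of absolute values).
float l1_norm_sketch(const float v[], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += std::fabs(v[i]);
    }
    return sum;
}

// Hypothetical helper: L2 norm (square root of the sum of squares).
float l2_norm_sketch(const float v[], int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += v[i] * v[i];
    }
    return std::sqrt(sum);
}
```

For the vector (3, -4), the L1 norm is 3 + 4 = 7 while the L2 norm is 5. For a vector that lies along a single axis, such as (0, 5), both norms are 5, because there is no diagonal shortcut to take.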

Like the L2 norm, the L1 norm is not unique. Multiple vectors can have the same L1 norm. For example, the vectors (1,2) and (0.5,2.5) both have an L1 norm of 3.0. The set of all the vectors with the same L1 norm is less familiar than a sphere: in 2-D it forms a diamond (a square rotated 45 degrees) centered on the origin. In walking terms, it's all the points that you can reach from the origin if you travel exactly that total distance along the axes.

L3 Norms and Above: The mathematical vector norms can be generalized to L3 and higher norms, even to infinity. For an L3 norm, you cube the absolute values of all the vector elements, sum them, and take the cube root at the end. One way to find the cube root in C++ is to remember that a cube root is exponentiation to the power of 1/3 (from Year 10 math), so we can use the “powf” function (the standard library also provides “cbrtf” for this). Here's the code:

    float aussie_vector_L3_norm(float v[], int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; i++) {
            float a = fabsf(v[i]);   // Absolute value (a cube keeps its sign)
            sum += (a * a * a);  // Cube
        }
        const float frac_third = 1.0f / 3.0f;
        return powf(sum, frac_third);   // Cube root
    }

Can you guess what an L4 norm is? The higher order versions are really fun and interesting if you wear socks with your sandals, but not very useful in AI coding.
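For the curious, the pattern can be sketched as one general routine: raise each absolute value to the power p, sum, and take the p-th root. This is a sketch under assumptions, not the book's code: the function names are hypothetical, and p is assumed to be at least 1. The “infinity” end of the range is usually taken to mean the largest absolute value in the vector, which is shown as a separate function.

```cpp
#include <cmath>

// Hypothetical generalization: the Lp norm of a vector, assuming p >= 1.
// p = 1 gives the L1 norm, p = 2 the L2 norm, p = 4 the L4 norm, and so on.
float vector_lp_norm_sketch(const float v[], int n, float p)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++) {
        sum += std::pow(std::fabs(v[i]), p);   // |v[i]| to the power p
    }
    return std::pow(sum, 1.0f / p);            // p-th root of the sum
}

// Hypothetical helper: the L-infinity norm, which is the largest
// absolute value of any element.
float vector_linf_norm_sketch(const float v[], int n)
{
    float maxval = 0.0f;
    for (int i = 0; i < n; i++) {
        float a = std::fabs(v[i]);
        if (a > maxval) maxval = a;
    }
    return maxval;
}
```

Calling the general version with p set to 4.0f answers the question above: raise each absolute value to the fourth power, add them up, and take the fourth root. The general routine is slower than the hand-written L1 and L2 versions because it calls the power function for every element.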
