Aussie AI
Top-K Vector Algorithm
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
Top-K Vector Algorithm
The top-k algorithm is more complicated than vector max or min: find the largest k elements in a vector.
Note that “maximum” is the same as top-k with k=1.
If you want the short version of the top-k story in C++, there's a std::partial_sort
function that sorts the top k elements,
and there's also std::sort
for a full array sort.
The top-k algorithm is quite important in AI engines. It gives us “top-k decoding” which is how to choose which word to output. The whole encoder-decoder computes a vector giving us the probability that each word should be output next. Using the maximum probability word gives us “greedy decoding” which always outputs the most likely word. But that's kind of boring and predictable, so top-k decoding randomly chooses between the k most likely words (e.g. top-50), which is still very accurate and also more interesting because it has some creative variation.
Example: Hand-coded top-2 algorithm: Since top-1 is the maximum of a vector, we can also find a fairly simple linear scan for k=2. The basic idea is to scan through and keep track of the two largest values as we go.
void aussie_vector_top_k_2( float v[], int n, float vout[]) { // Order the first 2 elements float vmax1 = v[0], vmax2 = v[1]; if (v[1] > v[0]) { vmax1 = v[1]; // Reverse them vmax2 = v[0]; } for (int i = 2 /*not 0*/; i < n; i++) { if (v[i] > vmax2) { // Bigger than the smallest one if (v[i] > vmax1) { // Bigger than both (shuffle) vmax2 = vmax1; vmax1 = v[i]; } else { // In the middle (fix 2nd only) vmax2 = v[i]; } } } vout[0] = vmax1; // Biggest vout[1] = vmax2; // 2nd biggest }
Note that the above top-2 algorithm is still impractical for our word decoding algorithm. We need to know not only the top probabilities, but also which two indices in the vector had those probabilities, because that's how we know which words map to which probabilities. So, we'd need to modify the above code to track and return the two array indices as well (or instead).
Example: Shuffle top-k algorithm: For a larger value of k the code becomes more complicated. The above code for k=2 motivates the general idea for a brute-force algorithm: shuffle sort the first k elements, and then scan the rest, shuffling any larger items up into place. We can merge the two shuffling phases into one block of code that handles both the startup and ongoing scan.
void aussie_vector_top_k_shuffle( float v[], int n, int k, float vout[]) { yassert(n >= k); vout[0] = v[0]; int nout = 1; for (int i = 1 /*not 0*/; i < n; i++) { float fnew = v[i]; int maxj; if (nout < k) { vout[nout++] = fnew; maxj = nout - 2; } else { maxj = nout - 1; } maxj = nout - 1; for (int j = maxj; j >= 0; j--) { if (fnew > vout[j]) { // Shuffle & insert if (j + 1 < k) // Shuffle down vout[j + 1] = vout[j]; vout[j] = fnew; // Keep going } else { // Done.. insert it if (j != maxj) { if (j + 1 < k) vout[j + 1] = vout[j]; vout[j] = fnew; } break; } } // end for j } // end for i }
The above example is a simplistic and inefficient top-k algorithm, not to mention that it was horribly fiddly
and failed my unit tests for hours (i.e. a special kind of “fun”).
Several loop optimizations suggest themselves: loop sectioning for the outer i
loop to do the first k
iterations as a separate loop
(avoiding lots of tests against k
), and loop peeling of the first iteration of the inner j
loop (i.e. j==maxj
).
This version also should be extended to track the indices from where the top-k values came.
Theoretical Top-K Algorithms: There's a lot of theory about computing the top-k function of an array for large k values, which is the same structure as our “logit” word probability vectors (we want “k=50” or more). These theoretical top-k algorithm papers mainly consider sequential processing, rather than vectorization. Even so, it's not a simple linear scan like max or min functions, but doesn't need to be as slow as shuffling.
Example: Top-k with qsort sorting:
The simplest method for large k is to sort the array with a fast method (e.g. the quicksort algorithm)
and then pick off the top k elements from the sorted array.
In C++ there are the std::sort
methods or the older style qsort
function.
Here's an example using the C++ standard qsort
function:
int aussie_top_k_qsort_cmp( void const* addr1, void const* addr2) { float f1 = *(float*)addr1; float f2 = *(float*)addr2; if (f1 < f2) return +1; // Reversed (descending) else if (f1 > f2) return -1; else return 0; } void aussie_vector_top_k_qsort( float v[], int n, int k, float vout[]) { // Top-k with general k (qsort algorithm) // Sort the array qsort(v, n, sizeof(vout[0]), aussie_top_k_qsort_cmp); // Copy top-k elements for (int i = 0; i < k; i++) vout[i] = v[i]; }Top-k with qsort and permutation array: We really need a version that returns the indices of the probabilities, rather than just their values. So, I coded up a
qsort
version that sorts via a permutation array,
and then returns the top-k of these permutation indices.
void aussie_permutation_identity(int permut[], int n) { for (int i = 0; i < n; i++) permut[i] = i; } float* g_float_array_for_qsort = NULL; int aussie_top_k_qsort_permutation_cmp( void const* addr1, void const* addr2) { int index1 = *(int*)addr1; int index2 = *(int*)addr2; float f1 = g_float_array_for_qsort[index1]; float f2 = g_float_array_for_qsort[index2]; if (f1 < f2) return +1; // Reverse (descending) else if (f1 > f2) return -1; else return 0; } void aussie_vector_top_k_qsort_permut( float v[], int n, int k, float vout[], int permut_out[] ) { // Create a dynamic permutation array int* permut_arr = ::new int[n]; // Identity permutation aussie_permutation_identity(permut_arr, n); // Sort the array (by permutation) g_float_array_for_qsort = v; qsort(permut_arr, n, sizeof(permut_arr[0]), aussie_top_k_qsort_permutation_cmp); // Copy top-k elements for (int i = 0; i < k; i++) { permut_out[i] = permut_arr[i]; vout[i] = v[permut_arr[i]]; } delete[] permut_arr; }
Top-k without sorting: Sorting the whole array is somewhat wasteful if we only want the top 50 elements. There are various faster top-k algorithms that don't fully sort the array.
Standard C++ top-k libraries:
Note that the standard C++ libraries have support for sorting algorithms
in std::vector
, such as with std::sort
.
There is also a top-k version called std::partial_sort
,
which sorts the top k elements of an array, which can then be selected
for the top-k algorithm.
Presumably, the std::partial_sort
function is a faster algorithm than std::sort
,
by not fully sorting the whole array,
but I haven't tested it.
There is also std::nth_element
, which is similar to top-k.
• Next: • Up: Table of Contents |
The new AI programming book by Aussie AI co-founders:
Get your copy from Amazon: Generative AI in C++ |