Aussie AI
Vectorized Softmax with AVX
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
The AVX intrinsics use x86 SIMD instructions to operate on multiple float values or integers at once (e.g. 4 float values for AVX-1, 8 float values for AVX-2, 16 float values for AVX-512).
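As a quick illustration of this load-compute-store pattern, here is a minimal sketch (not from the book's code) of adding two vectors eight floats at a time with AVX-2; the function name is hypothetical:

#include <immintrin.h>  // AVX/AVX2 intrinsics

void add_vectors_AVX2(float vout[], const float v1[], const float v2[], int n)
{
    // Assumes n is a multiple of 8 (one 256-bit AVX2 register holds 8 floats)
    for (int i = 0; i < n; i += 8) {
        __m256 a = _mm256_loadu_ps(&v1[i]);   // load 8 floats (unaligned)
        __m256 b = _mm256_loadu_ps(&v2[i]);   // load 8 floats
        __m256 sum = _mm256_add_ps(a, b);     // 8 additions in one instruction
        _mm256_storeu_ps(&vout[i], sum);      // store 8 results
    }
}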
Surprisingly, there are AVX SIMD exponential function intrinsics that apply "expf" to multiple elements of a vector in parallel.
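Note that these exponential intrinsics (e.g. _mm256_exp_ps for AVX-2) come from Intel's SVML (Short Vector Math Library), so they are typically available with the Intel compilers and recent Microsoft Visual C++, but not standard in GCC or Clang. A minimal sketch of exponentiating a whole vector this way, under those assumptions:

#include <immintrin.h>

void vector_expf_AVX2_sketch(float v[], int n)  // hypothetical helper; assumes n % 8 == 0
{
    for (int i = 0; i < n; i += 8) {
        __m256 x = _mm256_loadu_ps(&v[i]);  // load 8 floats
        __m256 e = _mm256_exp_ps(x);        // SVML: 8 expf computations at once
        _mm256_storeu_ps(&v[i], e);         // store 8 results
    }
}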
Example: Softmax with AVX exponential and summation.
We can vectorize both of these loops (the exponentiation and the summation) separately using AVX intrinsics. Vectorized versions of expf and summation were examined in the hardware acceleration chapter. The version for AVX1 becomes:
void aussie_vector_softmax_exponentiate_and_sum_AVX1(float v[], int n)
{
    yassert(n % 4 == 0);
    aussie_vector_expf_AVX1(v, n);   // AVX1-accelerated expf...
    float denom = aussie_vector_sum_AVX1(v, n);   // AVX1-accelerated sum
    if (denom == 0.0) {
        yassert(denom != 0.0);
        return;  // fail (should not occur)
    }
    float recip = 1.0f / denom;
    for (int i = 0; i < n; i++) {
        v[i] *= recip;   // NOTE: v[i] is already expf'd
    }
}
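The two helper functions are from the hardware acceleration chapter and are not reproduced in this excerpt. As a rough idea of the summation half, here is a minimal sketch of an AVX1-style sum with a horizontal reduction at the end (an assumption, not the book's exact code):

#include <immintrin.h>

float vector_sum_AVX1_sketch(const float v[], int n)  // hypothetical; assumes n % 4 == 0
{
    __m128 sums = _mm_setzero_ps();  // 4 running partial sums
    for (int i = 0; i < n; i += 4) {
        sums = _mm_add_ps(sums, _mm_loadu_ps(&v[i]));  // 4 additions at once
    }
    // Horizontal reduction of the 4 partial sums (SSE3 hadd)
    sums = _mm_hadd_ps(sums, sums);
    sums = _mm_hadd_ps(sums, sums);
    return _mm_cvtss_f32(sums);  // extract the final scalar sum
}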
Actually, the softmax version above has only vectorized two out of three loops. Here's the code with the third loop, the multiply-by-scalar, also done with AVX, as shown in the vectorization chapter. This code is the AVX2 version:
void aussie_vector_softmax_fused_exp_sum_mult_AVX2(float v[], int n)
{
    // Softmax with EXP and SUM and MULT in AVX2
    yassert(n % 8 == 0);
    float denom = aussie_vector_fused_expf_sum_AVX2(v, n);   // Element-wise expf...
    if (denom == 0.0) {
        yassert(denom != 0.0);
        return;  // fail (should not occur)
    }
    float recip = 1.0f / denom;
    aussie_vector_multiply_scalar_AVX2(v, n, recip);
}
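The final normalization step broadcasts the reciprocal into all lanes and multiplies eight elements per iteration. A minimal sketch of what such a scalar-multiply helper might look like (again an assumption, not the book's exact code):

#include <immintrin.h>

void vector_multiply_scalar_AVX2_sketch(float v[], int n, float c)  // hypothetical; assumes n % 8 == 0
{
    __m256 cvec = _mm256_set1_ps(c);  // broadcast the scalar into all 8 lanes
    for (int i = 0; i < n; i += 8) {
        __m256 x = _mm256_loadu_ps(&v[i]);                 // load 8 floats
        _mm256_storeu_ps(&v[i], _mm256_mul_ps(x, cvec));   // 8 multiplications at once
    }
}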