Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Everything's Slower in AI
All of the modern open source AI models are named after animals. There are falcons and llamas and alpacas. Really, there should only be one species: the slug. Every AI engine is a squirming beast, moving along at the slowest pace, slurring out its words a few hundred milliseconds at a time.
How do we make it faster? Well, that's kind of the $64,000 question of the moment, except we might need to apply a scaling factor to that number. The answer is that nobody really has that answer yet, but everyone's trying a lot of things, up and down the tech stack from silicon to software, both in the labs of the AI industry and in the research departments of universities.
An AI engine is a special type of application, and in many ways it is not a “typical” C++ program. Although all of the usual C++ inefficiencies can infest an AI engine with slugs, there are also several major non-typical bottlenecks that are specific to the AI architecture:
- The multiplication operation, even when it is done on the GPU (usually as a parallelized vector dot product, matrix multiplication, or tensor processing kernel); a minimal sketch of this inner loop appears after this list.
- Memory access cost of reading all the model weight data and marshaling it ready for the GPU.
- Floating-point mathematical functions you'd happily forgotten existed (e.g. logf, expf, sqrtf, etc.).
- High-level algorithmic inefficiencies in AI engines, such as the “autoregressive” decoding algorithm (see the decoding-loop sketch after this list).
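To make the first bottleneck concrete, here is a minimal sequential sketch (an illustrative assumption, not the engine's actual kernel) of the vector dot product that sits at the heart of every matrix multiplication; a real engine would run a vectorized or GPU version, but the arithmetic is the same multiply-accumulate repeated billions of times per token.

#include <cstddef>

// Illustrative sketch: the multiply-accumulate loop that dominates inference cost.
// Every matrix multiplication decomposes into many dot products like this one.
float vector_dot_product(const float* v1, const float* v2, std::size_t n)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += v1[i] * v2[i];  // one multiplication per model weight touched
    }
    return sum;
}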
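The last two bottlenecks combine in the main generation loop. Here is a hedged sketch of greedy autoregressive decoding: each new token requires a full forward pass over all of the model's weights, followed by a softmax that calls expf once per vocabulary entry. The model_forward function is a hypothetical placeholder for the engine's full inference pass, not a real API, and greedy argmax selection is only one of several possible decoding strategies.

#include <math.h>
#include <cstddef>
#include <vector>

// Hypothetical placeholder: run the whole model over the token sequence and
// return one logit per vocabulary entry (this is where the real cost lives).
std::vector<float> model_forward(const std::vector<int>& tokens);

// Softmax over the logits: note the expf call for every vocabulary entry.
void softmax_inplace(std::vector<float>& logits)
{
    float maxval = logits[0];
    for (float x : logits) if (x > maxval) maxval = x;
    float total = 0.0f;
    for (float& x : logits) {
        x = expf(x - maxval);  // one of those "forgotten" math functions
        total += x;
    }
    for (float& x : logits) x /= total;
}

// Greedy autoregressive decoding: one full model pass per generated token.
std::vector<int> greedy_decode(std::vector<int> tokens, int max_new_tokens)
{
    for (int t = 0; t < max_new_tokens; ++t) {
        std::vector<float> logits = model_forward(tokens);  // re-reads all the weights
        softmax_inplace(logits);
        std::size_t best = 0;
        for (std::size_t i = 1; i < logits.size(); ++i)
            if (logits[i] > logits[best]) best = i;
        tokens.push_back(static_cast<int>(best));  // only one token gained per pass
    }
    return tokens;
}

The shape of that loop is the whole problem: nothing in one iteration can start until the previous token has been chosen, which is exactly why the output trickles out a few hundred milliseconds at a time.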