Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Everything's Slower in AI
All of the modern open source AI models are named after animals. There are falcons and llamas and alpacas. Really, there should only be one species: the slug. Every AI engine is a squirming beast, moving along at the slowest pace, slurring out its words a few hundred milliseconds at a time.
How do we make it faster? Well, that's kind of the $64,000 question of the moment, except we might need to apply a scaling factor to that number. The answer is that nobody really has that answer yet, but everyone's trying a lot of things, up and down the tech stack from silicon to software, both in the labs of the AI industry and in the research departments of universities.
An AI engine is a special type of application, and in many ways it is not a “typical” C++ program. Although all of the usual C++ inefficiencies can infest an AI engine with slugs, there are also several major non-typical bottlenecks that are specific to the AI architecture:
- The multiplication operation, even when it is done on the GPU (usually as a parallelized vector dot product, matrix multiplication, or tensor processing kernel); a minimal sketch of this inner loop appears after this list.
- Memory access cost of reading all the model weight data and marshaling it ready for the GPU.
- Floating-point mathematical functions you'd happily forgotten existed (e.g. logf, expf, sqrtf, etc.).
- High-level algorithmic inefficiencies in AI engines, such as the “autoregressive” decoding algorithm (see the decoding-loop sketch after this list).
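To make the first bottleneck concrete, here is a minimal sequential sketch (an illustrative assumption, not the engine's actual kernel) of the vector dot product that sits at the heart of every matrix multiplication; a real engine would run a vectorized or GPU version, but the arithmetic is the same multiply-accumulate repeated billions of times per token.

#include <cstddef>

// Illustrative sketch: the multiply-accumulate loop that dominates inference cost.
// Every matrix multiplication decomposes into many dot products like this one.
float vector_dot_product(const float* v1, const float* v2, std::size_t n)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += v1[i] * v2[i];  // one multiplication per model weight touched
    }
    return sum;
}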
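The last two bottlenecks combine in the main generation loop. Here is a hedged sketch of greedy autoregressive decoding: each new token requires a full forward pass over all of the model's weights, followed by a softmax that calls expf once per vocabulary entry. The model_forward function is a hypothetical placeholder for the engine's full inference pass, not a real API, and greedy argmax selection is only one of several possible decoding strategies.

#include <math.h>
#include <cstddef>
#include <vector>

// Hypothetical placeholder: run the whole model over the token sequence and
// return one logit per vocabulary entry (this is where the real cost lives).
std::vector<float> model_forward(const std::vector<int>& tokens);

// Softmax over the logits: note the expf call for every vocabulary entry.
void softmax_inplace(std::vector<float>& logits)
{
    float maxval = logits[0];
    for (float x : logits) if (x > maxval) maxval = x;
    float total = 0.0f;
    for (float& x : logits) {
        x = expf(x - maxval);  // one of those "forgotten" math functions
        total += x;
    }
    for (float& x : logits) x /= total;
}

// Greedy autoregressive decoding: one full model pass per generated token.
std::vector<int> greedy_decode(std::vector<int> tokens, int max_new_tokens)
{
    for (int t = 0; t < max_new_tokens; ++t) {
        std::vector<float> logits = model_forward(tokens);  // re-reads all the weights
        softmax_inplace(logits);
        std::size_t best = 0;
        for (std::size_t i = 1; i < logits.size(); ++i)
            if (logits[i] > logits[best]) best = i;
        tokens.push_back(static_cast<int>(best));  // only one token gained per pass
    }
    return tokens;
}

The shape of that loop is the whole problem: nothing in one iteration can start until the previous token has been chosen, which is exactly why the output trickles out a few hundred milliseconds at a time.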