Faster AI Research

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

Much of the focus on speeding up AI engines is about weaning the lazy bloated monsters off their happy juice from high-end multi-GPU platforms. People want AI applications on their phones and laptops, so there's a lot of activity around “AI Phones” and “AI PCs.” And even on the high-end platforms, GPUs are so expensive that an optimization of just a few percent can save millions.

Another reason for needing faster AI: multi-model engines. These are called “ensemble architectures” in the research literature, and there are already many papers in this area. Two models can be much smarter than one, so we want faster execution in order to run more engines at once.

What's hot in fast AI? Some of the newer areas of AI efficiency research include:

  • Phone and PC On-Device Inference (i.e. “AI PCs” and “AI Phones”)
  • Mixture-of-Experts multi-model ensemble architectures (reportedly used in the GPT-4 and Gemini architectures; see the sketch after this list)
  • Flash Attention (memory-efficient exact attention via tiling and kernel fusion)
  • Long context windows and length generalization (e.g. RoPE positional encoding)
  • Ensemble Multi-Model Architectures (various sub-areas)
  • Dynamic NAS (dynamic Neural Architecture Search)
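
To make the mixture-of-experts idea concrete, here is a minimal C++ sketch of top-1 routing: a gating function scores each expert against the input, and only the winning expert runs, which is where the compute saving comes from. The toy experts and gate weights here are hypothetical stand-ins; a real MoE layer uses learned gating weights, top-k routing, and load balancing.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    using Vec = std::vector<float>;
    using Expert = Vec (*)(const Vec&);   // each expert is its own small network

    // Two toy "experts" standing in for feed-forward sub-networks.
    Vec expert0(const Vec& x) { Vec y(x); for (float& v : y) v *= 2.0f; return y; }
    Vec expert1(const Vec& x) { Vec y(x); for (float& v : y) v += 1.0f; return y; }

    // Score each expert with a linear gate, then run only the argmax winner.
    Vec moe_forward(const Vec& x, const std::vector<Expert>& experts,
                    const std::vector<Vec>& gate) {
        std::size_t best = 0;
        float best_score = -1e30f;
        for (std::size_t e = 0; e < experts.size(); ++e) {
            float score = 0.0f;
            for (std::size_t i = 0; i < x.size(); ++i)
                score += x[i] * gate[e][i];   // dot product with gate vector
            if (score > best_score) { best_score = score; best = e; }
        }
        return experts[best](x);   // the other experts never execute
    }

    int main() {
        std::vector<Expert> experts{expert0, expert1};
        std::vector<Vec> gate{{1.0f, 0.0f}, {0.0f, 1.0f}};
        Vec x{0.2f, 0.9f};                    // gate 1 scores higher here
        Vec y = moe_forward(x, experts, gate);
        std::cout << y[0] << " " << y[1] << "\n";   // prints 1.2 1.9
    }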

Still bubbling away on the cooker. Some areas of AI optimization have thousands of papers, and yet I still see new ones in my feeds every week:

  • Hardware optimizations (GPUs, CPUs, faster memory, AI-specific chips, etc.)
  • Quantization (always; see the sketch after this list)
  • Pruning (always, especially dynamic structural pruning)
  • Distillation (always)
  • Early exit (still hot)
  • Dynamic inference (adaptive inference)
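
As a reminder of what the quantization bullet means in practice, here is a minimal C++ sketch of textbook symmetric INT8 post-training quantization: derive one scale from the largest absolute weight, round each weight to an 8-bit integer, and dequantize by multiplying back by the scale. This is the generic scheme only, not any particular engine's implementation.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Symmetric per-tensor quantization: real_value ~= data[i] * scale.
    struct QuantizedVec {
        std::vector<int8_t> data;
        float scale;
    };

    QuantizedVec quantize_int8(const std::vector<float>& w) {
        float max_abs = 0.0f;
        for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
        QuantizedVec q;
        q.scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
        q.data.reserve(w.size());
        for (float v : w) {
            int r = static_cast<int>(std::lround(v / q.scale));
            q.data.push_back(static_cast<int8_t>(std::clamp(r, -127, 127)));
        }
        return q;
    }

    float dequantize(const QuantizedVec& q, std::size_t i) {
        return q.data[i] * q.scale;
    }

    int main() {
        std::vector<float> w{0.5f, -1.27f, 0.01f};
        QuantizedVec q = quantize_int8(w);
        for (std::size_t i = 0; i < w.size(); ++i)
            std::cout << w[i] << " -> " << dequantize(q, i) << "\n";
    }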

Trail gone cold. These are the areas where there has been so much successful research in prior decades that the number of papers has subsided, presumably because it's become harder to find a novelty. There are still ways to make a contribution by combining the older techniques with newer research, and some of these areas are so important they're probably one idea away from breakthrough status.

  • MatMul optimizations (see the sketch after this list)
  • Arithmetic optimizations (e.g. faster addition/multiplication algorithms)
  • Approximate arithmetic optimizations
  • Logarithmic number system models
  • Code transformations and loop optimizations
  • Compiler auto-optimizations
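
As an example of what the MatMul and loop-optimization bullets refer to, here is the classic loop-reordering trick in C++: switching the triple nested loop from i-j-k to i-k-j order so that the inner loop streams sequentially through rows of both B and C, which is usually far friendlier to the cache. This is only a sketch; production kernels add tiling, vectorization, and parallelism on top.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Row-major flat storage: A is n x m, B is m x p, C is n x p.
    // The naive i-j-k order strides down B's columns; the i-k-j order
    // below touches B and C sequentially in the inner loop.
    void matmul_ikj(const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& C, std::size_t n, std::size_t m,
                    std::size_t p) {
        std::fill(C.begin(), C.end(), 0.0f);
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t k = 0; k < m; ++k) {
                float a = A[i * m + k];   // loaded once, reused across j
                for (std::size_t j = 0; j < p; ++j)
                    C[i * p + j] += a * B[k * p + j];   // sequential accesses
            }
        }
    }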

Deep AI research areas. There are several interesting areas of AI research that nevertheless have only a few papers each year, simply because they are technically demanding, so fewer researchers attempt them. However, most of these areas offer the promise of performant AI if the problems can be cracked.

  • Advanced number system models (e.g. dyadic, posit, residue)
  • Zero-multiplication models (e.g. bit-shift, logarithmic, and adder models; see the sketch after this list)
  • Floating-point numeric representations (i.e. going beyond bfloat16 “brain float”)
  • Matrix algebra (e.g. factorization/decomposition, inverse matrices)
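
For the zero-multiplication bullet, here is a minimal C++ sketch of power-of-two (bit-shift) weights: each weight is approximated as sign times 2^shift, so the multiply in a dot product becomes an arithmetic shift. The rounding to powers of two is lossy, and the struct and function names are my own; the point is only to show the multiplications disappearing.

    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    // A weight approximated as sign * 2^shift.
    struct Pow2Weight {
        int8_t sign;    // +1 or -1
        int8_t shift;   // exponent; clamped to >= 0 for this sketch
    };

    Pow2Weight to_pow2(float w) {
        Pow2Weight p;
        p.sign = (w < 0.0f) ? -1 : 1;
        float mag = std::fabs(w);
        long s = (mag > 0.0f) ? std::lround(std::log2(mag)) : 0;
        if (s < 0) s = 0;   // fractional weights would need a right shift
        p.shift = static_cast<int8_t>(s);
        return p;
    }

    // Dot product with integer activations: each multiply becomes a shift.
    int32_t dot_pow2(const std::vector<int32_t>& x,
                     const std::vector<Pow2Weight>& w) {
        int32_t sum = 0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            int32_t term = x[i] << w[i].shift;   // was: x[i] * weight
            sum += (w[i].sign < 0) ? -term : term;
        }
        return sum;
    }

    int main() {
        std::vector<int32_t> x{3, 5, 7};
        std::vector<Pow2Weight> w{to_pow2(4.0f), to_pow2(-2.0f), to_pow2(1.0f)};
        std::cout << dot_pow2(x, w) << "\n";   // 3*4 - 5*2 + 7*1 = 9
    }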

Further details about research on almost all of these topics are in subsequent chapters.

 
