Aussie AI

Inference Optimization Techniques

  • Last Updated 24th August, 2024
  • by David Spuler, Ph.D.

This is a list of neural network and Transformer optimizations, with a specific focus on speeding up Transformer inference. Resources are organized into the research areas below:

Hot New Research Areas

Areas of inference efficiency research that have recently been getting attention:

Hot Old Research Areas

Longstanding research areas that are still seeing a continual stream of papers:

Model Compression

Pruning
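The simplest form of pruning is unstructured magnitude pruning: zero out the weights with the smallest absolute values, on the theory that they contribute least to the output. As a minimal sketch of the idea (function and parameter names here are illustrative, not from this site):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)        # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0     # zero the small weights
    return pruned
```

The resulting sparse matrix only pays off at inference time if the kernel or storage format can actually skip the zeros, which is why structured pruning (removing whole rows, heads, or layers) is often preferred in practice.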

Quantization
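Quantization replaces FP32 weights with low-bit integers plus a scale factor, shrinking the model and speeding up arithmetic. A minimal sketch of symmetric per-tensor INT8 quantization (names are illustrative, not from this site):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of FP32 weights to INT8."""
    scale = np.abs(weights).max() / 127.0   # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 weights from the INT8 codes."""
    return q.astype(np.float32) * scale
```

Real schemes vary along several axes: symmetric vs. asymmetric (zero-point) quantization, per-tensor vs. per-channel scales, and post-training quantization vs. quantization-aware training.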

Distillation

Parameter Sharing

Attention Optimization

Transformer Component Optimizations

Transformer General Optimizations

KV Caching Optimizations
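The KV cache avoids recomputing attention keys and values for already-generated tokens: each decoding step appends only the new token's K/V and attends over the growing cache. A single-head sketch of one decoding step (list-based cache for clarity; the names are illustrative, not from this site):

```python
import numpy as np

def attention_step(q, K_cache, V_cache, k_new, v_new):
    """One autoregressive decoding step with a KV cache: append the new
    token's key/value, then attend the new query over all cached entries."""
    K_cache.append(k_new)            # cache grows by one entry per token
    V_cache.append(v_new)
    K = np.stack(K_cache)            # (seq_len, d)
    V = np.stack(V_cache)            # (seq_len, d)
    scores = K @ q / np.sqrt(len(q))          # (seq_len,) scaled dot products
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over cached positions
    return weights @ V                        # (d,) attention output
```

The trade-off is memory: the cache grows linearly with sequence length (per layer, per head), which is why cache compression, eviction, and quantization are active research areas.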

Non-Multiplication AI Models

Prefill Phase Optimizations

Computation Optimizations

General Coding Efficiency

Loop Optimizations
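A classic example in this category is loop fusion: merging two passes over the same data into one, halving memory traffic. A minimal sketch contrasting the two forms (illustrative names, not from this site; in practice this matters most in compiled kernels, not interpreted Python):

```python
def scale_then_bias(x, scale, bias):
    """Two separate loops: each traverses the data once (two passes total)."""
    tmp = [v * scale for v in x]      # loop 1: scaling pass
    return [v + bias for v in tmp]    # loop 2: bias pass

def scale_and_bias_fused(x, scale, bias):
    """Fused loop: a single traversal applies both operations."""
    return [v * scale + bias for v in x]
```

Other techniques in this family include loop unrolling, loop interchange, loop tiling for cache locality, and hoisting loop-invariant code.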

Memory Utilization Optimizations

Numeric Representation Optimizations

Advanced Number Systems

Faster Arithmetic

Low-Rank Matrices
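Low-rank factorization approximates a large weight matrix W (m x n) as the product of two thin matrices A (m x r) and B (r x n), cutting both storage and matrix-multiply cost when r is much smaller than m and n. A minimal sketch using truncated SVD (names are illustrative, not from this site):

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Approximate W (m x n) as A @ B with A (m x rank) and B (rank x n),
    via truncated SVD: keep only the top singular values/vectors."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # fold singular values into A's columns
    B = Vt[:rank, :]
    return A, B
```

The same idea underlies LoRA-style fine-tuning, where a frozen weight matrix is augmented with a trainable low-rank delta A @ B.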

Advanced Matrices

Data Structures

Multi-AI Architectures

Device Architectures

Parallel Programming Optimization Techniques

General Classes of Optimization Techniques

More AI Research

Read more about: