Aussie AI
Star Attention
Last Updated 18 April, 2026
by David Spuler, Ph.D.
What is Star Attention?
Star attention is an LLM attention optimization that reduces the cost of attention computations on long token sequences. It is a type of "linear attention" that uses a "block sparsity" approximation to avoid the quadratic complexity of full LLM attention. This helps the LLM know which blocks of tokens to pay attention to without computing attention scores between every pair of tokens.
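The block-sparse idea can be illustrated with a minimal NumPy sketch. This is a simplified single-machine toy, not the paper's implementation: the real algorithm distributes context blocks across hosts, applies causal masking, and merges the global query phase with a distributed online softmax. Here, each context block attends only to itself plus a shared "anchor" block (the first block), and the query tokens then attend over all cached keys/values; all function and variable names below are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def star_attention(q_ctx, k_ctx, v_ctx, q_query, block=4):
    """Toy two-phase block-sparse attention sketch.

    Phase 1: each context block attends only to the anchor block
    plus itself, so each block computes at most block x (2*block)
    scores instead of block x n (avoiding the full n x n matrix).
    Phase 2: the query tokens attend globally over all cached K/V.
    """
    n, d = k_ctx.shape
    scale = 1.0 / np.sqrt(d)
    ctx_out = np.zeros_like(v_ctx)
    anchor_k, anchor_v = k_ctx[:block], v_ctx[:block]
    for start in range(0, n, block):
        end = min(start + block, n)
        if start == 0:
            # The anchor block attends only to itself.
            k_loc, v_loc = k_ctx[:end], v_ctx[:end]
        else:
            # Other blocks attend to the anchor block plus themselves.
            k_loc = np.concatenate([anchor_k, k_ctx[start:end]])
            v_loc = np.concatenate([anchor_v, v_ctx[start:end]])
        scores = q_ctx[start:end] @ k_loc.T * scale
        ctx_out[start:end] = softmax(scores) @ v_loc
    # Phase 2: query tokens see the entire cached context.
    query_out = softmax(q_query @ k_ctx.T * scale) @ v_ctx
    return ctx_out, query_out
```

For a context of n tokens split into b blocks, phase 1 does O(n * block) score computations rather than O(n^2), which is where the near-linear scaling comes from; the anchor block is what keeps accuracy close to full attention in the paper's reported results.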
Related LLM research areas for long context optimization of the attention methods include:
- Attention optimization
- Local attention
- Linear attention
- Sparse attention
- Multi-Head Attention (MHA)
- Multi-Query Attention (MQA)
- Group-Query Attention (GQA)
- Flash attention
- Paged attention
Other topics in attention research:
- Low-rank matrix attention
- Medusa attention
- Block attention
- Cross attention
- Fused head attention
- Hybrid local-global attention
- FFT attention
- QKV computation optimizations
- Additive attention
- Multiplicative attention
- Graph attention
- Chunked attention
- Attention sink
- Attention steering
- Bilinear attention
- Attention-free methods
- Mixture-of-Heads (MOH) Attention (MoE+MHA)
Star Attention: Book Excerpts and Blog Articles
Free online book excerpts with full text chapters online and free PDF downloads, and the Aussie AI blog, including related articles:
- David Spuler, Ph.D., March 3rd, 2025, What's Hot in LLM Inference Optimization in 2025? Aussie AI Blog, https://www.aussieai.com/blog/hot-inference-optimization-2025
Research on Star Attention
- Shantanu Acharya, Fei Jia, Boris Ginsburg, 26 Nov 2024, Star Attention: Efficient LLM Inference over Long Sequences, https://arxiv.org/abs/2411.17116
- Aswin Ak, November 28, 2024, NVIDIA AI Research Unveils ‘Star Attention’: A Novel AI Algorithm for Efficient LLM Long-Context Inference, https://www.marktechpost.com/2024/11/28/nvidia-ai-research-unveils-star-attention-a-novel-ai-algorithm-for-efficient-llm-long-context-inference/
- Wenyuan Yang, Zhongxu Li, Qihan He, 2025, Star-PMFI: Star-attention and pyramid multi-scale feature integration network for small object detection in drone imagery, Journal of Visual Communication and Image Representation, 104479, ISSN 1047-3203, https://doi.org/10.1016/j.jvcir.2025.104479 https://www.sciencedirect.com/science/article/abs/pii/S1047320325000938
AI Books from Aussie AI
- The Sweetest Lesson: Your Brain Versus AI: new book on AI intelligence theory. Get your copy from Amazon: The Sweetest Lesson
- RAG Optimization: Accurate and Efficient LLM Applications: new book on RAG architectures. Get your copy from Amazon: RAG Optimization
- Generative AI Applications book. Get your copy from Amazon: Generative AI Applications
- Generative AI programming book. Get your copy from Amazon: Generative AI in C++
- CUDA C++ Optimization book. Get your copy from Amazon: CUDA C++ Optimization
- CUDA C++ Debugging book. Get your copy from Amazon: CUDA C++ Debugging
More AI Research
Read more about: