Aussie AI
Star Attention
-
Last Updated 7 December, 2024
-
by David Spuler, Ph.D.
Star attention is an LLM attention optimization that reduces the cost of attention computations over long token sequences. It uses a "block sparsity" approximation to avoid the quadratic complexity of full LLM attention: the long context is encoded block by block, with each block attending only to itself and a shared "anchor" block, so the context-encoding cost grows roughly linearly with context length (similar in spirit to "linear attention"). Query tokens then attend to the cached blocks and the per-block partial results are merged efficiently, so attention scores never need to be computed between every pair of context tokens.
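To make the block-sparsity idea concrete, below is a minimal single-head NumPy sketch of the two-phase scheme described in the Star Attention paper: phase 1 encodes the context blockwise, with each block attending only to itself plus a shared "anchor" block, and phase 2 lets a query token attend to all cached blocks, merging the per-block partial results with online-softmax (log-sum-exp) statistics. The block size, dimensions, helper names, and the single-head simplification are illustrative assumptions, not NVIDIA's reference implementation.

```python
# Minimal single-head sketch of Star-Attention-style two-phase block-sparse
# attention (illustrative only; block size and shapes are assumptions).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def phase1_context_encoding(q, k, v, block_size):
    """Phase 1: each context block attends only to itself and the first
    ("anchor") block, so cost grows linearly with context length."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(v)
    anchor_k, anchor_v = k[:block_size], v[:block_size]
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        qb = q[start:end]
        if start == 0:
            kb, vb = anchor_k, anchor_v  # the anchor block attends to itself
        else:
            kb = np.concatenate([anchor_k, k[start:end]])  # anchor + local block
            vb = np.concatenate([anchor_v, v[start:end]])
        scores = (qb @ kb.T) * scale
        out[start:end] = softmax(scores) @ vb
    return out

def phase2_query_attention(q, k_blocks, v_blocks):
    """Phase 2: a query token attends to every cached KV block; per-block
    partial results are merged with running log-sum-exp (online softmax)
    statistics rather than materializing one giant score matrix."""
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)
    merged = np.zeros(d)
    m, s = -np.inf, 0.0  # running max and running softmax denominator
    for kb, vb in zip(k_blocks, v_blocks):
        scores = (q @ kb.T) * scale
        m_new = max(m, scores.max())
        correction = np.exp(m - m_new)
        e = np.exp(scores - m_new)
        s = s * correction + e.sum()
        merged = merged * correction + e @ vb
        m = m_new
    return merged / s

# Tiny usage example with random tensors.
rng = np.random.default_rng(0)
n, d, block = 16, 8, 4
k = rng.standard_normal((n, d))
v = rng.standard_normal((n, d))
ctx = phase1_context_encoding(rng.standard_normal((n, d)), k, v, block)
query = rng.standard_normal(d)
k_blocks = [k[i:i + block] for i in range(0, n, block)]
v_blocks = [v[i:i + block] for i in range(0, n, block)]
answer = phase2_query_attention(query, k_blocks, v_blocks)
print(ctx.shape, answer.shape)
```

In a multi-host deployment, each host would hold only its own context blocks, and the phase-2 log-sum-exp merge is what lets the partial attention outputs be combined into the exact global softmax result.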
Related LLM research areas for long context optimization of the attention methods include:
- Attention optimization
- Local attention
- Linear attention
- Sparse attention
- Multi-Head Attention (MHA)
- Multi-Query Attention (MQA)
- Group-Query Attention (GQA)
- Flash attention
- Paged attention
Research on Star Attention
- Shantanu Acharya, Fei Jia, Boris Ginsburg, 26 Nov 2024, Star Attention: Efficient LLM Inference over Long Sequences, https://arxiv.org/abs/2411.17116
- Aswin Ak, November 28, 2024, NVIDIA AI Research Unveils ‘Star Attention’: A Novel AI Algorithm for Efficient LLM Long-Context Inference, https://www.marktechpost.com/2024/11/28/nvidia-ai-research-unveils-star-attention-a-novel-ai-algorithm-for-efficient-llm-long-context-inference/