Aussie AI
Lightning Attention
-
Last Updated 16 January, 2025
-
by David Spuler, Ph.D.
Lightning attention is an LLM efficiency optimization that provides faster attention kernels. It is a type of linear attention, which scales linearly with sequence length rather than quadratically, allowing it to be used in long-context and even ultra-long-context models exceeding 1M tokens.
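To illustrate the idea, here is a minimal sketch (not the authors' kernel) of the tiled linear-attention computation that lightning attention is based on: the sequence is split into blocks, attention within each block is computed conventionally with a causal mask, and attention to all earlier blocks is handled with a small running key-value state, giving linear cost in sequence length. Softmax and scaling are omitted for simplicity; the function name and block size are illustrative assumptions.

```python
import numpy as np

def lightning_style_attention(Q, K, V, block=64):
    """Illustrative tiled causal linear attention (unnormalized, no softmax).

    Cost is O(n * d^2) in sequence length n, versus O(n^2 * d) for
    standard attention, because earlier tokens are summarized in a
    small d x d_v running state instead of a full score matrix.
    """
    n, d = Q.shape
    dv = V.shape[1]
    out = np.zeros((n, dv))
    KV = np.zeros((d, dv))  # inter-block key-value state: sum of k_i v_i^T
    for start in range(0, n, block):
        end = min(start + block, n)
        q, k, v = Q[start:end], K[start:end], V[start:end]
        # Intra-block: conventional causally-masked attention within the tile.
        intra = np.tril(q @ k.T) @ v
        # Inter-block: contribution of all earlier tiles via the running state.
        inter = q @ KV
        out[start:end] = intra + inter
        # Fold this tile's keys/values into the state for later tiles.
        KV += k.T @ v
    return out
```

The inter-block term is what removes the quadratic cost: no matter how long the context grows, each tile only reads a fixed-size `KV` state rather than all previous keys and values.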
Research on Lightning Attention
- MiniMax, Aonian Li, Bangwei Gong, Bo Yang, Boji Shan, Chang Liu, Cheng Zhu, Chunhao Zhang, Congchao Guo, Da Chen, Dong Li, Enwei Jiao, Gengxin Li, Guojun Zhang, Haohai Sun, Houze Dong, Jiadai Zhu, Jiaqi Zhuang, Jiayuan Song, Jin Zhu, Jingtao Han, Jingyang Li, Junbin Xie, Junhao Xu, Junjie Yan, Kaishun Zhang, Kecheng Xiao, Kexi Kang, Le Han, Leyang Wang, Lianfei Yu, Liheng Feng, Lin Zheng, Linbo Chai, Long Xing, Meizhi Ju, Mingyuan Chi, Mozhi Zhang, Peikai Huang, Pengcheng Niu, Pengfei Li, Pengyu Zhao, Qi Yang, Qidi Xu, Qiexiang Wang, Qin Wang, Qiuhui Li, Ruitao Leng, Shengmin Shi, Shuqi Yu, Sichen Li, Songquan Zhu, Tao Huang, Tianrun Liang, Weigao Sun, Weixuan Sun, Weiyu Cheng, Wenkai Li, Xiangjun Song, Xiao Su, Xiaodong Han, Xinjie Zhang, Xinzhu Hou, Xu Min, Xun Zou, Xuyang Shen, Yan Gong, Yingjie Zhu, Yipeng Zhou, Yiran Zhong, Yongyi Hu, Yuanxiang Fan, Yue Yu, Yufeng Yang, Yuhao Li, Yunan Huang, Yunji Li, Yunpeng Huang, Yunzhi Xu, Yuxin Mao, Zehan Li, Zekang Li, Zewei Tao, Zewen Ying, Zhaoyang Cong, Zhen Qin, Zhenhua Fan, Zhihang Yu, Zhuo Jiang, Zijia Wu, 14 Jan 2025, MiniMax-01: Scaling Foundation Models with Lightning Attention, https://arxiv.org/abs/2501.08313 https://github.com/MiniMax-AI (Context window over 1 million tokens.)
- Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong, 15 Jan 2024 (v2), Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models, https://arxiv.org/abs/2401.04658 https://github.com/OpenNLPLab/lightning-attention
- Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong, 19 Jan 2024 (v2), TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer, https://arxiv.org/abs/2307.14995 https://github.com/OpenNLPLab/TransnormerLLM (Lightning attention first version.)
- MiniMax, Jan 2025, MiniMax-01: Scaling Foundation Models with Lightning Attention, https://filecdn.minimax.chat/_Arxiv_MiniMax_01_Report.pdf
- MiniMax, Jan 2025, MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era, https://www.minimaxi.com/en/news/minimax-01-series-2
More Attention Research Topics
Related LLM research areas for long context optimization of the attention methods include:
- Attention optimization (main page)
- Local attention
- Linear attention
- Sparse attention
- Multi-Head Attention (MHA)
- Multi-Query Attention (MQA)
- Group-Query Attention (GQA)
- Flash attention
- Paged attention
Other topics in attention research:
- Low-rank matrix attention
- Medusa attention
- Block attention
- Cross attention
- Fused head attention
- Hybrid local-global attention
- FFT attention
- QKV computation optimizations
- Additive attention
- Multiplicative attention
- Graph attention
- Chunked attention
- Attention sink
- Attention steering
- Bilinear attention
- Attention-free methods
- Mixture-of-Heads (MOH) Attention (MoE+MHA)
- Star attention
- Ring attention
More AI Research
Read more about: