Aussie AI
Skipping Optimizations
-
Last Updated 30 August, 2024
-
by David Spuler, Ph.D.
Skipping calculations is a powerful optimization whenever it can be achieved. And neural network inference is a morass of redundant calculation, so there is plenty to be skipped. There are many types of "skipping" that can be used to improve AI inference speed, from the top to the bottom of the AI stack.
Structural component-level skipping methods include:
- Layer skipping
- Layer fusion
- Early exit (dynamic layer skipping; see the code sketch after this list)
- Layer pruning
- Cascades (pathway skipping)
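Early exit is the easiest of these to illustrate in code. Below is a minimal C++ sketch of dynamic layer skipping, where the remaining layers are skipped once an intermediate confidence test is satisfied. The Layer type and the confidence() heuristic here are hypothetical stand-ins; a real early-exit model would use trained layers and a small per-layer exit classifier.

```cpp
// Minimal sketch of early exit (dynamic layer skipping).
#include <cstdio>
#include <vector>

struct Layer {
    // Stand-in for a real layer's forward pass.
    void forward(std::vector<float>& activations) const {
        for (float& a : activations) a = a * 0.9f + 0.1f;
    }
};

// Stand-in confidence estimate: the maximum activation value.
static float confidence(const std::vector<float>& activations) {
    float best = 0.0f;
    for (float a : activations) if (a > best) best = a;
    return best;
}

// Run layers in order, but stop as soon as confidence reaches the threshold,
// skipping all remaining layers for this input.
static void run_with_early_exit(const std::vector<Layer>& layers,
                                std::vector<float>& activations,
                                float threshold) {
    for (size_t i = 0; i < layers.size(); ++i) {
        layers[i].forward(activations);
        if (confidence(activations) >= threshold) {
            std::printf("Early exit after layer %zu of %zu\n",
                        i + 1, layers.size());
            return;  // the remaining layers are skipped entirely
        }
    }
    std::printf("No early exit: all %zu layers run\n", layers.size());
}

int main() {
    std::vector<Layer> layers(12);             // e.g. a 12-layer model
    std::vector<float> activations(8, 0.5f);   // dummy input activations
    run_with_early_exit(layers, activations, 0.8f);
    return 0;
}
```

The same loop structure covers static layer skipping or layer pruning, where a fixed subset of layers is bypassed regardless of the input.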
Transformer-specific types of structural "skipping" are also possible, such as skipping attention heads, FFN sub-layers, or input tokens (see the pruning topics under More AI Research below).
Calculation skipping is possible at various levels, in both structured and unstructured forms:
- Zero skipping (see the code sketch after this list)
- Negative skipping
- Conditional computation
- Caching and calculation re-use
- Vector dot product computation reuse
- Zero padding calculation skipping
- Loop perforation (probabilistic skipping of loop iterations)
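Two of these are easy to see in a vector dot product kernel. Below is a minimal C++ sketch, assuming a plain float loop: zero skipping avoids the multiply-add whenever the weight is zero (which pays off for sparse or pruned weights), and loop perforation probabilistically skips a fraction of the iterations, trading a small amount of accuracy for speed.

```cpp
// Minimal sketches of zero skipping and loop perforation in a dot product.
#include <cstdio>
#include <cstdlib>
#include <vector>

// Zero skipping: skip the multiply-add whenever the weight is zero.
float dot_zero_skip(const std::vector<float>& weights,
                    const std::vector<float>& inputs) {
    float sum = 0.0f;
    for (size_t i = 0; i < weights.size(); ++i) {
        if (weights[i] == 0.0f) continue;  // skipped calculation
        sum += weights[i] * inputs[i];
    }
    return sum;
}

// Loop perforation: probabilistically skip a fraction of the iterations,
// accepting an approximate result in exchange for fewer operations.
float dot_perforated(const std::vector<float>& weights,
                     const std::vector<float>& inputs,
                     float skip_probability) {
    float sum = 0.0f;
    for (size_t i = 0; i < weights.size(); ++i) {
        float r = static_cast<float>(std::rand()) / RAND_MAX;
        if (r < skip_probability) continue;  // skipped iteration
        sum += weights[i] * inputs[i];
    }
    // Rescale to compensate for the skipped terms (on average).
    return sum / (1.0f - skip_probability);
}

int main() {
    std::vector<float> w = {0.0f, 1.5f, 0.0f, -2.0f, 0.0f, 0.5f};
    std::vector<float> x = {3.0f, 2.0f, 7.0f,  1.0f, 4.0f, 2.0f};
    std::printf("zero-skip dot = %g\n", dot_zero_skip(w, x));        // 2
    std::printf("perforated dot ~ %g\n", dot_perforated(w, x, 0.25f));
    return 0;
}
```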
At the top level, the entire inference phase of a big model can be skipped, in favor of a smaller model or a stored result:
- Inference cache (storing the entire result of a prior query for re-use; see the code sketch after this list)
- Big-little architecture (routing "easy" queries to the "small" model)
- Speculative decoding (a small model "speculates" about the output)
- Ensemble inference (e.g. swarms of small models)
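The inference cache is the simplest of these. The C++ sketch below shows an exact-match cache: a repeated query returns the stored result and the big model is never invoked. The run_big_model() function is a hypothetical stand-in for full model inference, and real inference caches typically use semantic (vector) matching rather than exact string keys.

```cpp
// Minimal sketch of an inference cache that skips the big model entirely
// on a repeated query. run_big_model() is a hypothetical stand-in.
#include <cstdio>
#include <string>
#include <unordered_map>

// Stand-in for expensive full-model inference.
static std::string run_big_model(const std::string& query) {
    return "answer to: " + query;
}

class InferenceCache {
public:
    std::string infer(const std::string& query) {
        auto it = cache_.find(query);
        if (it != cache_.end()) {
            std::printf("cache hit: big model skipped\n");
            return it->second;            // whole inference phase skipped
        }
        std::string result = run_big_model(query);
        cache_.emplace(query, result);    // store for later re-use
        return result;
    }

private:
    std::unordered_map<std::string, std::string> cache_;
};

int main() {
    InferenceCache cache;
    cache.infer("What is zero skipping?");  // miss: runs the big model
    cache.infer("What is zero skipping?");  // hit: inference skipped
    return 0;
}
```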
General Papers on Skipping Optimizations
Papers with skipping algorithm theory include:
- Sparsh Mittal. 2016. A survey of techniques for approximate computing. ACM Computing Surveys (CSUR) 48, 4 (2016), 1–33. https://dl.acm.org/doi/10.1145/2893356
- Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mingyuan Zhou, 7 May 2024, Switchable Decision: Dynamic Neural Generation Networks, https://arxiv.org/abs/2405.04513 (Switching and skipping of sub-layer components such as attention heads, FFNs, or input tokens, with decisions based on allocating computation resources.)
- You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Xiangyu Zhao, Ying Wei, Hong Qian, Qi Liu, Xiang Wang, Wai Kin (Victor) Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng, Nov 2023, On the Opportunities of Green Computing: A Survey, https://arxiv.org/abs/2311.00447 (Extensive survey of environmental and green AI issues, along with a survey of various optimization methods to reduce AI resource requirements in training and inference.)
- Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella, 5 Apr 2024, FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping, https://arxiv.org/abs/2404.03865
More AI Research
Read more about:
- Layer skipping
- Zero skipping
- Layer pruning
- Token pruning
- Attention head pruning
- Embeddings pruning
- FFN pruning
- Shallow decoder architecture
- Inference Optimizations
- Loop Optimizations
- Code Optimizations