Aussie AI
Conditional Computation
-
Last Updated 3 September, 2024
-
by David Spuler, Ph.D.
Conditional computation is an optimization technique for AI model inference where simple computations are done first, so that more complicated and expensive computations are only done "conditionally" and often avoided completely. Other names for conditional computation as a programming optimization technique include "skipping", "lazy evaluation", "easy case first", "simple case first", and "common case first".
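As a simple illustration of the "easy case first" idea, the sketch below shows a dot product loop that applies a cheap zero test to each weight, and only performs the more expensive multiply-accumulate when that test fails. This is a toy example for illustration only, not a production kernel (real kernels would typically vectorize and minimize branching).

```cpp
// Minimal sketch of "easy case first" conditional computation:
// a dot product that tests for zero weights (a cheap comparison)
// before doing the more expensive multiply-accumulate.
#include <cstdio>
#include <vector>

float dot_product_zero_skipping(const std::vector<float>& weights,
                                const std::vector<float>& inputs) {
    float sum = 0.0f;
    for (size_t i = 0; i < weights.size(); ++i) {
        if (weights[i] == 0.0f) continue;   // cheap test first: skip the multiply entirely
        sum += weights[i] * inputs[i];      // expensive work done only conditionally
    }
    return sum;
}

int main() {
    std::vector<float> w = {0.0f, 1.5f, 0.0f, -2.0f};
    std::vector<float> x = {3.0f, 2.0f, 5.0f, 1.0f};
    std::printf("dot = %f\n", dot_product_zero_skipping(w, x));  // 1.5*2 + (-2)*1 = 1.0
    return 0;
}
```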
When applied to neural network inference, conditional computation is a type of dynamic inference (or "adaptive inference"), where the computations change dynamically based on the input sequence, and only parts of the full model are activated. Some examples of conditional computation algorithms for dynamic inference include:
- Zero skipping (including skipping negative values sent to ReLU)
- Layer skipping
- Dynamic sparsification
- Dynamic pruning (e.g. channel pruning, filter pruning, head pruning)
- Early exiting layers (see the sketch after this list)
- Low-rank matrix factorization
- Cascades
- Speculative decoding
- Big-little architectures (dynamically selecting either a small or large model)
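To make one of these techniques concrete, here is a minimal sketch of layer-wise early exiting: after each layer, a cheap confidence check decides whether the remaining (more expensive) layers can be skipped. The toy "layer" and "confidence" functions below are invented placeholders for illustration, not a real model or library API.

```cpp
// Hedged sketch of layer-wise early exiting, one form of conditional computation.
// The "layer" here is a toy stand-in; a real model would run attention/FFN blocks
// and use a trained intermediate classifier for the confidence estimate.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Toy stand-in for one expensive model layer: amplifies the hidden state.
static std::vector<float> run_layer(const std::vector<float>& hidden) {
    std::vector<float> out(hidden.size());
    for (size_t i = 0; i < hidden.size(); ++i)
        out[i] = 1.2f * hidden[i];
    return out;
}

// Toy confidence: softmax probability of the largest element of the hidden state.
static float confidence(const std::vector<float>& hidden) {
    float max_v = *std::max_element(hidden.begin(), hidden.end());
    float denom = 0.0f;
    for (float v : hidden) denom += std::exp(v - max_v);
    return 1.0f / denom;   // softmax value of the maximum element
}

int main() {
    const int num_layers = 12;
    const float exit_threshold = 0.95f;  // exit once the prediction is confident enough
    std::vector<float> hidden = {0.2f, 3.5f, -1.0f, 0.7f};

    int layers_run = 0;
    for (int layer = 0; layer < num_layers; ++layer) {
        hidden = run_layer(hidden);
        ++layers_run;
        // Conditional computation: skip the remaining layers if already confident.
        if (confidence(hidden) >= exit_threshold)
            break;
    }
    std::printf("ran %d of %d layers (confidence %.3f)\n",
                layers_run, num_layers, confidence(hidden));
    return 0;
}
```

With this amplifying toy layer, the loop exits after only a couple of the twelve layers; in a real model the intermediate classifier and the exit threshold would be trained and tuned to preserve accuracy.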
Research on Conditional Computation
Research papers on various types of conditional computation, with an initial cheap computation to avoid a larger subsequent computation, include:
- Yuxiang Huan, Yifan Qin, Yantian You, Lirong Zheng, and Zhuo Zou. Sep 2016. A multiplication reduction technique with near-zero approximation for embedded learning in IoT devices. 2016 29th IEEE International System-on-Chip Conference (SOCC), 102–107. https://ieeexplore.ieee.org/abstract/document/7905445 (Avoids multiplications involving near-zero values by efficiently counting the number of leading zeros in the floating-point representation using bitwise arithmetic.)
- Duvindu Piyasena, Rukshan Wickramasinghe, Debdeep Paul, Siew Kei Lam, and Meiqing Wu. 2019. Reducing dynamic power in streaming CNN hardware accelerators by exploiting computational redundancies. Proceedings 29th International Conference on Field-Programmable Logic and Applications, FPL 2019 (9 2019), 354–359, https://ieeexplore.ieee.org/document/8891989 PDF: https://siewkeilam.github.io/ei-research-group/Paper/2019H-Duvindu-FPL.pdf ("Negative skipping": quickly estimates the computed values, thereby skipping entire computations that would yield negative results, since those would be zeroed by the ReLU activation; a simplified sketch of this idea appears after this list.)
- Giorgos Armeniakos, Georgios Zervakis, Dimitrios Soudris, Jörg Henkel, 2022, Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey, ACM Computing Surveys, Volume 55, Issue 4, No. 83, pp 1–36 https://doi.org/10.1145/3527156, https://dl.acm.org/doi/10.1145/3527156, https://arxiv.org/abs/2203.08737 (Extensive survey with a section on "Skipping" which discusses conditional computation.)
- T. Ujiie, M. Hiromoto, and T. Sato. 2016. Approximated Prediction Strategy for Reducing Power Consumption of Convolutional Neural Network Processor. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 870–876. https://ieeexplore.ieee.org/document/7789603, PDF: https://openaccess.thecvf.com/content_cvpr_2016_workshops/w14/papers/Ujiie_Approximated_Prediction_Strategy_CVPR_2016_paper.pdf ("Negative skipping": uses fast logic with ternary weights to quickly approximate the value of a convolution, so as to skip it entirely if the result is expected to be negative.)
- JA Chen, W Niu, B Ren, Y Wang, X Shen, 2023, Survey: Exploiting data redundancy for optimization of deep learning, ACM Computing Surveys, https://dl.acm.org/doi/abs/10.1145/3564663, https://arxiv.org/pdf/2208.13363 (Survey paper covering various data redundancy optimizations such as skipping or reusing computations for similar data.)
- Mingcong Song, Jiechen Zhao, Yang Hu, Jiaqi Zhang, Tao Li, 2018, Prediction based execution on deep neural networks, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), https://ieeexplore.ieee.org/abstract/document/8416870/, https://www.researchgate.net/profile/Mingcong-Song/publication/326566905_Prediction_Based_Execution_on_Deep_Neural_Networks/links/5bd68551a6fdcc3a8dad72ff/Prediction-Based-Execution-on-Deep-Neural-Networks.pdf
- H Park, D Kim, J Ahn, S Yoo, 2016, Zero and data reuse-aware fast convolution for deep neural networks on GPU, 2016 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), https://dl.acm.org/doi/abs/10.1145/2968456.2968476, https://ieeexplore.ieee.org/document/7750981 (Zero-skipping by prediction of the results.)
- Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu, June 2021, A Survey of Transformers, AI Open, https://arxiv.org/abs/2106.04554 (Examines some Transformer models with "Adaptive Computation Transformer" (ACT) architectures.)
- Ankur Bapna, Naveen Arivazhagan, and Orhan Firat. 2020. Controlling Computation versus Quality for Neural Sequence Models. arXiv:2002.07106 [cs.LG], https://arxiv.org/abs/2002.07106
- Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. 2019. Universal Transformers. In Proceedings of ICLR. https://openreview.net/forum?id=HyzdRiR9Y7, PDF: https://openreview.net/pdf?id=HyzdRiR9Y7
- Huang G., Chen D., Li T., Wu F., van der Maaten L., Weinberger K.Q., 2018, Multi-scale dense networks for resource efficient image classification, International conference on learning representations (2018), https://arxiv.org/abs/1703.09844
- Wang Y., Lv K., Huang R., Song S., Yang L., Huang G., 2020, Glance and focus: a dynamic approach to reducing spatial redundancy in image classification, Advances in neural information processing systems, Vol. 33 (2020), pp. 2432-2444, https://arxiv.org/abs/2010.05300, Code: https://github.com/blackfeather-wang/GFNet-Pytorch (Focuses on a small subset of the input to speed up inference with early-exit based on confidence level.)
- Hajin Shim, Sung Ju Hwang, and Eunho Yang. Joint active feature acquisition and classification with variable-size set encoding. NeurIPS, pages 1368–1378, 2018. https://papers.nips.cc/paper/2018/file/e5841df2166dd424a57127423d276bbe-Paper.pdf
- Weizhe Hua, Yuan Zhou, Christopher M De Sa, Zhiru Zhang, and G Edward Suh. Channel gating neural networks. NeurIPS, pages 1884–1894, 2019, https://arxiv.org/abs/1805.12549
- Zhenda Xie, Zheng Zhang, Xizhou Zhu, Gao Huang, and Stephen Lin. 2020. Spatially adaptive inference with stochastic feature sampling and interpolation. arXiv preprint arXiv:2003.08866, https://arxiv.org/abs/2003.08866
- Yang L., Han Y., Chen X., Song S., Dai J., Huang G., 2020, Resolution adaptive networks for efficient inference, 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR) (2020), pp. 2369-2378, https://arxiv.org/abs/2003.07326
- Bengio Y., Léonard N., Courville A., 2013, Estimating or propagating gradients through stochastic neurons for conditional computation, arXiv:1308.3432, https://arxiv.org/abs/1308.3432
- Davis A., Arel I., 2013, Low-rank approximations for conditional feedforward computation in deep neural networks, arXiv:1312.4461, https://arxiv.org/abs/1312.4461
- Ignacio de Gregorio, April 2024, Mixture-of-Depths, a Dazzling New AI Breakthrough: Conditional Computing is Finally Here, Medium, https://medium.com/@ignacio.de.gregorio.noblejas/mixture-of-depths-a-dazzling-new-ai-breakthrough-be958fc629b2 (Mixture-of-depths is a layer-wise, per-token limit on attention computations, akin to width pruning combined with dynamic depth.)
- David Raposo, Sam Ritter, Blake Richards, Timothy Lillicrap, Peter Conway Humphreys, Adam Santoro, 2 Apr 2024, Mixture-of-Depths: Dynamically allocating compute in transformer-based language models, https://arxiv.org/abs/2404.02258 (Per-layer pruning of which tokens participate in the attention computations, giving a type of lengthwise pruning combined with dynamic width pruning or a slimmable network approach.)
- Bartosz Wójcik, Alessio Devoto, Karol Pustelnik, Pasquale Minervini, Simone Scardapane, 15 Dec 2023, Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference, https://arxiv.org/abs/2312.10193 (Modifies its computation depending on the difficulty of each input token.)
- Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia, 23 Dec 2023, Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems, https://arxiv.org/abs/2312.15234
- Rafael Fão de Moura, Paulo C Santos, João Paulo C de Lima, Marco AZ Alves, Antonio CS Beck, and Luigi Carro. 2019. Skipping CNN convolutions through efficient memoization. In International Conference on Embedded Computer Systems. Springer, 65–76. https://link.springer.com/chapter/10.1007/978-3-030-27562-4_5
- Weijie Chen, Yuan Zhang, Di Xie, and Shiliang Pu. 2019. A layer decomposition-recomposition framework for neuron pruning towards accurate lightweight networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 3355–3362. https://arxiv.org/abs/1812.06611 (Layerwise dynamic structural pruning of unimportant neurons.)
- Taiji Suzuki, Hiroshi Abe, Tomoya Murata, Shingo Horiuchi, Kotaro Ito, Tokuma Wachi, So Hirai, Masatoshi Yukishima, and Tomoaki Nishimura. 2020. Spectral Pruning: Compressing Deep Neural Networks via Spectral Analysis and its Generalization Error. IJCAI. https://arxiv.org/abs/1808.08558 (A type of structured pruning based on information loss metrics.)
- J Ainslie, T Lei, M de Jong, S Ontañón, 2023, CoLT5: Faster Long-Range Transformers with Conditional Computation, https://arxiv.org/abs/2303.09752
- Denoyer, Ludovic and Gallinari, Patrick, 2014, Deep sequential neural network. CoRR, abs/1410.0510, http://arxiv.org/abs/1410.0510
- Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen, 2020, GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, https://arxiv.org/abs/2006.16668
- M Lin, J Fu, Y Bengio, 2019, Conditional computation for continual learning, arXiv preprint arXiv:1906.06635, https://arxiv.org/abs/1906.06635
- Y Lou, F Xue, Z Zheng, Y You, 2022, Cross-token modeling with conditional computation, arXiv preprint arXiv:2109.02008, https://arxiv.org/abs/2109.02008
- Simone Scardapane, Alessandro Baiocchi, Alessio Devoto, Valerio Marsocci, Pasquale Minervini, Jary Pomponi, 12 Mar 2024, Conditional computation in neural networks: principles and research trends, https://arxiv.org/abs/2403.07965 (Investigated three types of dynamic inference: MoE, early exit, and token selection.)
- Hengyuan Hu, 2016, Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures, Papers with Code, https://paperswithcode.com/paper/network-trimming-a-data-driven-neuron-pruning
- Xitong Gao, 2019, Dynamic Channel Pruning: Feature Boosting and Suppression, Papers with Code, https://paperswithcode.com/paper/dynamic-channel-pruning-feature-boosting-and
- V Vanhoucke, A Senior, MZ Mao, 2011, Improving the speed of neural networks on CPUs, Google Research, https://research.google/pubs/pub37631.pdf
- Erdem Koyuncu, 20 Mar 2023, Memorization Capacity of Neural Networks with Conditional Computation, https://arxiv.org/abs/2303.11247
- Folino, F., Folino, G., Pisani, F.S. et al., 2024, Efficiently approaching vertical federated learning by combining data reduction and conditional computation techniques. J Big Data 11, 77 (2024). https://doi.org/10.1186/s40537-024-00933-6 https://link.springer.com/article/10.1186/s40537-024-00933-6 https://link.springer.com/content/pdf/10.1186/s40537-024-00933-6.pdf
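As referenced in the Piyasena et al. and Ujiie et al. entries above, "negative skipping" uses an initial cheap estimate to predict whether a result will be negative, skipping the full computation because ReLU would zero it out anyway. The sketch below illustrates only the general pattern; the estimator used here (a partial dot product over a short prefix) is an invented placeholder, not the estimation schemes used in those papers.

```cpp
// Hedged sketch of "negative skipping": a cheap approximate dot product predicts
// the sign of the result, and the full-precision computation is skipped when the
// result is predicted to be negative, since ReLU would zero it anyway.
#include <cstdio>
#include <vector>

// Cheap estimate: partial dot product over a small prefix of the vectors
// (placeholder estimator; real schemes use ternary weights, low precision, etc.).
static float estimate_dot(const std::vector<float>& w, const std::vector<float>& x,
                          size_t prefix) {
    float sum = 0.0f;
    for (size_t i = 0; i < prefix && i < w.size(); ++i)
        sum += w[i] * x[i];
    return sum;
}

// Full-precision dot product (the expensive computation we hope to skip).
static float full_dot(const std::vector<float>& w, const std::vector<float>& x) {
    float sum = 0.0f;
    for (size_t i = 0; i < w.size(); ++i)
        sum += w[i] * x[i];
    return sum;
}

// ReLU(w.x) with negative skipping: if the cheap estimate is well below zero,
// skip the full dot product and output zero directly.
float relu_dot_with_negative_skipping(const std::vector<float>& w,
                                      const std::vector<float>& x) {
    const float margin = -1.0f;           // skip only when confidently negative
    if (estimate_dot(w, x, 4) < margin)
        return 0.0f;                      // expensive computation avoided
    float dot = full_dot(w, x);
    return dot > 0.0f ? dot : 0.0f;       // standard ReLU
}

int main() {
    std::vector<float> w = {-1.0f, -2.0f, 0.5f, -0.5f, 0.1f, 0.2f};
    std::vector<float> x = { 1.0f,  1.0f, 1.0f,  1.0f, 1.0f, 1.0f};
    std::printf("output = %f\n", relu_dot_with_negative_skipping(w, x));  // skipped, 0.0
    return 0;
}
```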
More AI Research
Read more about:
- Caching and Data Reuse
- Zero Skipping
- Code Optimizations
- Inference Optimizations
- Loop Optimizations
- « Research Home