Aussie AI

Early Exit Research

  • Book Excerpt from "Generative AI in C++"
  • by David Spuler, Ph.D.

There are numerous papers on “early exit,” in which the inference algorithm finishes without processing all of the model’s layers, and they show no sign of abating. This overall technique can be categorized as “dynamic layer pruning,” “dynamic depth pruning,” or “dynamic depth models.”
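
To make the technique concrete, below is a minimal C++ sketch of confidence-based early exit in the layer loop. It is purely illustrative: the stub “layer” computation and the max-softmax confidence test stand in for a real engine’s Transformer layers and trained exit classifiers, and every name in it is invented for this example rather than taken from any of the papers cited here.

    // Minimal sketch of early exit ("dynamic layer pruning"): run layers
    // one at a time and stop as soon as a confidence test passes.
    // Hypothetical example code, not from any particular engine.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    using Vec = std::vector<float>;

    // Stub for one layer's forward pass (a real engine would run
    // attention and FFN kernels here); this toy version just sharpens
    // the logits so that the demo terminates early.
    static Vec apply_layer(const Vec& x) {
        Vec y(x);
        for (float& v : y) v *= 1.5f;
        return y;
    }

    // Confidence = maximum softmax probability of the current activations,
    // one common early-exit criterion in the literature.
    static float confidence(const Vec& x) {
        float mx = *std::max_element(x.begin(), x.end());
        float denom = 0.0f;
        for (float v : x) denom += std::exp(v - mx);
        return 1.0f / denom;  // softmax probability of the largest logit
    }

    // Run up to num_layers layers, exiting early once confidence is high.
    static Vec forward_early_exit(Vec x, int num_layers,
                                  float threshold, int& layers_run) {
        int i = 0;
        while (i < num_layers) {
            x = apply_layer(x);
            ++i;
            if (confidence(x) >= threshold)
                break;  // early exit: the remaining layers are skipped
        }
        layers_run = i;
        return x;
    }

    int main() {
        Vec logits = {0.5f, -1.0f, 2.0f, 0.0f};
        int layers_run = 0;
        forward_early_exit(logits, /*num_layers=*/12, /*threshold=*/0.95f, layers_run);
        std::printf("Exited after %d of 12 layers\n", layers_run);  // prints 2 of 12
        return 0;
    }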

Early exit is also one of multiple strategies for adaptive inference, where the engine changes its execution path depending on the user’s inputs. Some types of early exit, such as hierarchical early exit, are closely related to research on cascades for DNNs and CNNs.
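
The cascade variant can be sketched just as simply: run a small, fast model first, and escalate to the large model only when the small model’s confidence falls below a threshold, so “easy” inputs exit early at the cheap model. Again, the two models and their confidence scores below are hypothetical toy stand-ins, not a real API.

    // Illustrative two-model cascade: accept the small model's answer on
    // "easy" inputs, escalate to the large model on "hard" ones.
    // Both models here are toy stand-ins.
    #include <cstdio>
    #include <string>

    struct Prediction {
        std::string label;
        float confidence;  // in [0, 1]
    };

    // Toy small (fast) model: confident only on longer inputs.
    static Prediction small_model(const std::string& input) {
        return { "cat", input.size() > 8 ? 0.95f : 0.40f };
    }

    // Toy large (slow but accurate) model.
    static Prediction large_model(const std::string&) {
        return { "cat", 0.99f };
    }

    // Cascade: early exit at the small model when it is confident enough.
    static Prediction cascade(const std::string& input, float threshold) {
        Prediction p = small_model(input);
        if (p.confidence >= threshold)
            return p;               // cheap path: the large model never runs
        return large_model(input);  // expensive fallback for hard inputs
    }

    int main() {
        Prediction easy = cascade("a long easy input", 0.9f);
        Prediction hard = cascade("hard", 0.9f);
        std::printf("easy: %s (%.2f)\nhard: %s (%.2f)\n",
                    easy.label.c_str(), easy.confidence,
                    hard.label.c_str(), hard.confidence);
        return 0;
    }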

Survey papers on early exit (dynamic layer pruning) include:

  1. Canwen Xu, Julian McAuley, 2022, A Survey on Model Compression and Acceleration for Pretrained Language Models, https://arxiv.org/abs/2202.07105
  2. Y. Matsubara, M. Levorato, and F. Restuccia, 2022, Split computing and early exiting for deep learning applications: Survey and research challenges, ACM Comput. Surveys, Mar 2022, https://arxiv.org/abs/2103.04505
  3. Stefanos Laskaridis, Alexandros Kouris, Nicholas D. Lane, 2021, Adaptive Inference through Early-Exit Networks: Design, Challenges and Directions, EMDL'21: Proceedings of the 5th International Workshop on Embedded and Mobile Deep Learning, June 2021, Pages 1–6, https://doi.org/10.1145/3469116.3470012, https://dl.acm.org/doi/abs/10.1145/3469116.3470012

Specific research papers on early exit (dynamic layer pruning):

  1. Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, and Jimmy Lin, 2020, DeeBERT: Dynamic early exiting for accelerating BERT inference, arXiv preprint arXiv:2004.12993, 2020, https://arxiv.org/abs/2004.12993 (Code: https://github.com/castorini/DeeBERT)
  2. Angela Fan, Edouard Grave, and Armand Joulin, 2019, Reducing transformer depth on demand with structured dropout, arXiv:1909.11556, https://arxiv.org/abs/1909.11556
  3. Surat Teerapittayanon, Bradley McDanel, and Hsiang Tsung Kung, BranchyNet: Fast inference via early exiting from deep neural networks, 2017, arXiv:1709.01686, https://arxiv.org/abs/1709.01686
  4. S. Teerapittayanon, B. McDanel, H.T. Kung, 2017, Distributed deep neural networks over the cloud, the edge and end devices, in Proc. IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, GA, USA, 5–8 June 2017, pp. 328–339, https://doi.org/10.1109/ICDCS.2017.226
  5. Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing Huang, 2021, Accelerating BERT Inference for Sequence Labeling via Early-Exit, May 2021, https://arxiv.org/abs/2105.13878
  6. Arian Bakhtiarnia, Qi Zhang, Alexandros Iosifidis, 2021, Multi-Exit Vision Transformer for Dynamic Inference, June 2021, https://arxiv.org/abs/2106.15183
  7. Nikolaos Passalis, Jenni Raitoharju, Anastasios Tefas, Moncef Gabbouj, 2020, Efficient adaptive inference for deep convolutional neural networks using hierarchical early exits, Pattern Recognition Volume 105, September 2020, 107346, https://doi.org/10.1016/j.patcog.2020.107346
  8. Xiangjie Li, Chenfei Lou, Yuchi Chen, Zhengping Zhu, Yingtao Shen, Yehan Ma, An Zou, 2022, Predictive Exit: Prediction of Fine-Grained Early Exits for Computation- and Energy-Efficient Inference, DOI: https://doi.org/10.1609/aaai.v37i7.26042, https://ojs.aaai.org/index.php/AAAI/article/view/26042
  9. Kaiyuan Liao, Yi Zhang, Xuancheng Ren, Qi Su, Xu Sun, Bin He, 2021, A Global Past-Future Early Exit Method for Accelerating Inference of Pre-trained Language Models, Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 2021, https://aclanthology.org/2021.naacl-main.162/, https://aclanthology.org/2021.naacl-main.162.pdf
  10. Vanderlei Bonato and Christos Bouganis, 2021, Class-specific early exit design methodology for convolutional neural networks, Applied Soft Computing (2021), https://www.sciencedirect.com/science/article/abs/pii/S1568494621002398, https://doi.org/10.1016/j.asoc.2021.107316, https://spiral.imperial.ac.uk/bitstream/10044/1/90316/2/Paper___Early_Exit___Applied_Soft_Computing.pdf
  11. E. Baccarelli, S. Scardapane, M. Scarpiniti, A. Momenzadeh, A. Uncini, 2020, Optimized training and scalable implementation of Conditional Deep Neural Networks with early exits for Fog-supported IoT applications, Information Sciences 521 (June 2020), 107–143, DOI: https://doi.org/10.1016/j.ins.2020.02.041
  12. S. Wang, T. Tuor, T. Salonidis, K.K. Leung, C. Makaya, T. He, K. Chan, 2018, When edge meets learning: Adaptive control for resource-constrained distributed machine learning, in: IEEE Conference on Computer Communications (IEEE INFOCOM 2018), Honolulu, HI, USA, 16–19 April 2018, pp. 63–71, https://doi.org/10.1109/INFOCOM.2018.8486403
  13. Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei, 2020, BERT Loses Patience: Fast and Robust Inference with Early Exit, https://doi.org/10.48550/arXiv.2006.04152, https://arxiv.org/abs/2006.04152
  14. S. Scardapane, M. Scarpiniti, E. Baccarelli, A. Uncini, 2020, Why should we add early exits to neural networks?, Cognitive Computation 12 (5) (2020), 954–966, doi:10.1007/s12559-020-09734-4, http://dx.doi.org/10.1007/s12559-020-09734-4
  15. Kaiyuan Liao, Yi Zhang, Xuancheng Ren, Qi Su, Xu Sun, Bin He, 2021, A Global Past-Future Early Exit Method for Accelerating Inference of Pre-trained Language Models, Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 2021, DOI: 10.18653/v1/2021.naacl-main.162, https://aclanthology.org/2021.naacl-main.162/
  16. Zizhao Wang, Wei Bao, Dong Yuan, Liming Ge, Nguyen H. Tran, Albert Y. Zomaya, 2019, SEE: Scheduling Early Exit for Mobile DNN Inference during Service Outage, MSWIM '19: Proceedings of the 22nd International ACM Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, November 2019, Pages 279–288, https://doi.org/10.1145/3345768.3355917, https://dl.acm.org/doi/abs/10.1145/3345768.3355917
  17. Xinrui Tan, Hongjia Li, Liming Wang, Xueqing Huang, Zhen Xu, 2021, Empowering Adaptive Early-Exit Inference with Latency Awareness, DOI: https://doi.org/10.1609/aaai.v35i11.17181, PDF: https://ojs.aaai.org/index.php/AAAI/article/view/17181/16988
  18. Tal Schuster, Adam Fisch, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, and Donald Metzler. 2022, Confident adaptive language modeling, arXiv preprint arXiv:2207.07061, 2022, https://arxiv.org/abs/2207.07061
  19. Gao Huang, Danlu Chen, Tianhong Li, Felix Wu, Laurens van der Maaten, and Kilian Weinberger. 2018. Multi-scale dense networks for resource efficient image classification, In International Conference on Learning Representations (ICLR), https://arxiv.org/abs/1703.09844
  20. Yigitcan Kaya, Sanghyun Hong, and Tudor Dumitras. 2019. Shallow-deep networks: Understanding and mitigating network overthinking, In International Conference on Machine Learning (ICML), volume 97, pages 3301–3310. PMLR, https://arxiv.org/abs/1810.07052
  21. Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, and Noah A. Smith. 2020. The right tool for the job: Matching model and instance complexities, In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6640–6651, Association for Computational Linguistics, https://arxiv.org/abs/2004.07453
  22. Ji Xin, Rodrigo Nogueira, Yaoliang Yu, and Jimmy Lin. 2020. Early exiting BERT for efficient document ranking, In Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, pages 83–88, Online. Association for Computational Linguistics. PDF: https://cs.uwaterloo.ca/~jimmylin/publications/Xin_etal_SustaiNLP2020.pdf
  23. Ji Xin, Raphael Tang, Yaoliang Yu, and Jimmy Lin. 2021. BERxiT: Early exiting for BERT with better fine-tuning and extension to regression, In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 91–104, Association for Computational Linguistics, https://aclanthology.org/2021.eacl-main.8/, Code: https://github.com/castorini/berxit
  24. V. Akhlaghi, A. Yazdanbakhsh, K. Samadi, R. K. Gupta, and H. Esmaeilzadeh, 2018, SnaPEA: Predictive early activation for reducing computation in deep convolutional neural networks, In Proceedings of the 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA’18). IEEE, Los Alamitos, CA, 662–673, https://doi.org/10.1109/ISCA.2018.00061, https://ieeexplore.ieee.org/document/8416863
  25. D. Li, W. Wu, L. Zeng, K. Li, 2023, Es-Fedavg: Early-Exit Personalized Federated Learning with Sparse Weight for Adaptive Computation, January 2023, SSRN Electronic Journal, DOI:10.2139/ssrn.4361705, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4361705, https://www.researchgate.net/publication/368592513_Es-Fedavg_Early-Exit_Personalized_Federated_Learning_with_Sparse_Weight_for_Adaptive_Computation
  26. Ting-Kuei Hu, Tianlong Chen, Haotao Wang, and Zhangyang Wang. 2020, Triple wins: Boosting accuracy, robustness and efficiency together by enabling input-adaptive inference, In ICLR, Feb 2020, https://arxiv.org/abs/2002.10025
  27. Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E. Gonzalez. 2018, Skipnet: Learning dynamic routing in convolutional networks, In ECCV, 2018, https://arxiv.org/abs/1711.09485
  28. Weiyu Ju; Wei Bao; Dong Yuan; Liming Ge; Bing Bing Zhou, 2021, Learning Early Exit for Deep Neural Network Inference on Mobile Devices through Multi-Armed Bandits, 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 10-13 May 2021, https://ieeexplore.ieee.org/abstract/document/9499356, https://doi.org/10.1109/CCGrid51090.2021.00011
  29. Weiyu Ju, Wei Bao, Liming Ge, Dong Yuan, 2021, Dynamic Early Exit Scheduling for Deep Neural Network Inference through Contextual Bandits, CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, October 2021, Pages 823–832, https://doi.org/10.1145/3459637.3482335, https://dl.acm.org/doi/abs/10.1145/3459637.3482335
  30. Andong Li; Chengshi Zheng; Lu Zhang; Xiaodong Li, 2021, Learning to Inference with Early Exit in the Progressive Speech Enhancement, 2021 29th European Signal Processing Conference (EUSIPCO), 23-27 August 2021, https://ieeexplore.ieee.org/abstract/document/9616248, https://doi.org/10.23919/EUSIPCO54536.2021.9616248, https://arxiv.org/abs/2106.11730
  31. Polina Karpikova, Ekaterina Radionova, Anastasia Yaschenko, Andrei Spiridonov, Leonid Kostyushko, Riccardo Fabbricatore, Aleksei Ivakhnenko; 2023, FIANCEE: Faster Inference of Adversarial Networks via Conditional Early Exits, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), April 2023, pp. 12032-12043 https://openaccess.thecvf.com/content/CVPR2023/html/Karpikova_FIANCEE_Faster_Inference_of_Adversarial_Networks_via_Conditional_Early_Exits_CVPR_2023_paper.html, https://arxiv.org/abs/2304.10306
  32. Rongkang Dong, Yuyi Mao, and Jun Zhang. 2022, Resource-Constrained Edge AI with Early Exit Prediction, Journal of Communications and Information Networks, 7(2):122–134, June 2022, https://arxiv.org/abs/2206.07269
  33. Qunliang Xing, Mai Xu, Tianyi Li, and Zhenyu Guan. 2020, Early exit or not: Resource-efficient blind quality enhancement for compressed images, In Computer Vision – ECCV 2020, pages 275–292. Springer International Publishing. 2020, https://arxiv.org/abs/2006.16581
  34. M. Wołczyk et al., 2021, Zero time waste: Recycling predictions in early exit neural networks, in Proc. 35th Conf. Neural Inf. Process. Syst. (NeurIPS), Virtual Conference, Dec. 2021, https://arxiv.org/abs/2106.05409
  35. M. Phuong and C. H. Lampert, 2019, Distillation-based training for multi-exit architectures, in Proc. IEEE/CVF Int. Conf. Comput. Vision (ICCV), Seoul, Korea (South), Oct.-Nov. 2019, https://ieeexplore.ieee.org/document/9009834
  36. S. Laskaridis, S. I. Venieris, M. Almeida, I. Leontiadis, and N. D. Lane, 2020, SPINN: Synergistic progressive inference of neural networks over device and cloud, in Proc. Annu. Inf. Conf. Mobile Comput. Netw. (MobiCom), London, UK, Sep. 2020, https://arxiv.org/abs/2008.06402
  37. M. Wang, J. Mo, J. Lin, Z. Wang, and L. Du, 2019, DynExit: A dynamic early-exit strategy for deep residual networks, in Proc. IEEE Int. Wkshop. Signal Process. Syst. (SiPS), Nanjing, China, Oct. 2019, https://ieeexplore.ieee.org/abstract/document/9020551
  38. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015, Going deeper with convolutions, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015, https://arxiv.org/abs/1409.4842
  39. Maciej Wołczyk, Bartosz Wojcik, Klaudia Bałazy, Igor T. Podolak, Jacek Tabor, Marek Smieja, and Tomasz Trzcinski. 2021, Zero time waste: Recycling predictions in early exit neural networks, In Advances in Neural Information Processing Systems, volume 34, pages 2516–2528. Curran Associates, Inc. 2021, https://arxiv.org/abs/2106.05409
  40. Enrique S. Marquez, Jonathon S. Hare, and Mahesan Niranjan. 2018, Deep cascade learning, IEEE Transactions on Neural Networks and Learning Systems, 29(11):5475–5485. 2018, https://ieeexplore.ieee.org/document/8307262
  41. Simone Scardapane, Danilo Comminiello, Michele Scarpiniti, Enzo Baccarelli, and Aurelio Uncini. 2020, Differentiable branching in deep networks for fast inference, In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4167–4171. 2020, https://ieeexplore.ieee.org/document/9054209
  42. Sam Leroux, Steven Bohez, Elias De Coninck, Tim Verbelen, Bert Vankeirsbilck, Pieter Simoens, and Bart Dhoedt, Feb 2017, The cascading neural network: building the internet of smart things, Knowledge and Information Systems, 52(3):791–814, https://link.springer.com/article/10.1007/s10115-017-1029-1
  43. Xin Wang, Yujia Luo, Daniel Crankshaw, Alexey Tumanov, Fisher Yu, and Joseph E Gonzalez. 2017, IDK cascades: Fast deep learning by learning not to overthink, arXiv preprint arXiv:1706.00885. 2017, https://arxiv.org/abs/1706.00885
  44. Simone Scardapane, Michele Scarpiniti, Enzo Baccarelli, and Aurelio Uncini. 2020, Why should we add early exits to neural networks?, Cognitive Computation, 12(5):954–966. 2020, https://arxiv.org/abs/2004.12814
  45. Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. 2017, Adaptive neural networks for efficient inference, In International Conference on Machine Learning, pages 527–536. PMLR. 2017, https://arxiv.org/abs/1702.07811
  46. Xin Dai, Xiangnan Kong, and Tian Guo. 2020, Epnet: Learning to exit with flexible multi-branch network, In Proceedings of the 29th ACM International Conference on Information and Knowledge Management, CIKM ’20, pages 235–244, New York, NY, USA. Association for Computing Machinery. 2020, https://dl.acm.org/doi/10.1145/3340531.3411973
  47. Xinshi Chen, Hanjun Dai, Yu Li, Xin Gao, and Le Song. 2020, Learning to stop while learning to predict, In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 1520–1530. PMLR. 2020, https://arxiv.org/abs/2006.05082
  48. P. Panda, A. Sengupta, K. Roy, 2016, Conditional deep learning for energy-efficient and enhanced pattern recognition, in: 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016, pp. 475–480, https://arxiv.org/abs/1509.08971
  49. Francesco Busolin, Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Salvatore Trani, May 2021, Learning Early Exit Strategies for Additive Ranking Ensembles, https://arxiv.org/abs/2105.02568
  50. B. Barla Cambazoglu, Hugo Zaragoza, Olivier Chapelle, Jiang Chen, Ciya Liao, Zhaohui Zheng, and Jon Degenhardt. 2010. Early exit optimizations for additive machine learned ranking systems, In Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), pages 411–420, New York, New York, https://dl.acm.org/doi/10.1145/1718487.1718538
  51. Eunhyeok Park, Dongyoung Kim, Soobeom Kim, Yong-Deok Kim, Gunhee Kim, Sungroh Yoon, Sungjoo Yoo, 2015, Big/little deep neural network for ultra low power inference, In CODES '15: Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis, October 2015, pages 124–132, https://dl.acm.org/doi/10.5555/2830840.2830854
  52. Geng, S.; Gao, P.; Fu, Z.; and Zhang, Y., 2021, RomeBERT: Robust Training of Multi-Exit BERT, arXiv preprint arXiv:2101.09755, https://arxiv.org/abs/2101.09755
  53. Zhou, W.; Xu, C.; and McAuley, J. J., 2022. BERT Learns to Teach: Knowledge Distillation with Meta Learning, In ACL. https://arxiv.org/abs/2106.04570
  54. Tianxiang Sun, Yunhua Zhou, Xiangyang Liu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu, 2021. Early Exiting with Ensemble Internal Classifiers, arXiv preprint arXiv:2105.13792, https://arxiv.org/abs/2105.13792
  55. Zhu, W. 2021. LeeBERT: Learned Early Exit for BERT with cross-level optimization, In ACL-IJCNLP, PDF: https://aclanthology.org/2021.acl-long.231.pdf
  56. Zhang, Z.; Zhu, W.; Zhang, J.; et al. 2022. PCEE-BERT: Accelerating BERT Inference via Patient and Confident Early Exiting, In NAACL-HLT (Findings), https://aclanthology.org/2022.findings-naacl.25/
  57. Maha Elbayad, Jiatao Gu, Edouard Grave, and Michael Auli. 2020. Depth-adaptive transformer, ArXiv, abs/1910.10073, https://arxiv.org/abs/1910.10073
  58. Tal Schuster, Adam Fisch, Tommi Jaakkola, Regina Barzilay, 2021. Consistent accelerated inference via confident adaptive transformers, arXiv preprint arXiv:2104.08803. https://arxiv.org/abs/2104.08803
  59. Guan, Y.; Li, Z.; Leng, J.; et al. 2022. Transkimmer: Transformer Learns to Layer-wise Skim, In ACL, https://arxiv.org/abs/2205.07324
  60. H. Tann, S. Hashemi, R. I. Bahar, and S. Reda. Runtime configurable deep neural networks for energy-accuracy trade-off, In CODES + ISSS, pages 34:1–34:10, 2016. https://ieeexplore.ieee.org/document/9166549
  61. G Li, X Ma, Q Yu, L Liu, H Liu, X Wang, 2023, CoAxNN: Optimizing on-device deep learning with conditional approximate neural networks, Journal of Systems Architecture, https://www.sciencedirect.com/science/article/abs/pii/S1383762123001571
  62. X Gao, Y Liu, T Huang, Z Hou, 2023, PF-BERxiT: Early Exiting for BERT with Parameter-efficient Fine-tuning and Flexible early exiting strategy, Neurocomputing, https://www.sciencedirect.com/science/article/abs/pii/S0925231223008135
  63. Z Zeng, Y Hong, H Dai, H Zhuang, C Chen, August 2023, ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference, PDF: https://www.researchgate.net/publication/373392419_ConsistentEE_A_Consistent_and_Hardness-Guided_Early_Exiting_Method_for_Accelerating_Language_Models_Inference
  64. Duggal, R., Freitas, S., Dhamnani, S., Chau, D.H., Sun, J., 2020, ELF: an early-exiting framework for long-tailed classification, arXiv preprint arXiv:2006.11979 (2020), https://arxiv.org/abs/2006.11979
  65. H Yu, D Liu, Z Zhang, J Wang, 2023, A Dynamic Transformer Network with Early Exit Mechanism for Fast Detection of Multiscale Surface Defects, IEEE Transactions on Instrumentation and Measurement (Early Access), https://ieeexplore.ieee.org/document/10242087
  66. A Zniber, O Karrakchou, M Ghogho, 2023, Dynamic Early Exiting Predictive Coding Neural Networks, arXiv preprint arXiv:2309.02022, https://arxiv.org/pdf/2309.02022.pdf
  67. Y. Long, I. Chakraborty, and K. Roy, 2020, Conditionally deep hybrid neural networks across edge and cloud, arXiv:2005.10851, https://arxiv.org/abs/2005.10851
  68. Berestizshevsky, K., Even, G., 2019, Dynamically sacrificing accuracy for reduced computation: Cascaded inference based on softmax confidence, In: Lecture Notes in Computer Science, pp. 306–320. Springer International Publishing (2019). https://doi.org/10.1007/978-3-030-30484-3_26
  69. Huang, G., Chen, D., Li, T., Wu, F., van der Maaten, L., Weinberger, K.Q., 2018, Multi-scale dense networks for resource efficient image classification, In: 6th International Conference on Learning Representations, ICLR 2018 (2018). https://doi.org/10.48550/arXiv.1703.09844, https://arxiv.org/abs/1703.09844 (Has multiple models combined in an early-exit configuration.)
  70. A Moos, 2023, Efficient Single Object Detection on Image Patches with Early Exit Enhanced High-Precision CNNs, arXiv preprint arXiv:2309.03530, https://arxiv.org/pdf/2309.03530.pdf (Fast inference for a soccer-playing robot with cascade-like hierarchical early exits.)
  71. Francesco Daghero, Alessio Burrello, Daniele Jahier Pagliari, Luca Benini, Enrico Macii, Massimo Poncino, 2020, Energy-Efficient Adaptive Machine Learning on IoT End-Nodes With Class-Dependent Confidence, 2020 27th IEEE International Conference on Electronics, Circuits and Systems (ICECS), pp.1-4, 2020. https://ieeexplore.ieee.org/document/9294863, https://arxiv.org/abs/2204.03431v1 (An improved stopping policy for early exits on easy-input classification tasks.)
  72. Kyungchul Park, Chanyoung Oh, Youngmin Yi, 2020, BPNet: Branch-pruned Conditional Neural Network for Systematic Time-accuracy Tradeoff, 2020 57th ACM/IEEE Design Automation Conference (DAC), pp.1-6, 2020. https://ieeexplore.ieee.org/document/9218545
  73. T Shen, C Lee, V Narayanan, Oct 2023, Multi-Exit Vision Transformer with Custom Fine-Tuning for Fine-Grained Image Recognition, 2023 IEEE International Conference on Image Processing (ICIP), https://ieeexplore.ieee.org/abstract/document/10222298 (Early exit from multiple places, combined with self-distillation.)
  74. Sehoon Kim, Karttikeya Mangalam, Suhong Moon, John Canny, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer, Sep 2023, Speculative Decoding with Big Little Decoder, https://arxiv.org/abs/2302.07863 (Early exiting in the context of speculative decoder optimizations.)
  75. Schwartz, R., Stanovsky, G., Swayamdipta, S., Dodge, J., and Smith, N. A., 2020, The right tool for the job: Matching model and instance complexities, In Annual Meeting of the Association for Computational Linguistics, 2020. https://arxiv.org/abs/2004.07453 (Early exit with “wisdom of committees” decisions.)
  76. X Li, Y Shen, A Zou, Y Ma, 2023, EENet: Energy Efficient Neural Networks with Run-time Power Management, 2023 60th ACM/IEEE Design Automation Conference (DAC), https://ieeexplore.ieee.org/abstract/document/10247701 (Learns early exit characteristics and decision methods over time.)
  77. K Liu, S Moon, 2023, Self-supervised efficient sample weighting for multi-exit networks, Knowledge-Based Systems, https://www.sciencedirect.com/science/article/abs/pii/S0950705123007530 (Early exiting during both training and inference to reduce the disparity.)
  78. Divya J. Bajpai, Vivek K. Trivedi, Sohan L. Yadav, Manjesh K. Hanawal, 2023, SplitEE: Early Exit in Deep Neural Networks with Split Computing, arXiv preprint arXiv:2309.09195, https://arxiv.org/abs/2309.09195
  79. George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Alessio Brutti, Sep 2023, Training dynamic models using early exits for automatic speech recognition on resource-constrained devices, https://arxiv.org/abs/2309.09546
  80. X Xu, K Yan, S Han, B Wang, X Tao, P Zhang, 2023, Learning-Based Edge-Device Collaborative DNN Inference in IoVT Networks, IEEE Internet of Things Journal, https://ieeexplore.ieee.org/abstract/document/10258387
  81. J Wang, B Li, GL Zhang, 2023, Early Classification for Dynamic Inference of Neural Networks, arXiv preprint arXiv:2309.13443, https://arxiv.org/pdf/2309.13443.pdf
  82. S Tang, Y Wang, C Ding, Y Liang, Y Li, D Xu, 2023, DeeDiff: Dynamic Uncertainty-Aware Early Exiting for Accelerating Diffusion Model Generation, arXiv preprint arXiv:2309.17074, https://arxiv.org/pdf/2309.17074.pdf (Uses uncertainty-based confidence to decide on early-exit in diffusion models.)
  83. S Bae, J Ko, H Song, SY Yun, Oct 2023, Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding, arXiv preprint arXiv:2310.05424, https://arxiv.org/pdf/2310.05424.pdf (Combination of early-exit with a “shallow-deep module” and parallel decoding.)
  84. F Regol, J Chataoui, M Coates, Oct 2023, Jointly-Learned Exit and Inference for a Dynamic Neural Network: JEI-DNN, arXiv preprint arXiv:2310.09163, http://export.arxiv.org/abs/2310.09163
  85. Wang Y., Lv K., Huang R., Song S., Yang L., Huang G., 2020, Glance and focus: a dynamic approach to reducing spatial redundancy in image classification, Advances in neural information processing systems, Vol. 33 (2020), pp. 2432-2444, https://arxiv.org/abs/2010.05300, Code: https://github.com/blackfeather-wang/GFNet-Pytorch (Focuses on a small subset of the input to speed up inference with early-exit based on confidence level.)
  86. Hajin Shim, Sung Ju Hwang, and Eunho Yang. 2018, Joint active feature acquisition and classification with variable-size set encoding, NeurIPS, pages 1368–1378, 2018. https://papers.nips.cc/paper/2018/file/e5841df2166dd424a57127423d276bbe-Paper.pdf

For more research on early exit, refer to https://www.aussieai.com/research/early-exit.
