Layer Skipping
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
Layer skipping refers to bypassing the processing of a single layer and moving on to the next layer, rather than “early exiting” to skip all remaining layers. This is a form of dynamic depth pruning, because it reduces the number of layers that the model executes, based on some runtime criterion.
Although much of the existing research focuses on early exit to skip all further layers, there is also some research on selectively skipping individual layers. Note that layer skipping is inherently a dynamic inference optimization: static layer skipping is effectively the same as static layer pruning, since the same layers would be bypassed on every input.
Research papers on layer skipping (selective dynamic layer pruning):
- Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E. Gonzalez, 2018, SkipNet: Learning dynamic routing in convolutional networks, In ECCV, 2018, https://arxiv.org/abs/1711.09485
- Hassan Sajjad, Fahim Dalvi, Nadir Durrani, and Preslav Nakov, 2020, On the Effect of Dropping Layers of Pre-trained Transformer Models, arXiv preprint arXiv:2004.03844, 2020 (revised Aug 2022), https://arxiv.org/abs/2004.03844 (Examined dropping alternative layers, layer fusion, and other layer pruning strategies.)
- Alex Graves, 2016, Adaptive computation time for recurrent neural networks, arXiv preprint arXiv:1603.08983, 2016, https://arxiv.org/abs/1603.08983
- Jianghao Shen, Yue Wang, Pengfei Xu, Yonggan Fu, Zhangyang Wang, Yingyan Lin, 2020, Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference, January 2020, DOI: https://doi.org/10.1609/aaai.v34i04.6025, https://arxiv.org/abs/2001.00705
- Y. G. Jiang, C. Cheng, H. Lin, and Y. Fu, 2020, Learning layer-skippable inference network, IEEE Transactions on Image Processing, Volume 29, pp. 8747-8759, 28 August 2020, https://ieeexplore.ieee.org/abstract/document/9180094
- H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, 2015, A convolutional neural network cascade for face detection, 2015, in CVPR, https://paperswithcode.com/paper/a-convolutional-neural-network-cascade-for
- F. Yang, W. Choi, and Y. Lin, 2016, Exploit all the layers: Fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers, 2016, in CVPR, https://ieeexplore.ieee.org/document/7780603
- Andreas Veit and Serge Belongie, 2018, Convolutional networks with adaptive inference graphs, In ECCV, pages 3–18, 2018, https://arxiv.org/abs/1711.11503
- X. Dong, J. Huang, Y. Yang, and S. Yan, 2017, More is less: A more complicated network with less inference complexity, in CVPR, 2017. https://arxiv.org/abs/1703.08651
For more research on layer skipping, refer to https://www.aussieai.com/research/layer-pruning#skipping.