Layer Skipping
-
Book Excerpt from "Generative AI in C++"
-
by David Spuler, Ph.D.
Layer skipping refers to bypassing the processing of a single layer and moving on to the next layer, rather than “early exiting” to skip all remaining layers. This is a form of dynamic depth pruning, because it reduces the number of layers that the model executes, based on some runtime criterion.
Although much of the existing research focuses on early exit to skip all further layers, there is also some research on selectively skipping individual layers. Note that layer skipping is inherently a dynamic inference optimization: static layer skipping is effectively the same as static layer pruning, since the same layers would be bypassed on every input.
Research papers on layer skipping (selective dynamic layer pruning):
- Xin Wang, Fisher Yu, Zi-Yi Dou, Trevor Darrell, and Joseph E. Gonzalez, 2018, SkipNet: Learning dynamic routing in convolutional networks, In ECCV, 2018, https://arxiv.org/abs/1711.09485
- Hassan Sajjad, Fahim Dalvi, Nadir Durrani, and Preslav Nakov, 2020, On the Effect of Dropping Layers of Pre-trained Transformer Models, arXiv preprint arXiv:2004.03844, 2020 (revised Aug 2022), https://arxiv.org/abs/2004.03844 (Examined dropping alternative layers, layer fusion, and other layer pruning strategies.)
- Alex Graves, 2016, Adaptive computation time for recurrent neural networks, arXiv preprint arXiv:1603.08983, 2016, https://arxiv.org/abs/1603.08983
- Jianghao Shen, Yue Wang, Pengfei Xu, Yonggan Fu, Zhangyang Wang, Yingyan Lin, 2020, Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference, January 2020, DOI: https://doi.org/10.1609/aaai.v34i04.6025, https://arxiv.org/abs/2001.00705
- Y. G. Jiang, C. Cheng, H. Lin, and Y. Fu, 2020, Learning layer-skippable inference network, IEEE Transactions on Image Processing, Volume 29, pp. 8747-8759, 28 August 2020, https://ieeexplore.ieee.org/abstract/document/9180094
- H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, 2015, A convolutional neural network cascade for face detection, 2015, in CVPR, https://paperswithcode.com/paper/a-convolutional-neural-network-cascade-for
- F. Yang, W. Choi, and Y. Lin, 2016, Exploit all the layers: Fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers, 2016, in CVPR, https://ieeexplore.ieee.org/document/7780603
- Andreas Veit and Serge Belongie, 2018, Convolutional networks with adaptive inference graphs, In ECCV, pages 3–18, 2018, https://arxiv.org/abs/1711.11503
- X. Dong, J. Huang, Y. Yang, and S. Yan, 2017, More is less: A more complicated network with less inference complexity, in CVPR, 2017. https://arxiv.org/abs/1703.08651
For more research on layer skipping, refer to https://www.aussieai.com/research/layer-pruning#skipping.