Aussie AI

Depth Pruning

  • Last Updated 2 November, 2024
  • by David Spuler, Ph.D.

Depth pruning removes or cuts off layers during neural network inference to adjust the "depth" to which computation proceeds. A neural network can be "deep" with many layers, or "shallow" with only a few. The most common type of depth pruning is layer pruning, whose dynamic form is called early exit inference.
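
As a simple illustration of the dynamic case, the sketch below shows early exit inference: each layer's output feeds a small exit classifier, and inference stops as soon as that classifier is confident enough. This is a minimal PyTorch sketch, not a specific published method; the layer stack, the per-layer exit heads, and the fixed confidence threshold are illustrative assumptions.

    # Minimal sketch of early exit inference (dynamic layer pruning) in PyTorch.
    # Assumptions, not from the article: a stack of Transformer encoder layers,
    # a small classifier "exit head" after every layer, and a fixed confidence
    # threshold that decides when to stop descending through the layers.
    import torch
    import torch.nn as nn

    class EarlyExitEncoder(nn.Module):
        def __init__(self, d_model=512, n_layers=12, n_classes=10, threshold=0.9):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
                for _ in range(n_layers)
            )
            self.exit_heads = nn.ModuleList(
                nn.Linear(d_model, n_classes) for _ in range(n_layers)
            )
            self.threshold = threshold

        def forward(self, x):
            for depth, (layer, head) in enumerate(zip(self.layers, self.exit_heads)):
                x = layer(x)                            # one more layer of "depth"
                logits = head(x.mean(dim=1))            # pool tokens, then classify
                confidence = logits.softmax(dim=-1).max()
                if confidence >= self.threshold:        # confident enough: exit early,
                    return logits, depth + 1            # skipping the deeper layers
            return logits, len(self.layers)             # fell through every layer

    model = EarlyExitEncoder().eval()
    with torch.no_grad():
        logits, layers_used = model(torch.randn(1, 16, 512))  # (batch, tokens, dim)
    print(layers_used, "of", len(model.layers), "layers used")

The threshold trades speed against accuracy: a lower threshold exits sooner and saves more layer computations, at the cost of less reliable predictions.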

There are multiple dimensions along which to prune a model. Depth pruning is orthogonal to pruning along the other model dimensions, such as width pruning and length pruning. As such, depth pruning can be combined with other types of pruning, as in dual pruning and triple pruning (generally called "multi-dimensional pruning").

Like all types of pruning, depth pruning can be performed statically or dynamically. Static depth pruning, such as static layer pruning, is a form of model compression. When done dynamically, the general class of algorithms is called dynamic inference (or "adaptive inference").
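
The static case can be as simple as deleting layers from a trained model before deployment, as in the sketch below. It assumes the Transformer blocks live in an nn.ModuleList and keeps every second layer; that uniform-stride choice is purely illustrative, since practical methods typically rank layers by an importance score and fine-tune the smaller model afterwards.

    # Minimal sketch of static layer pruning as model compression: delete a fixed
    # subset of layers from a trained stack before deployment. Assumptions, not
    # from the article: the blocks live in an nn.ModuleList and every second layer
    # is kept; real methods usually rank layers by an importance score instead.
    import torch
    import torch.nn as nn

    def prune_layers(layers: nn.ModuleList, keep_every: int = 2) -> nn.ModuleList:
        """Keep every `keep_every`-th layer and discard the rest."""
        kept = [layer for i, layer in enumerate(layers) if i % keep_every == 0]
        return nn.ModuleList(kept)

    # Toy 12-layer stack standing in for a trained model.
    stack = nn.ModuleList(
        nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
        for _ in range(12)
    )
    stack = prune_layers(stack)       # 12 layers -> 6: a statically shallower model
    print(len(stack), "layers remain")

    x = torch.randn(1, 16, 256)
    with torch.no_grad():
        for layer in stack:           # the pruned model still runs end to end
            x = layer(x)
    # In practice the compressed model is then fine-tuned to recover accuracy.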

Types of Depth Pruning

Various subtypes of depth pruning include:

  • Layer pruning — static removal of entire layers from the model.
  • Early exit — dynamic layer pruning that stops inference partway through the layer stack.
  • Layer skipping — dynamically bypassing individual layers rather than exiting outright.
  • Cascades — routing inputs between shallower and deeper models, escalating only when needed.
  • Dual pruning — depth pruning combined with width pruning (multi-dimensional pruning).

Research Papers on Depth Pruning

Research papers on depth pruning:

  • Ignacio de Gregorio, April 2024, Mixture-of-Depths, a Dazzling New AI Breakthrough: Conditional Computing is Finally Here, Medium, https://medium.com/@ignacio.de.gregorio.noblejas/mixture-of-depths-a-dazzling-new-ai-breakthrough-be958fc629b2 (Mixture of depths is a layer-wise per-token limit to attention head computations, which is like width pruning with dynamic depth.)
  • David Raposo, Sam Ritter, Blake Richards, Timothy Lillicrap, Peter Conway Humphreys, Adam Santoro, 2 Apr 2024, Mixture-of-Depths: Dynamically allocating compute in transformer-based language models, https://arxiv.org/abs/2404.02258 (Per-layer pruning of which tokens can be in the attention computations to give a type of mixed lengthwise pruning combined with a dynamic width pruning or slimmable network approach.)
  • Jordan Dotzel, Yash Akhauri, Ahmed S. AbouElhamayed, Carly Jiang, Mohamed Abdelfattah, Zhiru Zhang, 7 Apr 2024, Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models, https://arxiv.org/abs/2404.04900 (Token-specific layer routing is similar to layer skipping and dynamic depth pruning.)
  • Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, Marzieh Tahaei, Boxing Chen, Ali Ghodsi, 2023, SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks https://arxiv.org/abs/2309.00255 (Generalization of multi-dimensional pruning, by training a large neural network with many sub-networks across different width and depth dimensions.)
  • Lu Hou, Lifeng Shang, Xin Jiang, and Qun Liu. 2020. Dynabert: Dynamic BERT with adaptive width and depth. arXiv preprint arXiv:2004.04037 https://arxiv.org/abs/2004.04037 Code: https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/DynaBERT
  • Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014 https://arxiv.org/abs/1412.6550
  • Bo-Kyeong Kim, Geonmin Kim, Tae-Ho Kim, Thibault Castells, Shinkook Choi, Junho Shin, Hyoung-Kyu Song, 5 Feb 2024, Shortened LLaMA: A Simple Depth Pruning for Large Language Models, https://arxiv.org/abs/2402.02834 (Analysis of dual pruning with combined depth and width pruning.)
  • Ji Liu, Dehua Tang, Yuanxian Huang, Li Zhang, Xiaocheng Zeng, Dong Li, Mingjie Lu, Jinzhang Peng, Yu Wang, Fan Jiang, Lu Tian, Ashish Sirasao, 12 Jan 2024, UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer, https://arxiv.org/abs/2401.06426 (Block pruning strategy gives a type of depth pruning.)
  • David Spuler, March 2024, Chapter 47. Early Exit and Layer Pruning, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
  • Yijin Liu, Fandong Meng, Jie Zhou, Yufeng Chen and Jinan Xu, Faster Depth-Adaptive Transformers, https://arxiv.org/pdf/2004.13542.pdf
  • Yifan Peng, Jaesong Lee, Shinji Watanabe, I3D: Transformer architectures with input-dependent dynamic depth for speech recognition, https://arxiv.org/abs/2303.07624
  • Anonymous, 2024, A deeper look at depth pruning of LLMs, https://openreview.net/pdf?id=9B7ayWclwN
  • Basar Kutukcu, Sabur Baidya, Sujit Dey, 2024, SLEXNet: Adaptive Inference Using Slimmable Early Exit Neural Networks, https://doi.org/10.1145/3689632 https://dl.acm.org/doi/pdf/10.1145/3689632 (Combined width and depth pruning with slimmable and early exit networks.)
  • Ting Hu, Christoph Meinel, Haojin Yang, 2024, A flexible BERT model enabling width- and depth-dynamic inference, Computer Speech & Language 4 April 2024, 101646, https://www.sciencedirect.com/science/article/pii/S0885230824000299 (Dual pruning method with layerwise "neural grafting" that gives dynamic width models, and combined with early exit on the depth dimension.)
  • Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov, 19 Jul 2024, Compact Language Models via Pruning and Knowledge Distillation, https://arxiv.org/abs/2407.14679 https://github.com/NVlabs/Minitron (Combination of distillation and structured pruning on the depth and width dimensions.)
  • Sharath Turuvekere Sreenivas, Saurav Muralidharan, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov, 26 Aug 2024 (v2), LLM Pruning and Distillation in Practice: The Minitron Approach, https://arxiv.org/abs/2408.11796
  • Rinor Cakaj, Jens Mehnert, Bin Yang, 25 Sep 2024, CNN Mixture-of-Depths, https://arxiv.org/abs/2409.17016
  • Theodore Glavas, Joud Chataoui, Florence Regol, Wassim Jabbour, Antonios Valkanas, Boris N. Oreshkin, Mark Coates, 26 Oct 2024, Dynamic layer selection in decoder-only transformers, https://arxiv.org/abs/2410.20022

For more research papers on depth pruning, see the lists of papers for its various subtypes: layer pruning, early exit, dual pruning, and cascades.
