Aussie AI
Magnitude Pruning
Last Updated 3 December, 2024
by David Spuler, Ph.D.
Magnitude pruning zeroes weights that have a small magnitude, meaning a small numeric absolute value. Put another way, it is the removal of near-zero weights, whether positive or negative.
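As a minimal sketch in C++ (the function name and flat array layout here are illustrative assumptions, not code from any of the papers below), the core operation is a single pass over the weights:

    #include <cmath>
    #include <cstddef>

    // Zero any weight whose absolute value falls below the threshold.
    void magnitude_prune(float* weights, std::size_t n, float threshold) {
        for (std::size_t i = 0; i < n; ++i) {
            if (std::fabs(weights[i]) < threshold) {
                weights[i] = 0.0f;  // near-zero weight, positive or negative
            }
        }
    }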
Magnitude pruning is the simplest type of unstructured pruning. In its pure form, any weight in the whole model may be pruned, regardless of which structure it belongs to. It can also be combined with structural pruning by limiting pruning to particular structural units of the model.
Magnitude pruning can be performed during training or after training. Post-training magnitude pruning is conceptually similar to quantization, in that it creates a new model with changed weights. Re-training after pruning is sometimes required to recover accuracy, but it may also be avoided.
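For the post-training case, one common recipe is to choose the threshold from the distribution of weight magnitudes so that a target fraction of weights becomes zero. The following C++ sketch assumes the whole model's weights sit in one flat vector; the function name and layout are illustrative assumptions, not a definitive implementation:

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Globally prune the smallest-magnitude weights so that roughly
    // `sparsity` (e.g., 0.9) of all weights become zero.
    void global_magnitude_prune(std::vector<float>& weights, double sparsity) {
        if (weights.empty() || sparsity <= 0.0) return;
        std::vector<float> mags(weights.size());
        for (std::size_t i = 0; i < weights.size(); ++i) {
            mags[i] = std::fabs(weights[i]);
        }
        // The k-th smallest magnitude becomes the pruning threshold.
        std::size_t k = static_cast<std::size_t>(sparsity * (mags.size() - 1));
        std::nth_element(mags.begin(), mags.begin() + k, mags.end());
        const float threshold = mags[k];
        for (float& w : weights) {
            if (std::fabs(w) <= threshold) {
                w = 0.0f;  // ties may prune slightly more than the target
            }
        }
    }

Running the same routine separately on each layer's weights gives layerwise thresholds rather than one global cutoff, a variation noted in several of the papers below.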
Research on Magnitude Pruning
Research papers on magnitude pruning include:
- Zhu, M. and Gupta, S. To prune, or not to prune: exploring the efficacy of pruning for model compression. CoRR, abs/1710.01878, 2017. https://arxiv.org/abs/1710.01878
- Abigail See, Minh-Thang Luong, and Christopher D. Manning. Compression of neural machine translation models via pruning. In CoNLL, pages 291–301. ACL, 2016, https://arxiv.org/abs/1606.09274
- Sharan Narang, Gregory F. Diamos, Shubho Sengupta, and Erich Elsen. Exploring sparsity in recurrent neural networks. CoRR, abs/1704.05119, 2017, https://arxiv.org/abs/1704.05119
- Jaeho Lee, Sejun Park, Sangwoo Mo, Sungsoo Ahn, and Jinwoo Shin. Layer-adaptive sparsity for the magnitude-based pruning. In International Conference on Learning Representations, 2020. https://arxiv.org/abs/2010.07611
- Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 1135–1143. Curran Associates, Inc., 2015, https://arxiv.org/abs/1506.02626 (Iterative pruning and re-training.)
- J. Back, N. Ahn, J. Kim, 2023, Magnitude Attention-based Dynamic Pruning, arXiv preprint arXiv:2306.05056, https://arxiv.org/abs/2306.05056
- Manas Gupta, Efe Camci, Vishandi Rudy Keneta, Abhishek Vaidyanathan, Ritwik Kanodia, Chuan-Sheng Foo, Wu Min, and Lin Jie. Is complexity required for neural network pruning? a case study on global magnitude pruning. arXiv preprint arXiv:2209.14624, 2022, https://arxiv.org/abs/2209.14624
- U. Evci, T. Gale, J. Menick, P. S. Castro, and E. Elsen, “Rigging the lottery: Making all tickets winners,” in Proceedings of the 37th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, H. Daumé III and A. Singh, Eds., vol. 119. PMLR, 13–18 Jul 2020, pp. 2943–2952. https://proceedings.mlr.press/v119/evci20a.html, https://arxiv.org/abs/1911.11134
- N. Strom, “Sparse connection and pruning in large dynamic artificial neural networks,” 1997. PDF: https://www.nikkostrom.com/publications/euro97/euro97.pdf
- S. Park, J. Lee, S. Mo, and J. Shin, “Lookahead: a far-sighted alternative of magnitude-based pruning,” International Conference on Learning Representations, 2020. https://arxiv.org/abs/2002.04809, Code: https://github.com/alinlab/lookahead_pruning
- Thimm, G. & Fiesler, E., 1995, “Evaluating pruning methods”, in 1995 International Symposium on Artificial Neural Networks, Proc. ISANN '95, pp. A2 20-25, National Chiao-Tung University, Hsinchu, Taiwan, 1995. https://www.semanticscholar.org/paper/Evaluating-pruning-methods-Thimm-Fiesler/80e02a91b0645d076e9584a266978fd322e35f6b
- Z. Wang, Ce Zhu, Zhiqiang Xia, Qi Guo, Y. Liu, 2017, Towards thinner convolutional neural networks through gradually global pruning, Computer Science IEEE International Conference on Image Processing, https://arxiv.org/abs/1703.09916v1 (Pruning a percentage of weights across layers.)
- Yu-xin Zhang, Mingbao Lin, et al. (incl. Rongrong Ji), 2021, Efficient Weight Pruning using Pre-trained Lottery Jackpots, https://arxiv.org/abs/2104.08700v3
- L. Prechelt, 1997, Adaptive parameter pruning in neural networks, https://www.researchgate.net/publication/2283202_Adaptive_Parameter_Pruning_in_Neural_Networks (Adapting the pruning method during training.)
- T. Gale, E. Elsen, and S. Hooker. 2019. The state of sparsity in deep neural networks. arXiv preprint 1902.09574, https://arxiv.org/abs/1902.09574
- Frantar, E.; and Alistarh, D. 2023. SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot. arXiv:2301.00774. https://arxiv.org/abs/2301.00774
- F. Manessi, A. Rozza, S. Bianco, P. Napoletano and R. Schettini, "Automated pruning for deep neural network compression", Proc. 24th Int. Conf. Pattern Recognit. (ICPR), pp. 657-664, Aug. 2018. https://arxiv.org/abs/1712.01721 (Magnitude pruning with layerwise thresholds.)
- You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Xiangyu Zhao, Ying Wei, Hong Qian, Qi Liu, Xiang Wang, Wai Kin (Victor) Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng, Nov 2023, On the Opportunities of Green Computing: A Survey, https://arxiv.org/abs/2311.00447 (Extensive survey of environmental and green AI issues, along with a survey of various optimization methods to reduce AI resource requirements in training and inference.)
- Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, and Bin Ren. 2020. PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-Based Weight Pruning. Association for Computing Machinery, New York, NY, USA, 907–922. https://doi.org/10.1145/3373376.3378534 (Pattern-based pruning method.)
- David Spuler, March 2024, Chapter 33. Pruning, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi, 16 Jul 2024, MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models, https://arxiv.org/abs/2407.11681
- Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, Volkan Cevher, Yida Wang, George Karypis, 12 Jul 2024, Inference Optimization of Foundation Models on AI Accelerators, KDD’24, August 25–29, 2024, Barcelona, Spain, https://arxiv.org/abs/2407.09111
- Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He, 15 Feb 2024, Model Compression and Efficient Inference for Large Language Models: A Survey, https://arxiv.org/abs/2402.09748
- Y. Guo, A. Yao, and Y. Chen. 2016. Dynamic network surgery for efficient DNNs. In Advances in Neural Information Processing Systems, pages 1379–1387, https://arxiv.org/abs/1608.04493
- M. P. Véstias, R. P. Duarte, J. T. de Sousa, and H. C. Neto, 2019, “Fast convolutional neural networks in low density FPGAs using zero-skipping and weight pruning,” Electronics, vol. 8, no. 11, p. 1321, Nov. 2019. https://www.mdpi.com/2079-9292/8/11/1321
- Arnav Chavan, Raghav Magazine, Shubham Kushwaha, Mérouane Debbah, Deepak Gupta, 24 Apr 2024 (v2), Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward, https://arxiv.org/abs/2402.01799 Code: https://github.com/nyunAI/Faster-LLM-Survey
- Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang, 8 Jun 2024 (v2), A Survey on Efficient Inference for Large Language Models, https://arxiv.org/abs/2404.14294
- Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song, 19 Sep 2023, Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity, https://arxiv.org/abs/2309.10285 Code: https://github.com/AlibabaResearch/flash-llm (Unstructured pruning on tensor cores in GPUs with sparse MatMul optimizations.)
- Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang, 18 Apr 2024 (v2), The Efficiency Spectrum of Large Language Models: An Algorithmic Survey, https://arxiv.org/abs/2312.00678
- Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang, July 2024, Mobile Edge Intelligence for Large Language Models: A Contemporary Survey, https://www.techrxiv.org/doi/pdf/10.36227/techrxiv.172115025.57884352
- Leo Donisch, Sigurd Schacht, Carsten Lanquillon, 6 Aug 2024, Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations, https://arxiv.org/abs/2408.03130
- Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer, 1 May 2024 (v6), LLM Inference Unveiled: Survey and Roofline Model Insights, https://arxiv.org/abs/2402.16363 Code: https://github.com/hahnyuan/LLM-Viewer
- Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi, 9 Aug 2024 (v2), A Survey on Deep Neural Network Pruning-Taxonomy, Comparison, Analysis, and Recommendations, IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2024.3447085, https://arxiv.org/abs/2308.06767 https://ieeexplore.ieee.org/abstract/document/10643325
- Joo Hyung Lee, Wonpyo Park, Nicole Elyse Mitchell, Jonathan Pilault, Johan Samir Obando Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Woohyun Han, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart J.C. Bik, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci, 2024, Jaxpruner: A Concise Library for Sparsity Research, Conference on Parsimony and Learning, PMLR 234:515-528, https://proceedings.mlr.press/v234/lee24a.html https://proceedings.mlr.press/v234/lee24a/lee24a.pdf https://openreview.net/forum?id=H2rCZCfXkS https://openreview.net/pdf?id=H2rCZCfXkS
- David Spuler, March 2024, Magnitude Pruning, in Generative AI in C++, https://www.aussieai.com/book/ch33-magnitude-pruning
- Yue Zheng, Yuhao Chen, Bin Qian, Xiufang Shi, Yuanchao Shu, Jiming Chen, 29 Sep 2024, A Review on Edge Large Language Models: Design, Execution, and Applications, https://arxiv.org/abs/2410.11845
- Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang, 4 Nov 2024, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350
- M. Xu, D. Cai, W. Yin, S. Wang, X. Jin, X. Liu, 2024, Resource-efficient Algorithms and Systems of Foundation Models: A Survey, ACM Computing Surveys, https://dl.acm.org/doi/pdf/10.1145/3706418
More Research on Pruning Types
- Depth pruning (overview)
— Static layer pruning
— Layer pruning
— Early exit
— Dynamic layer pruning
— Layer skipping
— Layer approximation
— Shallow decoder architecture
— Layer reordering
— Layer importance
- Width pruning (overview)
— Attention head pruning
— Slimmable networks (width pruning)
— FFN pruning
— Channel pruning
— Filter pruning
- Length pruning (longitudinal/input/end-to-end)
— Token pruning (input pruning)
— Dynamic token pruning
— Prompt compression
— Context compression
— Token merging
— Token skipping
— Token dropping
— Zero padding removal
- Embedding-dimension pruning
— Embedding pruning
— Embedding matrix compression (embedding pruning)
— Embedding low-rank matrix factorization
— Unembedding matrix (output embeddings)
- Multi-dimensional pruning
— Dual pruning
— Triple pruning
— Quadruple pruning
— 3D CNN model pruning
More AI Pruning Research
Read more about:
- Layer pruning
- Token pruning
- Attention head pruning
- Embeddings pruning
- FFN pruning
- Shallow decoder architecture
- Normalization pruning
- Length pruning
- Width pruning
- Channel pruning