Aussie AI
Funnel Transformer
-
Last Updated 7 December, 2024
-
by David Spuler, Ph.D.
The Funnel Transformer is an inference optimization method based on dynamic pruning of the embedding vectors. The effective length of the embedding vector is reduced at each layer by detecting elements that are small enough to ignore. This is similar to a dynamic reduction of the model's internal dimension, except that the removed elements can occur at arbitrary positions rather than at the end of the vector. Hence, the technique is a sparsification of the tensor computations along the embedding dimension.
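As a rough illustration of the idea, below is a minimal NumPy sketch, assuming a simple magnitude threshold for deciding which embedding elements are small enough to skip. The helper names (prune_embedding, sparse_matvec), the threshold value, and the toy layer loop are illustrative assumptions, not the Funnel Transformer's actual implementation.

```python
import numpy as np

def prune_embedding(x, threshold=0.01):
    """Keep only embedding elements whose magnitude exceeds the threshold.
    The surviving elements can sit at arbitrary positions, so their
    original indices are returned alongside the values."""
    indices = np.nonzero(np.abs(x) > threshold)[0]
    return x[indices], indices

def sparse_matvec(values, indices, W):
    """Multiply the pruned vector by a weight matrix, reading only the
    rows of W that match the surviving indices: sparsification of the
    tensor computation along the embedding dimension."""
    return values @ W[indices, :]

# Toy per-layer loop: small elements are detected and skipped at each layer.
rng = np.random.default_rng(0)
dim = 512
x = rng.normal(scale=0.02, size=dim)
for layer in range(3):
    values, indices = prune_embedding(x)
    W = rng.normal(scale=0.05, size=(dim, dim))  # stand-in layer weights
    x = np.tanh(sparse_matvec(values, indices, W))  # dense output, re-pruned next layer
    print(f"layer {layer}: used {indices.size} of {dim} embedding elements")
```

The compute saving comes from the matrix multiply: only the rows of the weight matrix matching the surviving indices participate, so the cost scales with the number of retained elements rather than the full embedding dimension.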
Related areas of LLM inference optimization include:
- Embeddings
- Tokenization
- Vocabulary expansion
- Vocabulary trimming
- Token pruning
- Embeddings pruning
- Shortlisting
Research on Funnel Transformer
- Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le, 5 Jun 2020, Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing, https://arxiv.org/abs/2006.03236 https://github.com/laiguokun/Funnel-Transformer
- Piotr Nawrot, Szymon Tworkowski, Michał Tyrolski, Łukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski, April 2022, Hierarchical Transformers Are More Efficient Language Models, arXiv:2110.13711, https://arxiv.org/abs/2110.13711
- Papers With Code, 2020, Funnel Transformer Explained, https://paperswithcode.com/method/funnel-transformer
- Zihang Dai, Guokun Lai, Yiming Yang, and Quoc V. Le, 2020, Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing, In Advances in Neural Information Processing Systems (NeurIPS 2020), volume 33, pp. 4271–4282, Curran Associates Inc., https://dl.acm.org/doi/10.5555/3495724.3496083 https://proceedings.neurips.cc/paper/2020/file/2cd2915e69546904e4e5d4a2ac9e1652-Paper.pdf
- Connor Shorten, 2020, Funnel Transformer Explained! https://www.youtube.com/watch?v=QsIcEqGriGg
- Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel, 10 Sep 2021 (v2), Do Transformer Modifications Transfer Across Implementations and Applications? https://arxiv.org/abs/2102.11972
- Sandeep Subramanian, Ronan Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau, 1 May 2020, Multi-scale Transformer Language Models, https://arxiv.org/abs/2005.00581 (This paper is cited by the Funnel Transformer paper as being similar.)
- Ningning Ma, Xiangyu Zhang, Jian Sun, 24 Jul 2020 (v2), Funnel Activation for Visual Recognition, https://arxiv.org/abs/2007.11824 https://github.com/megvii-model/FunnelAct
- Yujing Zhang, 2022, Funnel Vision Transformer for image classification, CVPR 2022 Submission, Stanford University, http://vision.stanford.edu/teaching/cs231n/reports/2022/pdfs/11.pdf (Funnel Transformer architecture applied to a ViT.)
More Research on Pruning Types
- Depth pruning (overview)
— Static layer pruning
— Layer pruning
— Early exit
— Dynamic layer pruning
— Layer skipping
— Layer approximation
— Shallow decoder architecture
— Layer reordering
— Layer importance
- Width pruning (overview)
— Attention head pruning
— Slimmable networks (width pruning)
— FFN pruning
— Channel pruning
— Filter pruning
- Length pruning (longitudinal/input/end-to-end)
— Token pruning (input pruning)
— Dynamic token pruning
— Prompt compression
— Context compression
— Token merging
— Token skipping
— Token dropping
— Zero padding removal
- Embedding-dimension pruning
— Embedding pruning
— Embedding matrix compression (embedding pruning)
— Embedding low-rank matrix factorization
— Unembedding matrix (output embeddings)
- Multi-dimensional pruning
— Dual pruning
— Triple pruning
— Quadruple pruning
— 3D CNN model pruning