Aussie AI
Funnel Transformer
-
Last Updated 7 December, 2024
-
by David Spuler, Ph.D.
The Funnel Transformer is an inference optimization method based on dynamic pruning of the embedding vectors. The effective length of the embedding vector is reduced at each layer by detecting elements that are small enough to ignore. This is similar to a dynamic reduction of the model's internal dimension, except that the removed elements can occur at arbitrary positions rather than at the end of the vector. Hence, the technique is a sparsification of the tensor computations along the embedding dimension.
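As a rough illustration of the idea, below is a minimal NumPy sketch, assuming a simple magnitude threshold for deciding which embedding elements are small enough to skip. The helper names (prune_embedding, sparse_matvec), the threshold value, and the toy layer loop are illustrative assumptions, not the Funnel Transformer's actual implementation.

```python
import numpy as np

def prune_embedding(x, threshold=0.01):
    """Keep only embedding elements whose magnitude exceeds the threshold.
    The surviving elements can sit at arbitrary positions, so their
    original indices are returned alongside the values."""
    indices = np.nonzero(np.abs(x) > threshold)[0]
    return x[indices], indices

def sparse_matvec(values, indices, W):
    """Multiply the pruned vector by a weight matrix, reading only the
    rows of W that match the surviving indices: sparsification of the
    tensor computation along the embedding dimension."""
    return values @ W[indices, :]

# Toy per-layer loop: small elements are detected and skipped at each layer.
rng = np.random.default_rng(0)
dim = 512
x = rng.normal(scale=0.02, size=dim)
for layer in range(3):
    values, indices = prune_embedding(x)
    W = rng.normal(scale=0.05, size=(dim, dim))  # stand-in layer weights
    x = np.tanh(sparse_matvec(values, indices, W))  # dense output, re-pruned next layer
    print(f"layer {layer}: used {indices.size} of {dim} embedding elements")
```

The compute saving comes from the matrix multiply: only the rows of the weight matrix matching the surviving indices participate, so the cost scales with the number of retained elements rather than the full embedding dimension.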
Related areas of LLM inference optimization include:
- Embeddings
- Tokenization
- Vocabulary expansion
- Vocabulary trimming
- Token pruning
- Embeddings pruning
- Shortlisting
Research on Funnel Transformer
- Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le, 5 Jun 2020, Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing, https://arxiv.org/abs/2006.03236 https://github.com/laiguokun/Funnel-Transformer
- Piotr Nawrot, Szymon Tworkowski, Michał Tyrolski, Łukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski, April 2022, Hierarchical Transformers Are More Efficient Language Models, arXiv:2110.13711, https://arxiv.org/abs/2110.13711
- Papers With Code, 2020, Funnel Transformer Explained, https://paperswithcode.com/method/funnel-transformer
- Zihang Dai, Guokun Lai, Yiming Yang, and Quoc V. Le, 2020, Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing, In Advances in Neural Information Processing Systems (NeurIPS 2020), volume 33, pp. 4271–4282, Curran Associates Inc., https://dl.acm.org/doi/10.5555/3495724.3496083 https://proceedings.neurips.cc/paper/2020/file/2cd2915e69546904e4e5d4a2ac9e1652-Paper.pdf
- Connor Shorten, 2020, Funnel Transformer Explained! https://www.youtube.com/watch?v=QsIcEqGriGg
- Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel, 10 Sep 2021 (v2), Do Transformer Modifications Transfer Across Implementations and Applications? https://arxiv.org/abs/2102.11972
- Sandeep Subramanian, Ronan Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau, 1 May 2020, Multi-scale Transformer Language Models, https://arxiv.org/abs/2005.00581 (This paper is cited by the Funnel Transformer paper as being similar.)
- Ningning Ma, Xiangyu Zhang, Jian Sun, 24 Jul 2020 (v2), Funnel Activation for Visual Recognition, https://arxiv.org/abs/2007.11824 https://github.com/megvii-model/FunnelAct
- Yujing Zhang, 2022, Funnel Vision Transformer for image classification, CVPR 2022 Submission, Stanford University, http://vision.stanford.edu/teaching/cs231n/reports/2022/pdfs/11.pdf (Funnel Transformer architecture applied to a ViT.)
More Research on Pruning Types
- Depth pruning (overview)
— Static layer pruning
— Layer pruning
— Early exit
— Dynamic layer pruning
— Layer skipping
— Layer approximation
— Shallow decoder architecture
— Layer reordering
— Layer importance
- Width pruning (overview)
— Attention head pruning
— Slimmable networks (width pruning)
— FFN pruning
— Channel pruning
— Filter pruning
- Length pruning (longitudinal/input/end-to-end)
— Token pruning (input pruning)
— Dynamic token pruning
— Prompt compression
— Context compression
— Token merging
— Token skipping
— Token dropping
— Zero padding removal
- Embedding-dimension pruning
— Embedding pruning
— Embedding matrix compression (embedding pruning)
— Embedding low-rank matrix factorization
— Unembedding matrix (output embeddings)
- Multi-dimensional pruning
— Dual pruning
— Triple pruning
— Quadruple pruning
— 3D CNN model pruning