Aussie AI
Embeddings Matrix Pruning
Book Excerpt from "Generative AI in C++"
by David Spuler, Ph.D.
Pruning of embeddings doesn't receive much research attention because the embedding lookup isn't a bottleneck in most Transformers. Most of the research on pruning embeddings has focused on reducing the memory footprint of the embedding matrix for use on smaller devices, rather than on inference speed. The conversion of a token into an embedding vector uses a single embedding matrix, which can be large if the model's vocabulary is large. Various pruning approaches exist, based on matrix compression techniques such as sparsity or hashing, as sketched in the example below.
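To make the memory trade-off concrete, here is a minimal C++ sketch (not code from the book or any library; all class and function names are illustrative) contrasting a standard embedding table lookup with a hash-compressed lookup in the spirit of hashing-trick methods such as ROBE: the full table stores vocab_size x dim floats, whereas the hashed version reads every element from a small shared parameter pool via a deterministic hash.

    // Minimal sketch, assuming a row-major float embedding matrix.
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Standard embedding table: one row of 'dim' floats per vocabulary token.
    // Memory cost is vocab_size * dim floats, which grows with vocabulary size.
    struct EmbeddingTable {
        int vocab_size;
        int dim;
        std::vector<float> weights;  // vocab_size * dim, row-major

        EmbeddingTable(int vocab, int d)
            : vocab_size(vocab), dim(d), weights((size_t)vocab * d, 0.01f) {}

        // Copy the row for one token id into 'out' (the embedding vector).
        void lookup(int token_id, float* out) const {
            const float* row = &weights[(size_t)token_id * dim];
            for (int i = 0; i < dim; ++i) out[i] = row[i];
        }
    };

    // Hash-compressed embedding table (illustrative, in the spirit of ROBE):
    // every element is read from a small shared pool, so memory is
    // 'pool_size' floats regardless of vocabulary size.
    struct HashedEmbeddingTable {
        int dim;
        std::vector<float> pool;  // shared parameter pool, much smaller than vocab*dim

        HashedEmbeddingTable(int d, size_t pool_size)
            : dim(d), pool(pool_size, 0.01f) {}

        // Simple deterministic hash mixing the token id and element index.
        static size_t hash_index(int token_id, int i, size_t modulus) {
            uint64_t h = (uint64_t)token_id * 0x9E3779B97F4A7C15ull + (uint64_t)i;
            h ^= h >> 33;
            h *= 0xFF51AFD7ED558CCDull;
            h ^= h >> 33;
            return (size_t)(h % modulus);
        }

        void lookup(int token_id, float* out) const {
            for (int i = 0; i < dim; ++i)
                out[i] = pool[hash_index(token_id, i, pool.size())];
        }
    };

    int main() {
        const int vocab = 50000, dim = 8;
        EmbeddingTable full(vocab, dim);        // 50,000 * 8 floats stored
        HashedEmbeddingTable small(dim, 4096);  // only 4,096 floats shared by all tokens

        std::vector<float> v1(dim), v2(dim);
        full.lookup(1234, v1.data());
        small.lookup(1234, v2.data());
        std::printf("full[0]=%f hashed[0]=%f\n", v1[0], v2[0]);
        return 0;
    }

The hashed version trades exactness for memory: distinct tokens may share pool entries, which is acceptable in practice when the pool is trained end-to-end, but the table no longer grows with vocabulary size.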
Research papers on embeddings matrix pruning:
- Daochen Zha, Louis Feng, Bhargav Bhushanam, Dhruv Choudhary, Jade Nie, Yuandong Tian, Jay Chae, Yinbin Ma, Arun Kejariwal, Xia Hu, 2022, AutoShard: Automated Embedding Table Sharding for Recommender Systems, https://dl.acm.org/doi/abs/10.1145/3534678.3539034, https://arxiv.org/abs/2208.06399
- Aditya Desai, Li Chou, Anshumali Shrivastava, 2022, Random Offset Block Embedding (ROBE) for compressed embedding tables in deep learning recommendation systems, Conference on Machine Learning and Systems (MLSys), https://arxiv.org/abs/2108.02191
- Xiangyu Zhao, Haochen Liu, Hui Liu, Jiliang Tang, Weiwei Guo, Jun Shi, Sida Wang, Huiji Gao, and Bo Long. 2020. Memory-efficient embedding for recommendations, arXiv preprint arXiv:2006.14827 (2020), https://arxiv.org/abs/2006.14827
- Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, and James Zou. 2019. Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems, arXiv preprint arXiv:1909.11810 (2019), https://arxiv.org/abs/1909.11810
- Nicola Tonellotto, Craig Macdonald, 2021, Query Embedding Pruning for Dense Retrieval, CIKM ’21, November 1–5, 2021, Virtual Event, QLD, Australia, https://arxiv.org/abs/2108.10341
- IamAdiSri, 2021, Pruning a model embedding matrix for memory efficiency, April 2021, Hugging Face discussion board, https://discuss.huggingface.co/t/pruning-a-model-embedding-matrix-for-memory-efficiency/5502/7
- Raphael Shu and Hideki Nakayama. 2017, Compressing word embeddings via deep compositional code learning, In ICLR (Poster). OpenReview.net, 2018, https://arxiv.org/abs/1711.01068
- Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, and Jiyan Yang. 2020, Compositional embeddings using complementary partitions for memory-efficient recommendation systems, In KDD, pp. 165-175. ACM, 2020, https://arxiv.org/abs/1909.02107
- Valentin Khrulkov, Oleksii Hrinchuk, Leyla Mirvakhabova, and Ivan Oseledets. 2019. Tensorized Embedding Layers for Efficient Model Compression, arXiv preprint arXiv:1901.10787 (2019), updated Feb 2020, https://arxiv.org/abs/1901.10787v1
- Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, and Noah A. Smith. 2015, Sparse overcomplete word vector representations, In ACL (1), pp. 1491-1500. The Association for Computer Linguistics, 2015, https://arxiv.org/abs/1506.02004 (Binary quantization in relation to word vector embeddings.)
- Yunchuan Chen, Lili Mou, Yan Xu, Ge Li, and Zhi Jin. 2016, Compressing neural language models by sparse word representations, In ACL (1). The Association for Computer Linguistics, 2016, https://arxiv.org/abs/1610.03950 (Sparse matrix via common and rare word embeddings)
- Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li, 2021, A Survey on Green Deep Learning, Nov 2021, https://arxiv.org/abs/2111.05193 (Extensive survey paper with section on “Compact Embeddings”.)
- Wei Deng, Junwei Pan, Tian Zhou, Deguang Kong, Aaron Flores, and Guang Lin. 2021. DeepLight: Deep Lightweight Feature Interactions for Accelerating CTR Predictions in Ad Serving, In Proceedings of the 14th ACM international conference on Web search and data mining. 922–930, https://arxiv.org/abs/2002.06987
- Jun Suzuki and Masaaki Nagata. 2016. Learning Compact Neural Word Embeddings by Parameter Space Sharing, In IJCAI. 2046–2052, https://dl.acm.org/doi/10.5555/3060832.3060907
- Aliakbar Panahi, Seyran Saeedi, and Tom Arodz. 2019. word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement, In ICLR. https://arxiv.org/abs/1911.04975
- Weiping Song, Chence Shi, Zhiping Xiao, Zhijian Duan, Yewen Xu, Ming Zhang, and Jian Tang. 2019, AutoInt: Automatic feature interaction learning via self-attentive neural networks, In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1161–1170, 2019, https://arxiv.org/abs/1810.11921, Code: https://github.com/DeepGraphLearning/RecommenderSystems
- Xiaorui Wu, Hong Xu, Honglin Zhang, Huaming Chen, and Jian Wang. 2019, Saec: Similarity-aware embedding compression in recommendation systems, CoRR, abs/1903.00103, 2019, https://arxiv.org/abs/1903.00103
- Martin Andrews. 2016, Compressing word embeddings, CoRR, abs/1511.06397, 2015 (revised May 2016), https://arxiv.org/abs/1511.06397v2
- Lili Mou, Ran Jia, Yan Xu, Ge Li, Lu Zhang, and Zhi Jin. 2016, Distilling word embeddings: An encoding approach, In CIKM, pp. 1977–1980. ACM, 2016. https://arxiv.org/abs/1506.04488 (Distillation of embeddings.)
- Patrick H. Chen, Si Si, Yang Li, Ciprian Chelba, and Cho-Jui Hsieh. 2018, GroupReduce: Block-wise low-rank approximation for neural language model shrinking, In NeurIPS, pp. 11011–11021, 2018. https://arxiv.org/abs/1806.06950 (Using low-rank matrices for vocabulary and embeddings.)
- Maximilian Lam. 2018, Word2bits - quantized word vectors, CoRR, abs/1803.05651, 2018, https://arxiv.org/abs/1803.05651 (Quantization ideas lead to compression of word vectors and embeddings.)
- Alexei Baevski and Michael Auli. 2019, Adaptive input representations for neural language modeling, In ICLR, 2019, https://arxiv.org/abs/1809.10853 (Faster training with adaptive embeddings size.)
For more research papers on embeddings matrix pruning and optimizations, see https://www.aussieai.com/research/embeddings.