Shortlisting
Last Updated 24 February, 2025
by David Spuler, Ph.D.
Shortlisting is a type of vocabulary trimming that reduces the size of the token vocabulary in an LLM, thereby reducing both the computation cost of inference and the memory size of the vocabulary-dependent model weights (the embedding and unembedding matrices).
Shortlisting, also called lexical shortlisting, has been examined mostly in research on Neural Machine Translation (NMT); shortlisting the vocabularies of general-purpose LLMs remains comparatively under-explored, and more research is needed.
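For intuition, here is a minimal NumPy sketch of the core idea: restricting the final output projection (unembedding) to a shortlist of allowed token ids. The function names, shapes, and the choice of a greedy decoding step are illustrative assumptions, not the method of any particular paper.

    import numpy as np

    def shortlisted_logits(hidden, unembedding, shortlist_ids):
        """Logits over a shortlisted vocabulary only.

        hidden:        (d,) final hidden state for the current position
        unembedding:   (V, d) full output-projection matrix
        shortlist_ids: (S,) allowed token ids, with S << V
        """
        # Slicing to S rows makes the projection cost O(S*d) rather
        # than O(V*d); a persistently sliced copy of the matrix also
        # shrinks the weight memory.
        return unembedding[shortlist_ids] @ hidden    # shape (S,)

    def greedy_shortlisted_token(hidden, unembedding, shortlist_ids):
        # Greedy decode over the shortlist, then map the winner back
        # to its full-vocabulary token id.
        logits = shortlisted_logits(hidden, unembedding, shortlist_ids)
        return int(shortlist_ids[np.argmax(logits)])

    # Toy usage: a 50K-token vocabulary shortlisted to its first 8K ids.
    V, d = 50_000, 1_024
    rng = np.random.default_rng(0)
    unembedding = rng.standard_normal((V, d), dtype=np.float32)
    hidden = rng.standard_normal(d, dtype=np.float32)
    shortlist_ids = np.arange(8_000)
    print(greedy_shortlisted_token(hidden, unembedding, shortlist_ids))

The same slicing idea applies to the input embedding matrix when the shortlist is fixed ahead of time, which is where the weight-memory savings come from.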
Related areas of LLM inference optimization include:
- Embeddings
- Tokenization
- Vocabulary expansion
- Vocabulary trimming
- Token pruning
- Embeddings pruning
- Funnel transformer
Research on Shortlisting
Research papers on lexical shortlisting in LLMs:
- Nikolay Bogoychev, Pinzhen Chen, Barry Haddow, Alexandra Birch, Nov 2023, Large Language Model Inference with Lexical Shortlisting, https://arxiv.org/abs/2311.09709 (Shortlisting the vocabulary to common words for reduced tokens and embedding matrix size; see the frequency-ranked sketch after this list.)
- Y Wang, K Chen, H Tan, K Guo, 2023, Tabi: An Efficient Multi-Level Inference System for Large Language Models, EuroSys '23: Proceedings of the Eighteenth European Conference on Computer Systems, Rome, Italy May 2023, Pages 233–248, https://doi.org/10.1145/3552326.3587438 https://dl.acm.org/doi/10.1145/3552326.3587438 PDF: https://cse.hkust.edu.hk/~kaichen/papers/tabi-eurosys23.pdf (Dynamic routing to small or large LLMs based on the query.)
- Nikolay Bogoychev, Pinzhen Chen, Barry Haddow, Alexandra Birch, June 20, 2024, The Ups and Downs of Large Language Model Inference, with Vocabulary Trimming by Language Heuristics, School of Informatics, University of Edinburgh, Proceedings of the Fifth Workshop on Insights from Negative Results in NLP, pages 148–153 https://aclanthology.org/2024.insights-1.17.pdf
- J Hong, G Lee, J Cho, Accelerating Multilingual Language Model for Excessively Tokenized Languages, Findings of the Association for Computational Linguistics: ACL 2024, pages 11095–11111 August 11-16, 2024, https://arxiv.org/abs/2401.10660 https://aclanthology.org/2024.findings-acl.660/ https://aclanthology.org/2024.findings-acl.660.pdf
- Nikolay Bogoychev, Pinzhen Chen, 21 Sep 2021 (v3), The Highs and Lows of Simple Lexical Domain Adaptation Approaches for Neural Machine Translation, https://arxiv.org/abs/2101.00421 https://aclanthology.org/2021.insights-1.12/
- Tobias Domhan, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne, Felix Hieber, July 2022, The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation, Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, United States, https://aclanthology.org/2022.naacl-main.136/ https://aclanthology.org/2022.naacl-main.136.pdf
- Yuta Nozaki, Dai Nakashima, Ryo Sato, Naoki Asaba, Shintaro Kawamura, Jan 2025, Efficient Vocabulary Reduction for Small Language Models, Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 771–783, January 19–24, 2025, Association for Computational Linguistics, https://aclanthology.org/2025.coling-industry.64.pdf
- Weilin Zhao, Tengyu Pan, Xu Han, Yudi Zhang, Ao Sun, Yuxiang Huang, Kaihuo Zhang, Weilun Zhao, Yuxuan Li, Jianyong Wang, Zhiyuan Liu, Maosong Sun, 20 Feb 2025, FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling, https://arxiv.org/abs/2502.14856 (Limiting the draft model in speculative decoding to frequently-used tokens.)
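Several of the papers above restrict computation to frequently-used tokens (e.g., the common-word shortlists of Bogoychev et al., 2023, and the frequency-ranked draft vocabulary in FR-Spec). A minimal sketch of building such a frequency-ranked shortlist from a tokenized corpus; the function name and toy data are hypothetical:

    from collections import Counter

    def build_frequency_shortlist(corpus_token_ids, shortlist_size):
        # Count token-id occurrences in a representative corpus and
        # keep the `shortlist_size` most frequent ids.
        counts = Counter(corpus_token_ids)
        return [tok for tok, _ in counts.most_common(shortlist_size)]

    # Toy usage: the 3 most frequent ids from a tiny "corpus".
    corpus = [5, 3, 5, 9, 3, 5, 7, 7, 7, 2, 5]
    print(build_frequency_shortlist(corpus, 3))   # prints [5, 7, 3]

In practice the shortlist can also be conditioned on the input, such as the per-language heuristics used in the vocabulary-trimming papers above.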