Aussie AI
Shortlisting
Last Updated 7 December, 2024
by David Spuler, Ph.D.
Shortlisting is a type of vocabulary trimming that reduces the size of the token vocabulary in an LLM, which shrinks the embedding and output projection matrices, thereby reducing both the computation cost and the memory size of the model weights.
Shortlisting, also called lexical shortlisting, has been examined mostly in research on Neural Machine Translation (NMT); hence, there is a need for more research on vocabulary shortlisting for LLMs.
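As a rough illustration of the idea, here is a minimal Python/NumPy sketch of restricting the output projection to a shortlist of token IDs, so the final matrix-vector product and softmax run over the shortlist rather than the full vocabulary. The dimensions and the shortlist heuristic (frequent tokens plus tokens from the input) are assumptions for the example, not taken from any of the papers below.

```python
# Minimal sketch of lexical shortlisting at the output layer (illustrative only;
# the toy dimensions and the shortlist construction heuristic are assumptions).
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Assumed toy sizes: full vocabulary of 50,000 tokens, hidden size 512.
vocab_size, hidden_dim = 50_000, 512
rng = np.random.default_rng(0)
unembedding = rng.standard_normal((vocab_size, hidden_dim)).astype(np.float32)

# Hypothetical shortlist: the most frequent tokens plus tokens seen in the input.
shortlist_ids = np.unique(np.concatenate([
    np.arange(1_000),                       # stand-in for the most frequent tokens
    rng.integers(0, vocab_size, size=200),  # stand-in for tokens from the prompt/source
]))

# Slice the unembedding matrix once; each decoding step is then a
# |shortlist| x hidden product instead of |vocab| x hidden.
shortlist_matrix = unembedding[shortlist_ids]           # shape (S, hidden_dim)

hidden_state = rng.standard_normal(hidden_dim).astype(np.float32)
logits = shortlist_matrix @ hidden_state                # shape (S,), not (vocab_size,)
probs = softmax(logits)
next_token = shortlist_ids[np.argmax(probs)]            # map back to a full-vocabulary ID
print(next_token, probs.max())
```

The design point is that the shortlist is chosen once per input, so the per-token cost of the output layer scales with the shortlist size rather than the full vocabulary size.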
Related areas of LLM inference optimization include:
- Embeddings
- Tokenization
- Vocabulary expansion
- Vocabulary trimming
- Token pruning
- Embeddings pruning
- Funnel transformer
Research on Shortlisting
Research papers on lexical shortlisting in LLMs:
- Nikolay Bogoychev, Pinzhen Chen, Barry Haddow, Alexandra Birch, Nov 2023, Large Language Model Inference with Lexical Shortlisting, https://arxiv.org/abs/2311.09709 (Shortlisting the vocabulary to common words for reduced tokens and embedding matrix size.)
- Y Wang, K Chen, H Tan, K Guo, 2023, Tabi: An Efficient Multi-Level Inference System for Large Language Models, EuroSys '23: Proceedings of the Eighteenth European Conference on Computer Systems, Rome, Italy May 2023, Pages 233–248, https://doi.org/10.1145/3552326.3587438 https://dl.acm.org/doi/10.1145/3552326.3587438 PDF: https://cse.hkust.edu.hk/~kaichen/papers/tabi-eurosys23.pdf (Dynamic routing to small or large LLMs based on the query.)
- Nikolay Bogoychev, Pinzhen Chen, Barry Haddow, Alexandra Birch, June 20, 2024, The Ups and Downs of Large Language Model Inference, with Vocabulary Trimming by Language Heuristics, School of Informatics, University of Edinburgh, Proceedings of the Fifth Workshop on Insights from Negative Results in NLP, pages 148–153 https://aclanthology.org/2024.insights-1.17.pdf
- J Hong, G Lee, J Cho, Accelerating Multilingual Language Model for Excessively Tokenized Languages, Findings of the Association for Computational Linguistics: ACL 2024, pages 11095–11111 August 11-16, 2024, https://arxiv.org/abs/2401.10660 https://aclanthology.org/2024.findings-acl.660/ https://aclanthology.org/2024.findings-acl.660.pdf
- Nikolay Bogoychev, Pinzhen Chen, 21 Sep 2021 (v3), The Highs and Lows of Simple Lexical Domain Adaptation Approaches for Neural Machine Translation, https://arxiv.org/abs/2101.00421 https://aclanthology.org/2021.insights-1.12/
- Tobias Domhan, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne, Felix Hieber, July 2022, The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation, Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, United States, https://aclanthology.org/2022.naacl-main.136/ https://aclanthology.org/2022.naacl-main.136.pdf