Aussie AI

Vocabulary Trimming

  • Last Updated 7 December, 2024
  • by David Spuler, Ph.D.

Vocabulary trimming in LLMs is the reduction of the size of the token vocabulary as an inference optimization. A smaller vocabulary shrinks the vocabulary dimension of the embedding matrix and of the final unembedding (logits) layer, thereby reducing both the computation cost of the output softmax and the memory size of the model weights.
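As a concrete sketch of where the savings come from, the PyTorch snippet below (a hypothetical illustration, not taken from any particular model or library) trims an embedding layer and an output head down to a kept subset of token ids:

```python
import torch
import torch.nn as nn

def trim_vocabulary(embedding: nn.Embedding, lm_head: nn.Linear, kept_ids: list):
    """Build reduced embedding and output layers keeping only the listed token ids."""
    idx = torch.tensor(kept_ids, dtype=torch.long)

    # New embedding: fewer rows (one per kept token), same hidden dimension.
    new_emb = nn.Embedding(len(kept_ids), embedding.embedding_dim)
    new_emb.weight.data.copy_(embedding.weight.data.index_select(0, idx))

    # New output head: fewer logits, so the final matmul and softmax are cheaper.
    new_head = nn.Linear(lm_head.in_features, len(kept_ids), bias=lm_head.bias is not None)
    new_head.weight.data.copy_(lm_head.weight.data.index_select(0, idx))
    if lm_head.bias is not None:
        new_head.bias.data.copy_(lm_head.bias.data.index_select(0, idx))

    # Old token ids must be remapped into the new, smaller id space.
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    return new_emb, new_head, old_to_new

# Toy usage: a 50k-token vocabulary trimmed to 20k kept tokens.
emb = nn.Embedding(50_000, 768)
head = nn.Linear(768, 50_000, bias=False)
small_emb, small_head, remap = trim_vocabulary(emb, head, list(range(20_000)))
print(small_emb.weight.shape, small_head.weight.shape)  # both torch.Size([20000, 768])
```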

On the downside, vocabulary size reduction generally means that some texts must be expressed in more tokens. The token sequence length therefore increases for some input prompts, so this dimension of LLM layer processing gets worse, even as the vocabulary dimension improves. Hence, there are important tradeoffs in this approach.
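A toy example makes the sequence-length cost visible. The sketch below uses a naive longest-match tokenizer over made-up vocabularies (real LLMs use BPE or SentencePiece, and these token strings are invented for illustration), but the effect is the same: removing a merged token forces the text to be spelled out in more, smaller pieces.

```python
def greedy_tokenize(text: str, vocab: set) -> list:
    """Toy longest-match tokenizer; falls back to single characters."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest match first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # no match: emit one character
            i += 1
    return tokens

full_vocab = {"vocabulary", "vocab", "ulary", "trimming", "trim", "ming", " "}
trimmed    = {"vocab", "ulary", "trim", "ming", " "}   # rarer merged tokens removed

text = "vocabulary trimming"
print(greedy_tokenize(text, full_vocab))  # ['vocabulary', ' ', 'trimming'] -> 3 tokens
print(greedy_tokenize(text, trimmed))     # ['vocab', 'ulary', ' ', 'trim', 'ming'] -> 5 tokens
```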

Vocabulary trimming and lexical shortlisting have been used in Neural Machine Translation (NMT) for translating between languages. Much of this research predates LLMs, with many NMT techniques applied to other types of AI models rather than to LLMs and Transformers. The use of vocabulary trimming in LLMs remains largely unexplored and is an area warranting further research.
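For context, lexical shortlisting in NMT restricts the output projection and softmax to a small candidate set of token ids chosen per input (e.g., from a bilingual dictionary). Here is a minimal sketch, with hypothetical shapes and candidate ids; the speedup comes from shrinking the final matmul from vocab_size x d to k x d:

```python
import torch

def shortlisted_logits(hidden: torch.Tensor, W_out: torch.Tensor,
                       shortlist: torch.Tensor) -> torch.Tensor:
    """Compute output logits only for a shortlist of candidate token ids.

    hidden:    (d,) final hidden state
    W_out:     (vocab_size, d) output projection weights
    shortlist: (k,) candidate token ids, with k much smaller than vocab_size
    """
    return W_out.index_select(0, shortlist) @ hidden   # (k,) logits

d, vocab = 768, 50_000
hidden = torch.randn(d)
W_out = torch.randn(vocab, d)
shortlist = torch.tensor([5, 17, 42, 99])              # hypothetical candidates
probs = torch.softmax(shortlisted_logits(hidden, W_out, shortlist), dim=-1)
print(dict(zip(shortlist.tolist(), probs.tolist())))   # probability over shortlist only
```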

Related areas of LLM inference optimization include:

Research on Vocabulary Trimming

Research papers on reducing the size of an LLM vocabulary:

More Research on Pruning Types
