Aussie AI
Greedy Decoding
-
Last Updated 3 September, 2024
-
by David Spuler, Ph.D.
What is Greedy Decoding?
Greedy decoding is the simplest type of decoding: at each step, the decoder simply picks the output token with the highest predicted probability. This algorithm is very efficient, but it is regarded as unreliable in terms of the accuracy and quality of the output text. More reliable alternative decoding algorithms include top-k decoding, top-p decoding, and beam search decoding.
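As a rough illustration, here is a minimal C++ sketch of a greedy decoding loop. The model_forward function and its interface are hypothetical placeholders for a single inference pass that returns one probability per vocabulary token; they are not from any particular library.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Hypothetical model interface: one forward pass over the current token
    // sequence, returning one probability (or logit) per vocabulary token.
    std::vector<float> model_forward(const std::vector<int>& tokens);

    std::vector<int> greedy_decode(std::vector<int> tokens, size_t max_new_tokens, int eos_token)
    {
        for (size_t i = 0; i < max_new_tokens; ++i) {
            std::vector<float> probs = model_forward(tokens);
            // Greedy step: always take the single highest-probability token.
            int next = (int)std::distance(probs.begin(),
                                          std::max_element(probs.begin(), probs.end()));
            tokens.push_back(next);  // Autoregressive: feed the chosen token back in.
            if (next == eos_token)   // Stop at end-of-sequence.
                break;
        }
        return tokens;
    }

Each iteration runs one full model inference and emits exactly one token, which is why greedy decoding is cheap per step but still bound by the cost of autoregressive generation.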
Problems with Greedy Decoding
One problem is that greedy decoding is deterministic: the AI engine will always output the same token given the same inputs. This leads to a lack of creativity and variety, and in the worst cases it can produce loops of repetitive phrasing and poor flow of natural language text. This repetition problem is called "neural text degeneration."
Greedy decoding also doesn't look ahead by even one word, so it can miss a two-word phrase or longer sequence that would have been a more accurate continuation than the locally best single token. It is also an autoregressive decoding method, because the single output token is appended to the inputs for the next decoding step.
Optimizing Greedy Decoding
Speed is a major advantage of greedy decoding. Choosing a token is simply a "max" computation over the vector of logits. This can be optimized further by noticing that there is no longer any need to use Softmax to convert the logits from log-scale into full probabilities. Because the exponential function is monotonically increasing, the token with the highest logit also has the highest probability, so the incoming logit vector can simply be scanned for its maximum, skipping the entire Softmax calculation.
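For example, greedy token selection can be a single linear scan over the raw logits, with no exponentials and no normalization. This is a minimal sketch, assuming the logits arrive as a plain float array.

    #include <cstddef>

    // Greedy selection directly on the logits: because Softmax is monotonic,
    // the largest logit is also the most probable token, so Softmax is skipped.
    int greedy_argmax(const float* logits, size_t vocab_size)
    {
        size_t best = 0;
        for (size_t i = 1; i < vocab_size; ++i) {
            if (logits[i] > logits[best])
                best = i;
        }
        return (int)best;
    }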
Research on Greedy Decoding
Articles and papers on greedy decoding:
- Jiatao Gu, Kyunghyun Cho, Victor O.K. Li, 2017, Trainable Greedy Decoding for Neural Machine Translation, arXiv preprint arXiv:1702.02429, https://arxiv.org/abs/1702.02429
- James Briggs, Feb 25, 2021, The Three Decoding Methods For NLP, Towards Data Science, https://towardsdatascience.com/the-three-decoding-methods-for-nlp-23ca59cb1e9d
- Yun Chen, Victor O.K. Li, Kyunghyun Cho, Samuel R. Bowman, 2018, A stable and effective learning strategy for trainable greedy decoding, https://arxiv.org/abs/1804.07915
- Gian Wiher, Clara Meister, Ryan Cotterell, Mar 2022, On Decoding Strategies for Neural Text Generators, https://arxiv.org/abs/2203.15721 (An evaluation of a variety of decoding algorithms including beam search, top-k, and top-p.)
- Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam, 10 Feb 2024, A Thorough Examination of Decoding Methods in the Era of LLMs, https://arxiv.org/abs/2402.06925 (Evaluates a number of decoding algorithms with several 7B models including Llama2-7B, and also with 4-bit and 8-bit quantization.)
- Yunsheng Ni, Chuanjian Liu, Yehui Tang, Kai Han, Yunhe Wang, 13 May 2024, EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models, https://arxiv.org/abs/2405.07542 Code: https://github.com/niyunsheng/EMS-SD (Speculative decoding across multiple queries by avoiding padding tokens and optimizing the KV cache.)
- David Spuler, March 2024, Chapter 26. Decoding Algorithms, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Chengbo Liu, Yong Zhu, 1 Apr 2024 (v2), SDSAT: Accelerating LLM Inference through Speculative Decoding with Semantic Adaptive Tokens, https://arxiv.org/abs/2403.18647 Code: https://github.com/hasuoshenyun/SDSAT
- Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui, 24 Jun 2024, From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models, https://arxiv.org/abs/2406.16838 (Survey and theoretical analysis of many different decoding algorithms, along with various ways to speed them up such as speculative decoding and KV caches.)
- Domas Grigaliūnas, Mantas Lukoševičius, 29 Jul 2024, Inference acceleration for large language models using "stairs" assisted greedy generation, https://arxiv.org/abs/2407.19947 (A draft-and-verify method that is similar to speculative decoding.)
- Seungrok Jung, 15 Mar 2024, Large language model inference optimizations on AMD GPUs, ROCm blogs, https://rocm.blogs.amd.com/artificial-intelligence/llm-inference-optimize/README.html
- David Spuler, March 2024, Greedy Decoding, in Generative AI in C++, https://www.aussieai.com/book/ch26-greedy-decoding
Neural Text Degeneration
The problem of repetitive and looping output text is called neural text degeneration in research papers. It occurs primarily with deterministic decoding algorithms such as greedy decoding, and is largely resolved by stochastic methods such as top-k decoding.
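For comparison, here is a minimal sketch of top-k sampling in C++, assuming the per-token probabilities are already available in a vector and that k does not exceed the vocabulary size. Instead of always taking the single maximum, it samples among the k most probable tokens, which breaks the deterministic repetition loops.

    #include <algorithm>
    #include <numeric>
    #include <random>
    #include <vector>

    // Top-k sampling: restrict to the k most probable tokens, then sample
    // among them in proportion to their probabilities.
    int topk_sample(const std::vector<float>& probs, int k, std::mt19937& rng)
    {
        std::vector<int> idx(probs.size());
        std::iota(idx.begin(), idx.end(), 0);
        // Move the indices of the k largest probabilities to the front.
        std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                          [&](int a, int b) { return probs[a] > probs[b]; });
        std::vector<float> weights(k);
        for (int i = 0; i < k; ++i)
            weights[i] = probs[idx[i]];
        std::discrete_distribution<int> pick(weights.begin(), weights.end());
        return idx[pick(rng)];  // Stochastic choice, unlike the greedy argmax.
    }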
Research papers on neural text degeneration include:
- Zihao Fu, Wai Lam, Anthony Man-Cho So, and Bei Shi. 2021. A theoretical analysis of the repetition problem in text generation. In Thirty-Fifth AAAI Conference on Artificial Intelligence. https://arxiv.org/abs/2012.14660
- Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi, 2019, The curious case of neural text degeneration, International Conference on Learning Representations, https://arxiv.org/abs/1904.09751
- Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Sean Welleck, Yejin Choi, and Zaid Harchaoui. 2021. MAUVE: Measuring the gap between neural text and human text using divergence frontiers. Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper/2021/file/260c2432a0eecc28ce03c10dadc078a4-Paper.pdf
More Research on Decoding Algorithms
- Decoding algorithms (overview)
— Non-autoregressive decoding
— Greedy decoding
— Top-k decoding
— Top-p decoding
— Min-P Sampling
— Flash decoding
— Beam search decoding
— Edit decoding
— Contrastive decoding
— Constrained decoding
- Parallel decoding (overview)
— Blockwise parallel decoding
— n-gram parallel decoding
— Lookahead decoding
— Medusa decoding
— Consensus decoding
- Speculative decoding (overview)
— Generalized speculative decoding
— Aggressive decoding
— Lookup decoding
— Retrieval lookup decoding
— Prompt lookup decoding
— Self speculative decoding
— Tree speculative decoding
— Superposed decoding
— Hierarchical speculative decoding
— Heuristic speculative decoding
— Multi-token speculative decoding
— Sequential speculative decoding
More AI Research
Read more about: