Aussie AI
Edit Decoding
-
Last Updated 7 December, 2024
-
by David Spuler, Ph.D.
What is Edit Decoding?
Edit decoding is a decoding algorithm for "editing" or Grammatical Error Correction (GEC) that uses the input prompt as a template to speed up output decoding. It is similar to aggressive decoding and speculative decoding, which are types of parallel decoding. Edit decoding can be run sequentially or in parallel.
The main advantage of edit decoding over standard autoregressive decoding is that it starts generating output from the beginning of the context, and it is specialized for editing. Hence, there is no need to prepend an editing instruction to the prompt, because the engine performs editing automatically. This means fewer tokens in the input prompt, and it also improves inference efficiency by avoiding the encoding of a larger input prompt at the start.
Edit Decoding Algorithm
Transformers have been used for editing since shortly after their introduction in 2017. The general field is called "Grammatical Error Correction" or GEC. The inference algorithm needs to be modified to perform editing.
Non-editing decoding tasks tend to extend an input prompt, known as a "completion." However, editing tasks need to modify the input prompt, rather than extend it. Hence, the decoding algorithm needs to be changed. The basic "edit decoding" algorithm described here is sequential; aggressive decoding is a parallelized optimization of this simpler algorithm.
The most basic difference between edit decoding and standard autoregressive decoding is a simple observation: the prompt itself can be changed. Non-edit inference will typically emit the input prompt tokens unchanged and then add the "completion" tokens afterwards. Edit decoding differs in that it can modify the initial input tokens, rather than always emitting them verbatim.
It is worth considering a simplistic attempt at edit decoding, as in this pseudocode:
tok = predict(1, n);        // Predict token n+1
if (tok == input[n+1]) {    // Correct prediction matching input
    emit input[n+1];
} else {                    // Incorrect input text, override input[n+1]
    emit tok;
}
But this is silly coding, because a careful look shows that it is identical to this simpler pseudocode:
tok = predict(1, n);   // Predict token n+1
emit tok;
This method is not really using the input text in the prediction. It will also struggle to get started, because the initial prediction after the first token is likely to diverge, since there are many plausible continuations. The output likely won't be similar to the input. Instead, edit decoding needs to prefer the input tokens unless the model has high confidence in a correction.
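As a rough illustration of this idea, here is a minimal sketch in C++ of a sequential edit decoding loop that keeps each input token unless the model's correction exceeds a confidence threshold. The predict_next function, the Prediction type, and the 0.9 threshold are hypothetical placeholders for illustration only, not a real engine API.

#include <iostream>
#include <string>
#include <vector>

struct Prediction {
    std::string token;   // model's best guess for the next token
    double prob;         // model's confidence in that guess
};

// Hypothetical stand-in for the model call: a real engine would run a
// forward pass over 'context' and return its top next token and probability.
Prediction predict_next(const std::vector<std::string>& context)
{
    (void)context;
    return {"", 0.0};    // stub: never confident, so no edits are made
}

// Sequential edit decoding sketch: walk the input token-by-token,
// keeping each input token unless the model strongly prefers a correction.
std::vector<std::string> edit_decode(const std::vector<std::string>& input,
                                     double threshold = 0.9)
{
    std::vector<std::string> output;
    for (size_t i = 0; i < input.size(); ++i) {
        Prediction p = predict_next(output);
        if (p.token == input[i] || p.prob < threshold) {
            output.push_back(input[i]);   // keep the original token (no edit)
        } else {
            output.push_back(p.token);    // high-confidence correction
        }
    }
    return output;
}

int main()
{
    std::vector<std::string> input = {"She", "go", "to", "school"};
    for (const std::string& tok : edit_decode(input))
        std::cout << tok << ' ';
    std::cout << '\n';
    return 0;
}

Note that this token-for-token sketch only handles substitutions; a full edit decoding algorithm must also handle insertions and deletions, which shift the alignment between the remaining input and the output, and the parallel variants (such as aggressive decoding) verify many input tokens in a single forward pass rather than one at a time.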
Edit decoding research papers
References for edit-specific use of Transformer inference:
- Ning Shi, Ziheng Zeng, Haotian Zhang, Yichen Gong, 30 Sep 2020 (v2), Recurrent Inference in Text Editing, https://arxiv.org/abs/2009.12643 (Multi-pass method that makes one edit transformation at a time, and then recurses with the new sequence to make another edit correction, repeating until finished.)
- Dimitrios Alikaniotis, Vipul Raheja, 4 Jun 2019, The Unreasonable Effectiveness of Transformer Language Models in Grammatical Error Correction, https://arxiv.org/abs/1906.01733 (Examines BERT, GPT and GPT-2 Transformers using a simple decoding method with a threshold to decide when to make an edit.)
- Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, Niki Parmar, 31 Oct 2018, Weakly Supervised Grammatical Error Correction using Iterative Decoding, https://arxiv.org/abs/1811.01710 (Iteratively makes edits until no more can be found.)
- Daniel Dahlmeier, Hwee Tou Ng, 2012, A Beam-Search Decoder for Grammatical Error Correction, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 568–578, Jeju Island, Korea, 12–14 July 2012, https://aclanthology.org/D12-1052.pdf
- Weijia Xu, Marine Carpuat, March 31 2021, EDITOR: An Edit-Based Transformer with Repositioning for Neural Machine Translation with Soft Lexical Constraints, Transactions of the Association for Computational Linguistics (2021) 9: 311–328, https://doi.org/10.1162/tacl_a_00368, https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00368/98622/EDITOR-An-Edit-Based-Transformer-with (Non-autoregressive use of correction operators such as insert or delete, including a novel way to do deletions using repositioning.)
- Mohammad Bavarian, Angela Jiang, Heewoo Jun, Henrique Pondé, OpenAI, March 15, 2022, New GPT-3 capabilities: Edit & insert, Open AI Blog, https://openai.com/blog/gpt-3-edit-insert (It is unclear whether OpenAI's edit endpoint "text-davinci-edit-001" is a specialized model or a version of edit decoding, or both, or something else.)
- Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn, Oct 2022, EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start, https://arxiv.org/abs/2205.12209
- Marcin Junczys-Dowmunt, Roman Grundkiewicz, Shubha Guha, and Kenneth Heafield. 2018. Approaching neural grammatical error correction as a low-resource machine translation task. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 595–606, New Orleans, Louisiana. https://aclanthology.org/N18-1055/, PDF: https://aclanthology.org/N18-1055.pdf
- Ottokar Tilk and Tanel Alumäe. 2016. Bidirectional recurrent neural network with attention mechanism for punctuation restoration. In Interspeech, pages 3047–3051. PDF: https://www.researchgate.net/profile/Ottokar-Tilk/publication/307889284_Bidirectional_Recurrent_Neural_Network_with_Attention_Mechanism_for_Punctuation_Restoration/links/57ed346708ae26b51b395be1/Bidirectional-Recurrent-Neural-Network-with-Attention-Mechanism-for-Punctuation-Restoration.pdf
- Abhijeet Awasthi, Sunita Sarawagi, Rasna Goyal, Sabyasachi Ghosh, Vihari Piratla, 15 May 2020 (v2), Parallel Iterative Edit Models for Local Sequence Transduction, https://arxiv.org/abs/1910.02893 (Transforms the input text into a sequence of edits and then does parallel optimizations.)
- Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, and Niki Parmar. 2018. Weakly supervised grammatical error correction using iterative decoding. CoRR, abs/1811.01710. https://arxiv.org/abs/1811.01710 (Beam search decoding with a high threshold to emit corrections, but making multiple passes over the text with few corrections per iteration.)
- OpenAI Staff, March 2022, Introducing Insert and Edits Capabilities, OpenAI Developer Forum, https://community.openai.com/t/introducing-insert-and-edits-capabilities/15993 (Unclear whether the GPT "edit" innovation for "text-davinci-edit-001" is a GEC model or a decoding algorithm, such as an optimization to edit decoding, or both, or something else.)
- Milton Pividori and Casey S. Greene, 2023, A publishing infrastructure for AI-assisted academic authoring, doi: 10.1101/2023.01.21.525030, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900745/ (The OpenAI edit model "text-davinci-edit-001" was evaluated but not used for the editing parts of the process.)
- Chenming Tang, Fanyi Qu, Yunfang Wu, 28 Mar 2024, Ungrammatical-syntax-based In-context Example Selection for Grammatical Error Correction, https://arxiv.org/abs/2403.19283 (Syntax tree comparison for GEC.)
- Jindrich Libovicky, Jindrich Helcl, Marek Tlusty, Ondrej Bojar, and Pavel Pecina. 2016. CUNI system for WMT16 automatic post-editing and multimodal translation tasks. In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, pages 646–654, Berlin, Germany. https://arxiv.org/abs/1606.07481 (Post-editing of machine translation.)
- Sergiu Nisioi, Sanja Stajner, Simone Paolo Ponzetto, and Liviu P Dinu. 2017. Exploring neural text simplification models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 85–91. https://aclanthology.org/P17-2014/ PDF: https://aclanthology.org/P17-2014.pdf (Text simplification.)
- Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1073–1083, Vancouver, Canada. Association for Computational Linguistics. https://arxiv.org/abs/1704.04368 https://aclanthology.org/P17-1099/ (Text summarization.)
- Jiwei Tan, Xiaojun Wan, and Jianguo Xiao. 2017. Abstractive document summarization with a graph based attentional neural model. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1171–1181. https://aclanthology.org/P17-1108/ PDF: https://aclanthology.org/P17-1108.pdf
- Wei Zhao, Liang Wang, Kewei Shen, Ruoyu Jia, and Jingming Liu. 2019. Improving grammatical error correction via pre-training a copy-augmented architecture with unlabeled data. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 156–165, Minneapolis, Minnesota. https://arxiv.org/abs/1903.00138 https://aclanthology.org/N19-1014/
- David Spuler, June 2024, Aussie AI, Edit Decoding With Early Exit for Optimized Transformer On-Device Inference: IP Australia, https://ipsearch.ipaustralia.gov.au/patents/2024901641
- Shwetha Somasundaram, Anirudh Phukan, Apoorv Saxena, 2 Dec 2024, PLD+: Accelerating LLM inference by leveraging Language Model Artifacts, https://arxiv.org/abs/2412.01447
Grammatical error correction general research. For additional general research papers on the broader area of GEC, see Research on Grammatical Error Correction.
More Research on Decoding Algorithms
- Decoding algorithms (overview)
— Non-autoregressive decoding
— Greedy decoding
— Top-k decoding
— Top-p decoding
— Min-P Sampling
— Flash decoding
— Beam search decoding
— Edit decoding
— Contrastive decoding
— Constrained decoding
- Parallel decoding (overview)
— Blockwise parallel decoding
— n-gram parallel decoding
— Lookahead decoding
— Medusa decoding
— Consensus decoding
- Speculative decoding (overview)
— Generalized speculative decoding
— Aggressive decoding
— Lookup decoding
— Retrieval lookup decoding
— Prompt lookup decoding
— Self speculative decoding
— Tree speculative decoding
— Superposed decoding
— Hierarchical speculative decoding
— Heuristic speculative decoding
— Multi-token speculative decoding
— Sequential speculative decoding
More AI Research
Read more about:
- Decoding Algorithms
- Aggressive Decoding
- Inference Optimizations
- Loop Optimizations
- Code Optimizations
- « Research Home