Aussie AI

Grammatical Error Correction

  • Last Updated 7 December, 2024
  • by David Spuler, Ph.D.

Grammatical Error Correction (GEC) is the research term for correcting errors in written text. The everyday terms are words like "editing" and "proofreading."

GEC research aims to use computers, especially LLMs and Transformers, to automatically proofread and correct written text. Transformer models have been applied to GEC tasks since the architecture first appeared in 2017.

Use cases for GEC include:

  • Proofreading documents (e.g., revisions, copy editing)
  • Autocorrect
  • Language learning instruction (e.g., teaching English to learners such as children or adults who speak English as a second language)

LLMs for Grammar Correction

The use of LLMs for GEC has had mixed success. They are strong at finding errors, but have a tendency to "over-correct," and are also slow and inefficient to run. Overall, although LLMs are popular in GEC research papers, some non-LLM methods are still regarded as state-of-the-art, rather than general-purpose LLMs like ChatGPT.

Pros. On the positive side, a strong LLM like ChatGPT can do a lot of things well:

  • Grammatical corrections. It can fix a lot of basic spelling and grammatical errors.
  • Advanced improvements. LLMs can make many advanced edits for fluency and creativity. The ability to output fluent, grammatically correct English is one of the main strengths of LLMs.
  • Multilingual corrections. Another strength is that this capability exists in many languages beyond English, as several of the top models are multilingual.

Cons. On the downside, the problems include:

  • Over-correction: a tendency to make major changes for fluency or creativity, rather than a minimal set of edits for correctness. Seeing too many corrections on a document can be discouraging when teaching young children or non-English speakers; for example, rewriting a phrase into more eloquent wording is not helpful for these audiences (see the prompting sketch after this list).
  • Inefficiency. A large LLM is required for accuracy (e.g., GPT), but such a model executes a huge amount of GPU computation behind the scenes. This is usually worked around by sending the request over the network to an LLM engine running in the cloud.
  • Many tokens. Correcting written text requires a large number of tokens, twice over. The original document is the input text, which must be processed by the GPU in the "prefill" (encoding) phase. The corrected answer then contains roughly the same number of words, so the response is also long and requires a lot of computation. Hence, GEC can be doubly expensive in terms of cost and GPU processing requirements.
  • Stateless. LLMs operate in a stateless manner, requiring the context to be re-analyzed for every query. Hence, running "autocorrect" via an LLM with a Transformer engine requires re-analysis of the text for every keystroke, or at least for every word typed, which further increases the number of tokens and the GPU power required.
  • On-device GEC is difficult. Because of the inefficiency and many-tokens problems, running an LLM locally on a phone, PC/laptop, or other low-resource device is difficult. For example, the autocorrect on your iPhone while you type a text message is certainly not running an LLM in the background.
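To illustrate the prompting approach, here is a minimal sketch of calling an LLM for minimal-edit correction, assuming the OpenAI Python client (v1.x); the model name and prompt wording are illustrative assumptions, not a recommended configuration.

```python
# Minimal sketch of prompting an LLM for GEC while discouraging over-correction.
# Assumes the OpenAI Python client (v1.x); model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GEC_PROMPT = (
    "You are a proofreader. Correct only spelling, grammar, and punctuation "
    "errors in the text below. Make the minimum number of edits needed for "
    "correctness; do not rephrase for style or fluency. "
    "Return only the corrected text.\n\nText:\n{text}"
)

def correct_text(text: str, model: str = "gpt-4o-mini") -> str:
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic output reduces creative rewrites
        messages=[{"role": "user", "content": GEC_PROMPT.format(text=text)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(correct_text("Their going to recieve the package tomorow."))
```

Even with a minimal-edit prompt, a general-purpose LLM may still rephrase more than a dedicated GEC system would, which is one reason specialized models remain competitive.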

Research on LLM GEC. Research papers on the use of LLMs for grammatical error correction include:

  • Robert Östling, Katarina Gillholm, Murathan Kurfalı, Marie Mattson, Mats Wirén, 17 Aug 2023, Evaluation of really good grammatical error correction, https://arxiv.org/abs/2308.08982 (Examines GPT-3 use in GEC and finds it effective.)
  • Maria Carolina Penteado, Fábio Perez, 18 Jul 2023 (v2), Evaluating GPT-3.5 and GPT-4 on Grammatical Error Correction for Brazilian Portuguese, https://arxiv.org/abs/2306.15788
  • Yinghui Li, Shang Qin, Jingheng Ye, Shirong Ma, Yangning Li, Libo Qin, Xuming Hu, Wenhao Jiang, Hai-Tao Zheng, Philip S. Yu, 18 Feb 2024, Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction, https://arxiv.org/abs/2402.11420
  • Masamune Kobayashi, Masato Mita, Mamoru Komachi, 26 Mar 2024, Large Language Models Are State-of-the-Art Evaluator for Grammatical Error Correction, https://arxiv.org/abs/2403.17540
  • Min Zeng, Jiexin Kuang, Mengyang Qiu, Jayoung Song, Jungyeul Park, 24 Feb 2024, Evaluating Prompting Strategies for Grammatical Error Correction Based on Language Proficiency, https://arxiv.org/abs/2402.15930
  • Steven Coyne, Keisuke Sakaguchi, Diana Galvan-Sosa, Michael Zock, Kentaro Inui, 30 May 2023 (v2), Analyzing the Performance of GPT-3.5 and GPT-4 in Grammatical Error Correction, https://arxiv.org/abs/2303.14342
  • Haoran Wu, Wenxuan Wang, Yuxuan Wan, Wenxiang Jiao, and Michael Lyu. 2023. ChatGPT or Grammarly? Evaluating ChatGPT on grammatical error correction benchmark. arXiv:2303.13648. https://arxiv.org/abs/2303.13648
  • Jared Lichtarge, Christopher Alberti, Shankar Kumar, Noam Shazeer, and Niki Parmar. 2018. Weakly supervised grammatical error correction using iterative decoding. CoRR, abs/1811.01710. https://arxiv.org/abs/1811.01710 (Beam search decoding with a high threshold to emit corrections.)
  • Jindrich Libovicky, Jindrich Helcl, Marek Tlusty, Ondrej Bojar, and Pavel Pecina. 2016. CUNI system for WMT16 automatic post-editing and multimodal translation tasks. In Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, pages 646–654, Berlin, Germany. https://arxiv.org/abs/1606.07481 (Post-editing of machine translation.)
  • Alexandre Berard, Laurent Besacier, Olivier Pietquin. 2017. LIG-CRIStAL submission for the WMT 2017 automatic post-editing task. In Proceedings of the Second Conference on Machine Translation, pages 623–629, Copenhagen, Denmark. Association for Computational Linguistics. https://aclanthology.org/W17-4772.pdf (Post-editing of machine translation using a simpler method that should be closer to spelling correction.)
  • Sergiu Nisioi, Sanja Stajner, Simone Paolo Ponzetto, and Liviu P Dinu. 2017. Exploring neural text simplification models. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 85–91. https://aclanthology.org/P17-2014/ PDF: https://aclanthology.org/P17-2014.pdf (Text simplification.)
  • Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1073–1083, Vancouver, Canada. Association for Computational Linguistics. https://arxiv.org/abs/1704.04368 https://aclanthology.org/P17-1099/ (Text summarization.)
  • Jiwei Tan, Xiaojun Wan, and Jianguo Xiao. 2017. Abstractive document summarization with a graph based attentional neural model. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1171–1181. https://aclanthology.org/P17-1108/ PDF: https://aclanthology.org/P17-1108.pdf
  • Marcin Junczys-Dowmunt, Roman Grundkiewicz, Shubha Guha, and Kenneth Heafield. 2018. Approaching neural grammatical error correction as a low-resource machine translation task. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 595–606, New Orleans, Louisiana. https://aclanthology.org/N18-1055/ PDF: https://aclanthology.org/N18-1055.pdf
  • Ottokar Tilk and Tanel Alumäe. 2016. Bidirectional recurrent neural network with attention mechanism for punctuation restoration. In Interspeech, pages 3047–3051. PDF: https://www.researchgate.net/profile/Ottokar-Tilk/publication/307889284_Bidirectional_Recurrent_Neural_Network_with_Attention_Mechanism_for_Punctuation_Restoration/links/57ed346708ae26b51b395be1/Bidirectional-Recurrent-Neural-Network-with-Attention-Mechanism-for-Punctuation-Restoration.pdf
  • Wei Zhao, Liang Wang, Kewei Shen, Ruoyu Jia, and Jingming Liu. 2019. Improving grammatical error correction via pre-training a copy-augmented architecture with unlabeled data. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 156–165, Minneapolis, Minnesota. https://arxiv.org/abs/1903.00138 https://aclanthology.org/N19-1014/
  • Ning Shi, Ziheng Zeng, Haotian Zhang, Yichen Gong, 30 Sep 2020 (v2), Recurrent Inference in Text Editing, https://arxiv.org/abs/2009.12643
  • Yiwei Wang, Muhao Chen, Nanyun Peng, Kai-Wei Chang, 1 Apr 2024 (v2), DeepEdit: Knowledge Editing as Decoding with Constraints, https://arxiv.org/abs/2401.10471
  • Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem Chernodub, Oleksandr Skurzhanskyi, 29 May 2020 (v2), GECToR -- Grammatical Error Correction: Tag, Not Rewrite, https://arxiv.org/abs/2005.12592 (GEC using the encoder of a Transformer.)
  • Dimitrios Alikaniotis, Vipul Raheja, 4 Jun 2019, The Unreasonable Effectiveness of Transformer Language Models in Grammatical Error Correction, https://arxiv.org/abs/1906.01733 (Examines BERT, GPT and GPT-2 Transformers using a simple decoding method with a threshold to decide when to make an edit.)
  • Tao Fang, Shu Yang, Kaixin Lan, Derek F. Wong, Jinpeng Hu, Lidia S. Chao, Yue Zhang, 4 Apr 2023, Is ChatGPT a Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation, https://arxiv.org/abs/2304.01746
  • Mengsay Loem, Masahiro Kaneko, Sho Takase, and Naoaki Okazaki. 2023. Exploring effectiveness of GPT-3 in grammatical error correction: A study on performance and controllability in prompt-based methods. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 205–219, Toronto, Canada. Association for Computational Linguistics https://aclanthology.org/2023.bea-1.18/
  • Roman Grundkiewicz, Marcin Junczys-Dowmunt, and Kenneth Heafield. 2019. Neural grammatical error correction systems with unsupervised pre-training on synthetic data. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 252–263, Florence, Italy. Association for Computational Linguistics. https://aclanthology.org/W19-4427/
  • Michihiro Yasunaga, Jure Leskovec, and Percy Liang. 2021. LM-Critic: language models for unsupervised grammatical error correction. arXiv:2109.06822. https://arxiv.org/abs/2109.06822 https://aclanthology.org/2021.emnlp-main.611/

Edit Decoding

Many of the LLM GEC approaches do not change the decoding algorithm used in editing, but rely on prompt engineering with the default decoding algorithms (e.g., greedy, top-k, top-p, or beam search). This is inherently inefficient because it does not specialize the decoding algorithm to the GEC use case, and thereby wastes an opportunity to go faster.

However, the decoding algorithm can be modified for editing, an approach called "edit decoding." The idea is to process the token logits in a way that recognizes the task is editing an existing text, rather than elongation or "completion" of the prompt. A minimal sketch appears below.
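As an illustration of the idea, here is a minimal sketch of a copy-biased greedy decoding loop; the model interface (next_token_logits) and the bias value are hypothetical assumptions, and the sketch ignores the alignment problems caused by insertions and deletions.

```python
# Minimal sketch of "edit decoding": greedy decoding biased toward copying the
# source text, so an edit is only emitted when the model strongly disagrees.
import numpy as np

def edit_decode(next_token_logits, source_ids, copy_bias=4.0):
    """next_token_logits(prefix_ids) -> numpy array of vocabulary logits
    (a hypothetical model interface)."""
    output = []
    for src_token in source_ids:
        logits = next_token_logits(output).copy()
        # Add a bonus to the source token at this position; a correction is
        # emitted only when another token beats the source token plus the bias.
        logits[src_token] += copy_bias
        output.append(int(np.argmax(logits)))
    # Note: this simple version assumes a one-to-one alignment between source
    # and output tokens; real edit decoding must also handle insertions and
    # deletions.
    return output
```

Raising copy_bias makes the decoder more conservative (fewer edits), which is one way to address the over-correction problem discussed above.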

For research papers on "edit decoding" ideas, see edit decoding research.

Aggressive Decoding

Aggressive decoding is an optimization that runs decoding in parallel, using the original input text as a kind of lookahead template. It can reduce latency and improve response time for the user, at the cost of additional GPU computation in parallel. Hence, it is a candidate for speeding up GEC in a large data center with many GPUs available, but not for on-device inference on resource-constrained edge devices like phones or PCs.
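Here is a minimal sketch of one verification step of the idea, assuming a hypothetical model interface that returns the model's top token after every position in a single parallel forward pass.

```python
# Minimal sketch of aggressive decoding: the original input text is used as a
# "draft" continuation and verified in one parallel forward pass; decoding
# falls back to normal token-by-token generation at the first mismatch.
# The model interface (parallel_argmax) is a hypothetical assumption.
def aggressive_decode_step(parallel_argmax, prefix_ids, draft_ids):
    """parallel_argmax(ids) -> list of the model's top token after each position."""
    # One forward pass over prefix + draft scores every draft position at once.
    predictions = parallel_argmax(prefix_ids + draft_ids)
    accepted = []
    for i, draft_token in enumerate(draft_ids):
        # Prediction for draft position i follows the token at index
        # len(prefix_ids) - 1 + i in the concatenated sequence.
        model_token = predictions[len(prefix_ids) - 1 + i]
        if model_token == draft_token:
            accepted.append(draft_token)   # model agrees with the source text
        else:
            accepted.append(model_token)   # first correction found; stop here
            break
    return accepted
```

Because most of a GEC output is identical to the input, long runs of draft tokens are accepted in a single step, which is where the speedup comes from.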

See research on aggressive decoding.

General Research Papers on Grammatical Error Correction (GEC)

Much of the research on Grammatical Error Correction (GEC) does not use edit decoding or aggressive decoding, but instead trains an edit-specific model. The Seq2Edit approach trains a model on a dataset of editing actions (e.g., keep, delete, insert) and then runs a BERT-like encoder-only model efficiently over input texts to perform editing.
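As an illustration, here is a minimal sketch of applying Seq2Edit-style tags to a token sequence; the tag set shown is a simplified assumption, and in a real tagging system (such as GECToR-like models) a trained encoder predicts the tag for each token.

```python
# Minimal sketch of applying Seq2Edit-style edit tags to a token sequence.
# The tag set (KEEP, DELETE, REPLACE_x, APPEND_x) is a simplified illustrative
# assumption; in a real system a BERT-like encoder predicts one tag per token.
def apply_edit_tags(tokens, tags):
    output = []
    for token, tag in zip(tokens, tags):
        if tag == "KEEP":
            output.append(token)
        elif tag == "DELETE":
            continue
        elif tag.startswith("REPLACE_"):
            output.append(tag[len("REPLACE_"):])
        elif tag.startswith("APPEND_"):
            output.append(token)
            output.append(tag[len("APPEND_"):])
    return output

# Example: "He go to school yesterday" -> "He went to school yesterday"
tokens = ["He", "go", "to", "school", "yesterday"]
tags = ["KEEP", "REPLACE_went", "KEEP", "KEEP", "KEEP"]
print(" ".join(apply_edit_tags(tokens, tags)))
```

Because the encoder only tags each input token (rather than generating a full rewrite), this style of GEC avoids the many-tokens and over-correction problems of prompting a general-purpose LLM.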

More AI Research

Read more about: