Revisiting Grammatical Error Correction Evaluation and Beyond

Peiyuan Gong, Xuebo Liu*, Heyan Huang, Min Zhang

*Corresponding author of this work

Research output: Conference contribution › Paper › peer-review

7 Citations (Scopus)

Abstract

Pretraining-based (PT-based) automatic evaluation metrics (e.g., BERTScore and BARTScore) have been widely used in several sentence generation tasks (e.g., machine translation and text summarization) due to their better correlation with human judgments over traditional overlap-based methods. Although PT-based methods have become the de facto standard for training grammatical error correction (GEC) systems, GEC evaluation still does not benefit from pretrained knowledge. This paper takes the first step towards understanding and improving GEC evaluation with pretraining. We first find that arbitrarily applying PT-based metrics to GEC evaluation brings unsatisfactory correlation results because of the excessive attention to inessential system outputs (e.g., unchanged parts). To alleviate this limitation, we propose a novel GEC evaluation metric that achieves the best of both worlds, namely PT-M2, which uses PT-based metrics to score only the corrected parts. Experimental results on the CoNLL14 evaluation task show that PT-M2 significantly outperforms existing methods, achieving a new state-of-the-art result of 0.949 Pearson correlation. Further analysis reveals that PT-M2 is robust in evaluating competitive GEC systems. Source code and scripts are freely available at https://github.com/pygongnlp/PT-M2.
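The abstract's core idea, scoring only the corrected spans with a pretraining-based metric rather than the full system output, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for PT-M2 itself): the edit extraction below uses difflib as a stand-in for the M2 (MaxMatch) edit alignment, and BERTScore stands in for the general PT-based scorer; function names and the fallback behavior are illustrative assumptions.

```python
# Hedged sketch: score only the changed parts of a GEC hypothesis with
# BERTScore, in the spirit of PT-M2. Not the authors' implementation
# (https://github.com/pygongnlp/PT-M2); difflib replaces M2 edit alignment.
import difflib
from bert_score import score  # pip install bert-score


def extract_edits(source: str, corrected: str):
    """Return the token spans that were changed, as (before, after) pairs."""
    src_toks, cor_toks = source.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=src_toks, b=cor_toks)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only corrected parts, ignore unchanged text
            edits.append((" ".join(src_toks[i1:i2]), " ".join(cor_toks[j1:j2])))
    return edits


def pt_edit_score(source: str, hypothesis: str, reference: str) -> float:
    """Score hypothesis edits against reference edits with BERTScore F1."""
    hyp_edits = [after for _, after in extract_edits(source, hypothesis)]
    ref_edits = [after for _, after in extract_edits(source, reference)]
    if not hyp_edits or not ref_edits:
        # Simplified fallback: exact match when one side makes no edits.
        return float(hypothesis.strip() == reference.strip())
    # Compare each hypothesis edit with the joined reference edits; a real
    # M2-style scorer would align edits one-to-one instead.
    cands = hyp_edits
    refs = [" ".join(ref_edits)] * len(hyp_edits)
    _, _, f1 = score(cands, refs, lang="en", verbose=False)
    return f1.mean().item()


if __name__ == "__main__":
    src = "He go to school yesterday ."
    hyp = "He went to school yesterday ."
    ref = "He went to school yesterday ."
    print(pt_edit_score(src, hyp, ref))
```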

Original language: English
Pages: 6891-6902
Number of pages: 12
Publication status: Published - 2022
Event: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 - 11 Dec 2022

Conference

Conference: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 7/12/22 - 11/12/22
