英语科技论文摘要语步结构语料库构建研究

Translated title of the contribution: Research on Construction of Corpus for Move Structures in Abstracts of English Scientific Research Articles

Hongzheng Li, Ruojin Wang, Chong Feng*, Fang Liu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Move structures are discourse units in research articles (RAs) and are of great value in move analysis, essay writing, and related tasks. Although research on move structures in academic articles is abundant, annotated move data resources remain relatively scarce. This research constructed a corpus annotating move structures in English RA abstracts. Currently, nearly 34,000 move structures have been annotated, covering the fields of Natural Language Processing (NLP), Computer Vision (CV), Communication Engineering, and Mechanical Engineering. We also present annotation statistics and analysis. The first stage of corpus construction relies on manual annotation to form high-quality corpus data. In the second and main stage, an automatic recognition and annotation model based on BERT is adopted, which improves annotation speed and expands the annotation scale while maintaining annotation quality. We conducted move structure recognition experiments on the constructed corpus and compared the performance of our model with large language models (LLMs), including ChatGPT and Claude3. The experimental results show that our model achieved higher F1 scores on move structure recognition than the LLMs, indicating the effectiveness of the proposed model. The corpus is publicly available and can provide data resources for NLP tasks such as scientific paper information extraction and intelligent English writing assistance. It also benefits foreign language teaching and research, such as English for Academic Purposes, and can promote the digital transformation of foreign language education.
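
The abstract compares systems by F1 score on move recognition but does not state the label set or the averaging scheme. As a rough illustration only, the sketch below computes per-label and macro-averaged F1 for sentence-level move labels. The label names (Background, Purpose, Method, Result, Conclusion), the toy data, and the choice of macro averaging are all assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def per_label_f1(gold, pred):
    """Return {label: F1} for two equal-length sequences of move labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was not correct
            fn[g] += 1  # missed the gold label g
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 (an assumed averaging scheme)."""
    scores = per_label_f1(gold, pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical move labels and predictions, for illustration only.
gold = ["Background", "Purpose", "Method", "Result", "Conclusion", "Method"]
pred = ["Background", "Method", "Method", "Result", "Conclusion", "Method"]

scores = per_label_f1(gold, pred)
overall = macro_f1(gold, pred)
```

Macro averaging weights every move type equally, so a rare move that a model never predicts correctly lowers the overall score noticeably. A micro-averaged or weighted score would behave differently.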

Original language: Chinese (Traditional)
Title of host publication: Main Conference
Editors: Maosong Sun, Jiye Liang, Xianpei Han, Zhiyuan Liu, Yulan He
Publisher: Chinese National Conference on Computational Linguistics (CCL)
Pages: 841-852
Number of pages: 12
ISBN (Electronic): 9780000000002
Publication status: Published - 2024
Externally published: Yes
Event: 23rd Chinese National Conference on Computational Linguistics, CCL 2024 - Taiyuan, China
Duration: 24 Jul 2024 to 28 Jul 2024

Publication series

Name: CCL 2024 - 23rd Chinese National Conference on Computational Linguistics
Volume: 1

Conference

Conference: 23rd Chinese National Conference on Computational Linguistics, CCL 2024
Country/Territory: China
City: Taiyuan
Period: 24/07/24 to 28/07/24
