Abstract
Move structures are discourse units in research articles (RAs) and are of great value for move analysis, academic writing, and related tasks. Although move structures in academic articles have been studied extensively, annotated move data resources remain scarce. This research designed and constructed a dedicated corpus of move-structure annotations for English RA abstracts. To date, nearly 34,000 move structures have been annotated, covering Natural Language Processing (NLP), Computer Vision (CV), Communication Engineering, and Mechanical Engineering. We also present annotation statistics and analysis. In the first stage of corpus construction, manual annotation was used to produce high-quality corpus data. In the second, main stage, a BERT-based automatic recognition and annotation model was adopted, which increases annotation speed and scale while maintaining annotation quality. We conducted move structure recognition experiments on the constructed corpus and compared our model with large language models (LLMs), including ChatGPT and Claude 3. Our model achieved higher F1 scores for move structure recognition than the LLMs, indicating its effectiveness. The corpus is publicly available. It provides data resources for NLP tasks such as information extraction from scientific papers and intelligent English writing assistance, supports foreign language teaching and research such as English for Academic Purposes, and can effectively promote the digital transformation of foreign language education.
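The comparison above is reported in terms of F1 scores, but the abstract states neither the move label set nor whether scores are macro- or micro-averaged. The sketch below is therefore only a generic illustration of per-move and macro-averaged F1 for sentence-level move labels. The five-label scheme in `MOVES` and the helper names `per_move_f1`, `macro_f1`, and `move_distribution` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical move label set, assumed for illustration only;
# the paper's actual annotation scheme is not given in the abstract.
MOVES = ["Background", "Purpose", "Method", "Result", "Conclusion"]


def per_move_f1(gold, pred, labels):
    """Return {label: F1} from parallel lists of gold and predicted move labels."""
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
    return scores


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-move F1 (one possible averaging choice)."""
    scores = per_move_f1(gold, pred, labels)
    return sum(scores.values()) / len(labels)


def move_distribution(labels):
    """Count how often each move label occurs, e.g. for annotation statistics."""
    return Counter(labels)


if __name__ == "__main__":
    # One abstract, one move label per sentence (toy data).
    gold = ["Background", "Purpose", "Method", "Result", "Conclusion"]
    pred = ["Background", "Method", "Method", "Result", "Result"]
    print(round(macro_f1(gold, pred, MOVES), 3))  # 0.467
    print(move_distribution(gold))
```

Macro-averaging weights rare moves as heavily as frequent ones. Micro-averaging would instead reduce to plain accuracy in this single-label setting, so the choice of averaging matters when comparing systems.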
| Translated title of the contribution | Research on Construction of Corpus for Move Structures in Abstracts of English Scientific Research Articles |
|---|---|
| Original language | Chinese (Traditional) |
| Pages | 841-852 |
| Number of pages | 12 |
| Publication status | Published - 2024 |
| Externally published | Yes |
| Event | 23rd Chinese National Conference on Computational Linguistics, CCL 2024 - Taiyuan, China. Duration: 24 Jul 2024 → 28 Jul 2024 |
Conference
| Conference | 23rd Chinese National Conference on Computational Linguistics, CCL 2024 |
|---|---|
| Country/Territory | China |
| City | Taiyuan |
| Period | 24/07/24 → 28/07/24 |