SNER-CS: Self-training Named Entity Recognition in Computer Science

Jing Jing Zhu, Xian Ling Mao*, Heyan Huang

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

As the number of scientific publications grows, especially in the computer science (CS) domain, it is important to extract scientific entities from the large volume of CS publications. Distantly supervised methods, which automatically generate distantly annotated training data by string matching against an external dictionary, have been widely used for named entity recognition (NER). However, there are two challenges in applying distantly supervised methods to computer science NER. One is that new tasks, methods, and datasets in CS are proposed rapidly, which makes it difficult to build a computer science entity knowledge base with high coverage. The other is noisy annotation, because there is no uniform entity representation standard in the computer science domain. To alleviate these two problems, we propose SNER-CS, a novel self-training method based on a pre-trained language model together with a system that automatically constructs distantly supervised labels in CS. Experimental results show that the proposed model SNER-CS outperforms previous state-of-the-art methods on the computer science NER task.
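The distantly supervised label construction the abstract describes, generating training labels by string matching against an external dictionary, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the dictionary entries, entity types, and the `distant_label` function are hypothetical examples.

```python
def distant_label(tokens, dictionary):
    """Assign BIO tags by longest-match lookup against an entity dictionary.

    `dictionary` maps entity surface forms (tuples of tokens) to an entity
    type, e.g. {("named", "entity", "recognition"): "Task"}. Tokens not
    covered by any dictionary entry are tagged "O". This naive matching is
    exactly what makes distant supervision noisy: coverage gaps and
    ambiguous surface forms produce incomplete or wrong labels.
    """
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Prefer the longest dictionary entry starting at position i.
        for span, etype in dictionary.items():
            if tuple(tokens[i:i + len(span)]) == span:
                if match is None or len(span) > len(match[0]):
                    match = (span, etype)
        if match:
            span, etype = match
            labels[i] = f"B-{etype}"
            for j in range(i + 1, i + len(span)):
                labels[j] = f"I-{etype}"
            i += len(span)
        else:
            i += 1
    return labels


# Hypothetical CS dictionary and sentence to illustrate the labeling.
dictionary = {("named", "entity", "recognition"): "Task", ("BERT",): "Method"}
tokens = "we apply BERT to named entity recognition".split()
print(distant_label(tokens, dictionary))
# → ['O', 'O', 'B-Method', 'O', 'B-Task', 'I-Task', 'I-Task']
```

In a self-training setup such as the one the paper proposes, labels like these would seed an initial model, whose confident predictions on unlabeled text are then added back as pseudo-labels to compensate for the dictionary's limited coverage.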

Original language: English
Article number: 012007
Journal: Journal of Physics: Conference Series
Volume: 2506
Issue number: 1
DOIs
Publication status: Published - 2023
Event: 2022 International Joint Conference on Robotics and Artificial Intelligence, JCRAI 2022 - Virtual, Online
Duration: 14 Oct 2022 - 17 Oct 2022
