TY - JOUR
T1 - SNER-CS
T2 - 2022 International Joint Conference on Robotics and Artificial Intelligence, JCRAI 2022
AU - Zhu, Jing Jing
AU - Mao, Xian Ling
AU - Huang, Heyan
N1 - Publisher Copyright:
© Published under licence by IOP Publishing Ltd.
PY - 2023
Y1 - 2023
N2 - As the number of scientific publications grows, especially in the computer science (CS) domain, it is important to extract scientific entities from the large body of CS publications. Distantly supervised methods, which automatically generate annotated training data by string matching against an external dictionary, have been widely used for named entity recognition (NER). However, applying distant supervision to CS NER poses two challenges. First, new tasks, methods, and datasets are proposed in CS at a rapid pace, which makes it difficult to build a CS entity knowledge base with high coverage. Second, the annotations are noisy, because there is no uniform standard for representing entities in the CS domain. To alleviate these two problems, we propose SNER-CS, a novel self-training method based on a pretrained language model with a distantly supervised automatic label construction system for CS. Experimental results show that the proposed SNER-CS model outperforms previous state-of-the-art methods on the CS NER task.
AB - As the number of scientific publications grows, especially in the computer science (CS) domain, it is important to extract scientific entities from the large body of CS publications. Distantly supervised methods, which automatically generate annotated training data by string matching against an external dictionary, have been widely used for named entity recognition (NER). However, applying distant supervision to CS NER poses two challenges. First, new tasks, methods, and datasets are proposed in CS at a rapid pace, which makes it difficult to build a CS entity knowledge base with high coverage. Second, the annotations are noisy, because there is no uniform standard for representing entities in the CS domain. To alleviate these two problems, we propose SNER-CS, a novel self-training method based on a pretrained language model with a distantly supervised automatic label construction system for CS. Experimental results show that the proposed SNER-CS model outperforms previous state-of-the-art methods on the CS NER task.
UR - http://www.scopus.com/inward/record.url?scp=85169611587&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/2506/1/012007
DO - 10.1088/1742-6596/2506/1/012007
M3 - Conference article
AN - SCOPUS:85169611587
SN - 1742-6588
VL - 2506
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012007
Y2 - 14 October 2022 through 17 October 2022
ER -