TY - JOUR
T1 - SNER-CS
T2 - 2022 International Joint Conference on Robotics and Artificial Intelligence, JCRAI 2022
AU - Zhu, Jing Jing
AU - Mao, Xian Ling
AU - Huang, Heyan
N1 - Publisher Copyright:
© Published under licence by IOP Publishing Ltd.
PY - 2023
Y1 - 2023
N2 - As the number of scientific publications grows, especially in the computer science (CS) domain, it is important to extract scientific entities from the large body of CS publications. Distantly supervised methods, which automatically generate annotated training data by string matching against an external dictionary, have been widely used for named entity recognition (NER). However, applying distant supervision to CS NER poses two challenges. First, new tasks, methods, and datasets are proposed in CS at a rapid pace, which makes it difficult to build a CS entity knowledge base with high coverage. Second, the annotations are noisy, because there is no uniform standard for representing entities in the CS domain. To alleviate these two problems, we propose SNER-CS, a novel self-training method based on a pretrained language model with a distantly supervised automatic label construction system for CS. Experimental results show that the proposed SNER-CS model outperforms previous state-of-the-art methods on the CS NER task.
AB - As the number of scientific publications grows, especially in the computer science (CS) domain, it is important to extract scientific entities from the large body of CS publications. Distantly supervised methods, which automatically generate annotated training data by string matching against an external dictionary, have been widely used for named entity recognition (NER). However, applying distant supervision to CS NER poses two challenges. First, new tasks, methods, and datasets are proposed in CS at a rapid pace, which makes it difficult to build a CS entity knowledge base with high coverage. Second, the annotations are noisy, because there is no uniform standard for representing entities in the CS domain. To alleviate these two problems, we propose SNER-CS, a novel self-training method based on a pretrained language model with a distantly supervised automatic label construction system for CS. Experimental results show that the proposed SNER-CS model outperforms previous state-of-the-art methods on the CS NER task.
UR - http://www.scopus.com/inward/record.url?scp=85169611587&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/2506/1/012007
DO - 10.1088/1742-6596/2506/1/012007
M3 - Conference article
AN - SCOPUS:85169611587
SN - 1742-6588
VL - 2506
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012007
Y2 - 14 October 2022 through 17 October 2022
ER -