TY - GEN
T1 - FGCS
T2 - 12th National CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2023
AU - Wang, Hao
AU - Zhu, Jing Jing
AU - Wei, Wei
AU - Huang, Heyan
AU - Mao, Xian Ling
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.
PY - 2023
Y1 - 2023
N2 - As scientific communities grow and evolve, more and more papers are published, especially in the computer science (CS) field. It is important to organize scientific information into structured knowledge bases extracted from a large corpus of CS papers, which usually requires Information Extraction (IE) of scientific entities and their relationships. To support the construction of high-quality structured scientific knowledge bases through supervised learning, several manually annotated entity-relation datasets for the computer science field, such as SciERC and SciREX, have been released, to the best of our knowledge, and are used to train supervised extraction algorithms. However, almost all of these datasets ignore the annotation of the following fine-grained named entities: nested entities, discontinuous entities, and minimal independent semantics entities. To solve this problem, this paper presents a novel Fine-Grained entity-relation Extraction dataset in the Computer Science field (FGCS), which contains rich fine-grained entities and their relationships. The proposed dataset includes 1,948 sentences with 6 entity types, up to 7 layers of nesting, and 5 relation types. Extensive experiments show that the proposed dataset is a good benchmark for measuring an information extraction model’s ability to recognize fine-grained entities and their relations. Our dataset is publicly available at https://github.com/broken-dream/FGCS.
KW - Datasets
KW - Fine-grained Entities
KW - Information Extraction
UR - http://www.scopus.com/inward/record.url?scp=85174684437&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-44696-2_51
DO - 10.1007/978-3-031-44696-2_51
M3 - Conference contribution
AN - SCOPUS:85174684437
SN - 9783031446955
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 653
EP - 665
BT - Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Proceedings
A2 - Liu, Fei
A2 - Duan, Nan
A2 - Xu, Qingting
A2 - Hong, Yu
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 12 October 2023 through 15 October 2023
ER -