FGCS: A Fine-Grained Scientific Information Extraction Dataset in Computer Science Domain

Hao Wang, Jing Jing Zhu, Wei Wei, Heyan Huang, Xian Ling Mao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

As scientific communities grow and evolve, more and more papers are published, especially in the computer science (CS) field. Organizing the scientific information in a large corpus of CS papers into structured knowledge bases is important, and doing so usually requires Information Extraction (IE) of scientific entities and their relationships. To support the construction of high-quality structured scientific knowledge bases through supervised learning, several manually annotated entity-relation datasets in the computer science field, such as SciERC and SciREX, have, as far as we know, been used to train supervised extraction algorithms. However, almost all of these datasets ignore the annotation of the following fine-grained named entities: nested entities, discontinuous entities, and minimal independent semantics entities. To solve this problem, this paper presents a novel Fine-Grained entity-relation Extraction dataset in the Computer Science field (FGCS), which contains rich fine-grained entities and their relationships. The proposed dataset includes 1,948 sentences annotated with 6 entity types, with up to 7 layers of nesting, and 5 relation types. Extensive experiments show that the proposed dataset is a good benchmark for measuring an information extraction model's ability to recognize fine-grained entities and their relations. Our dataset is publicly available at https://github.com/broken-dream/FGCS.
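
To make the terms "nested entity" and "discontinuous entity" concrete, the following is a minimal sketch of one way such annotations could be represented and how a nesting depth could be counted. It is not the FGCS file format or the authors' code: the `Entity` class, the helper functions, the labels "Method" and "Task", and the example sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]  # token offsets, end exclusive


@dataclass(frozen=True)
class Entity:
    label: str                # hypothetical type name, not taken from FGCS
    spans: Tuple[Span, ...]   # one span = contiguous, several = discontinuous

    @property
    def is_discontinuous(self) -> bool:
        return len(self.spans) > 1

    def tokens(self) -> frozenset:
        # Set of token indices covered by all spans of this entity.
        return frozenset(i for start, end in self.spans for i in range(start, end))


def is_nested_in(inner: Entity, outer: Entity) -> bool:
    """True if inner covers a proper subset of the tokens covered by outer."""
    return inner.tokens() < outer.tokens()


def nesting_depth(entity: Entity, entities: List[Entity]) -> int:
    """1 for an entity that contains no other entity, else 1 + its deepest child."""
    children = [e for e in entities if is_nested_in(e, entity)]
    return 1 + max((nesting_depth(c, entities) for c in children), default=0)


def surface(entity: Entity, tokens: List[str]) -> str:
    """Join the covered tokens in sentence order."""
    return " ".join(tokens[i] for i in sorted(entity.tokens()))


# Illustrative sentence (not from the dataset).
tokens = "convolutional and recurrent neural networks for image classification".split()

nn = Entity("Method", ((3, 5),))            # "neural networks"
rnn = Entity("Method", ((2, 5),))           # "recurrent neural networks"
cnn = Entity("Method", ((0, 1), (3, 5)))    # "convolutional ... neural networks"
task = Entity("Task", ((6, 8),))            # "image classification"
entities = [nn, rnn, cnn, task]

for e in entities:
    print(surface(e, tokens), e.label, e.is_discontinuous, nesting_depth(e, entities))
```

In this sketch, "neural networks" is nested inside both longer mentions, and "convolutional neural networks" is discontinuous because the tokens "and recurrent" interrupt it. Representing each entity as a tuple of spans, rather than a single start and end offset, is what lets one structure cover both cases.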

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Proceedings
Editors: Fei Liu, Nan Duan, Qingting Xu, Yu Hong
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 653-665
Number of pages: 13
ISBN (Print): 9783031446955
Publication status: Published - 2023
Event: 12th National CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2023 - Foshan, China
Duration: 12 Oct 2023 to 15 Oct 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14303 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th National CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2023
Country/Territory: China
City: Foshan
Period: 12/10/23 to 15/10/23
