A Semi-supervised Transfer Learning Framework for Low Resource Entity and Relation Extraction in Scientific Domain

Hao Wang; Xian Ling Mao; Heyan Huang

School of Computer Science

Beijing Institute of Technology

Research output: Contribution to journal › Conference article › Peer-review

1 Citation (Scopus)

Abstract

As scientific communities grow, the number of published papers increases quickly. Converting unstructured scientific papers into a structured knowledge base is therefore important, and it relies on Information Extraction (IE) to extract entities and their relationships. Most existing IE methods require abundant annotated data, which is time-consuming and expensive to obtain, especially in the scientific domain, where annotators need domain knowledge. Recently, several works have addressed this problem with semi-supervised learning. However, these methods require the input sentence to contain only two entities and simply classify the relationship between them. This is far from realistic application scenarios, in which both entities and relations must be extracted from raw text. In this paper, we propose a Semi-supervised Transfer Learning (STL) framework to tackle the joint entity and relation extraction problem in a low-resource setting. Specifically, STL adopts two main strategies: a rebalancing strategy that alleviates the bias toward the majority class during semi-supervised learning, and a transfer learning strategy that transfers knowledge from domains with relatively rich annotation to domains that lack annotated data. Experimental results on two public scientific IE datasets show the effectiveness of the proposed method.
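The abstract does not say how the rebalancing strategy is implemented. As a purely illustrative sketch, one common way to reduce majority-class bias in self-training is to cap how many pseudo-labeled examples each class may contribute. The function name, the confidence threshold, the per-class cap, and the toy labels below are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def select_balanced_pseudo_labels(predictions, min_confidence=0.9, max_per_class=2):
    """Pick confident pseudo-labels while capping each class's contribution.

    predictions: list of (example_id, predicted_label, confidence) tuples.
    Returns a list of (example_id, predicted_label) pairs.
    """
    # Group the sufficiently confident predictions by their predicted label.
    by_label = defaultdict(list)
    for example_id, label, confidence in predictions:
        if confidence >= min_confidence:
            by_label[label].append((confidence, example_id))

    selected = []
    for label in sorted(by_label):
        # Keep only the most confident examples for this class (hypothetical cap).
        top = sorted(by_label[label], reverse=True)[:max_per_class]
        selected.extend((example_id, label) for _, example_id in top)
    return selected


# Toy model outputs: the majority class "O" dominates the confident predictions.
predictions = [
    ("s1", "O", 0.99), ("s2", "O", 0.98), ("s3", "O", 0.97),
    ("s4", "Method", 0.93), ("s5", "Task", 0.95), ("s6", "Task", 0.60),
]
print(select_balanced_pseudo_labels(predictions))
```

With the cap set to 2, only two of the three confident "O" predictions are kept, so the majority class cannot crowd out the rarer labels in the next training round.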

Original language	English
Pages (from-to)	41-47
Number of pages	7
Journal	CEUR Workshop Proceedings
Volume	3210
Publication status	Published - 2022
Event	3rd Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents, EEKE 2022 - Virtual, Online, Germany. Duration: 23 Jun 2022 → 24 Jun 2022

Cite this

Wang, H., Mao, X. L., & Huang, H. (2022). A Semi-supervised Transfer Learning Framework for Low Resource Entity and Relation Extraction in Scientific Domain. CEUR Workshop Proceedings, 3210, 41-47.

@article{7b7700ad66234138bbdb4da6f25ea460,

title = "A Semi-supervised Transfer Learning Framework for Low Resource Entity and Relation Extraction in Scientific Domain",

abstract = "With the development of scientific communities, the amount of papers increases quickly. It's important to convert the unstructured scientific papers into structured knowledge base, which relies on Information Extraction (IE) to extract entities and their relationships. Most existing IE methods require abundant annotated data, which is time-consuming and expensive to obtain, especially in scientific domain because it requires annotators with domain knowledge. Recently, several works have been proposed to solve the problem by semi-supervised learning. However, these methods require the input sentence to contain only two entities and simply classify the relationship between these two entities. Obviously, it is far from the realistic application scenarios that both entities and relations need to be extracted from raw text. In this paper, we propose a Semi-supervised Transfer Learning (STL) framework to tackle joint entity and relation extraction problem in a low resource situation. Specifically, STL adopts two main strategies: a rebalancing strategy for alleviating the bias to the majority class during semi-supervised learning, and a transfer learning strategy for transferring knowledge from domains with relatively rich annotation to domains that lack annotated data. Experiment results on two public scientific IE datasets show the effectiveness of the proposed method.",

keywords = "Information Extraction, Semi-supervised Learning, Transfer Learning",

author = "Hao Wang and Mao, {Xian Ling} and Heyan Huang",

note = "Publisher Copyright: {\textcopyright} 2022 Copyright for this paper by its author.; 3rd Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents, EEKE 2022 ; Conference date: 23-06-2022 Through 24-06-2022",

year = "2022",

language = "English",

volume = "3210",

pages = "41--47",

journal = "CEUR Workshop Proceedings",

issn = "1613-0073",

publisher = "CEUR-WS",

}

TY - JOUR

T1 - A Semi-supervised Transfer Learning Framework for Low Resource Entity and Relation Extraction in Scientific Domain

AU - Wang, Hao

AU - Mao, Xian Ling

AU - Huang, Heyan

PY - 2022

Y1 - 2022

N2 - With the development of scientific communities, the amount of papers increases quickly. It's important to convert the unstructured scientific papers into structured knowledge base, which relies on Information Extraction (IE) to extract entities and their relationships. Most existing IE methods require abundant annotated data, which is time-consuming and expensive to obtain, especially in scientific domain because it requires annotators with domain knowledge. Recently, several works have been proposed to solve the problem by semi-supervised learning. However, these methods require the input sentence to contain only two entities and simply classify the relationship between these two entities. Obviously, it is far from the realistic application scenarios that both entities and relations need to be extracted from raw text. In this paper, we propose a Semi-supervised Transfer Learning (STL) framework to tackle joint entity and relation extraction problem in a low resource situation. Specifically, STL adopts two main strategies: a rebalancing strategy for alleviating the bias to the majority class during semi-supervised learning, and a transfer learning strategy for transferring knowledge from domains with relatively rich annotation to domains that lack annotated data. Experiment results on two public scientific IE datasets show the effectiveness of the proposed method.

KW - Information Extraction

KW - Semi-supervised Learning

KW - Transfer Learning

UR - http://www.scopus.com/inward/record.url?scp=85138353661&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85138353661

SN - 1613-0073

VL - 3210

SP - 41

EP - 47

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

T2 - 3rd Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents, EEKE 2022

Y2 - 23 June 2022 through 24 June 2022

ER -
