Research on Deep-Learning-Based Extraction of Fine-grained Knowledge Units from Texts (基于深度学习的文本中细粒度知识元抽取方法研究)

Li Yu, Li Qian^*, Changlei Fu, Huaming Zhao

^* Corresponding author for this work

doi:10.11925/infotech.2096-3467.2018.1352

Research output: Contribution to journal › Article › Peer-review

8 Citations (Scopus)

Abstract

[Objective] This paper extracts fine-grained knowledge units from texts with a deep learning model trained on a corpus built by a modified bootstrapping method. [Methods] First, we built a lexicon for each type of knowledge unit using a search engine and keywords from Elsevier. Second, we created a large annotated corpus with the bootstrapping method. Third, we controlled annotation quality with estimation models for patterns and knowledge units. Finally, we trained the proposed LSTM-CRF model on the annotated corpus and used it to extract new knowledge units from texts. [Results] We retrieved four types of knowledge units (study scope, research method, experimental data, and evaluation criteria with their values) from 17,756 ACL papers, with a manually evaluated average precision of 91%. [Limitations] The model parameters were predefined and adjusted manually. More research is needed to evaluate the method on texts from other domains. [Conclusions] The proposed model effectively addresses semantic drift and extracts knowledge units precisely, offering an effective solution for big data acquisition in intelligence analysis.
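The abstract names the corpus-construction steps but not the estimation models behind them. The Python sketch below illustrates the second and third steps under stated assumptions: lexicon growth by bootstrapping over single-word left/right context patterns, pattern filtering with Riloff's RlogF score, candidate ranking with Basilisk-style AvgLog, and a cap on promotions per round to limit semantic drift, followed by BIO annotation of the kind a sequence tagger such as LSTM-CRF is trained on. The scoring functions, thresholds, and all function names (`bootstrap`, `bio_tag`, `extractions`, etc.) are hypothetical stand-ins, not the authors' models.

```python
from collections import defaultdict
from math import log2

START, END = "<s>", "</s>"


def context(tokens, i, j):
    """Hypothetical pattern form: (word before span, word after span)."""
    left = tokens[i - 1] if i > 0 else START
    right = tokens[j] if j < len(tokens) else END
    return left, right


def find_spans(tokens, unit):
    """Yield (start, end) of every exact occurrence of the token tuple `unit`."""
    n = len(unit)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == unit:
            yield i, i + n


def extractions(corpus, pattern, max_len):
    """All spans of up to max_len tokens that `pattern` brackets anywhere in the corpus."""
    found = set()
    for tokens in corpus:
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
                if context(tokens, i, j) == pattern:
                    found.add(tuple(tokens[i:j]))
    return found


def avg_log(supports):
    """Basilisk-style AvgLog: mean of log2(F + 1) over patterns extracting a candidate."""
    return sum(log2(f + 1) for f in supports) / len(supports)


def bootstrap(corpus, seeds, iterations=3, top_k=2, min_pattern_score=0.5, max_len=2):
    """Grow a knowledge-unit lexicon from seed terms (assumed scoring, not the paper's)."""
    lexicon = {tuple(s.split()) for s in seeds}
    for _ in range(iterations):
        # 1. Induce context patterns around every known knowledge unit.
        patterns = {context(tokens, i, j)
                    for tokens in corpus for unit in lexicon
                    for i, j in find_spans(tokens, unit)}
        # 2. Score patterns with RlogF = (F / N) * log2(F); drop unreliable ones.
        support = defaultdict(list)
        for p in patterns:
            ext = extractions(corpus, p, max_len)
            f = len(ext & lexicon)
            score = (f / len(ext)) * log2(f) if f else 0.0
            if score < min_pattern_score:
                continue
            for cand in ext - lexicon:
                support[cand].append(f)
        # 3. Promote only the top_k best-supported candidates per round.
        ranked = sorted(support, key=lambda c: (-avg_log(support[c]), c))
        if not ranked:
            break
        lexicon.update(ranked[:top_k])
    return lexicon


def bio_tag(tokens, lexicon, label):
    """Greedy longest-match annotation of lexicon entries with BIO tags."""
    tags = ["O"] * len(tokens)
    max_len = max((len(u) for u in lexicon), default=0)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in lexicon:
                tags[i] = f"B-{label}"
                for k in range(i + 1, i + n):
                    tags[k] = f"I-{label}"
                i += n
                break
        else:
            i += 1
    return tags


if __name__ == "__main__":
    corpus = [s.split() for s in [
        "we use CRF for tagging",
        "we use SVM for tagging",
        "we use LSTM for tagging",
        "CRF is popular",
        "the model is popular",
    ]]
    lexicon = bootstrap(corpus, seeds=["CRF", "SVM"])
    print(sorted(" ".join(u) for u in lexicon))
    print(bio_tag(corpus[2], lexicon, "METHOD"))
```

In this toy run the pattern `use _ for` brackets two known methods and promotes `LSTM`, while `<s> _ is` matches only one known unit, so its RlogF score is 0 and `the model` never enters the lexicon. Filtering patterns before promoting candidates, and promoting few candidates per round, is one common way to keep bootstrapping from drifting into unrelated terms.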

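Translated title of the contribution	Extracting Fine-grained Knowledge Units from Texts with Deep Learning
Original language	Chinese
Pages (from-to)	38-45
Number of pages	8
Journal	Data Analysis and Knowledge Discovery
Volume	3
Issue number	1
DOI	https://doi.org/10.11925/infotech.2096-3467.2018.1352
Publication status	Published - Jan 2019
Externally published	Yes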

Keywords

Bootstrapping
Deep Learning
Knowledge Unit Extraction
LSTM-CRF
Named Entity Recognition


Cite this

@article{b1305ab266844a029a49adf4f9d1d306,

title = "基于深度学习的文本中细粒度知识元抽取方法研究",

abstract = "[Objective] This paper tries to extract fine-grained knowledge units from texts with a deep learning model based on the modified bootstrapping method. [Methods] First, we built the lexicon for each type of knowledge unit with the help of search engine and keywords from Elsevier. Second, we created a large annotated corpus based on the bootstrapping method. Third, we controlled the quality of annotation with the estimation models of patterns and knowledge units. Finally, we trained the proposed LSTM-CRF model with the annotated corpus, and extracted new knowledge units from texts. [Results] We retrieved four types of knowledge units (study scope, research method, experimental data, as well as evaluation criteria and their values) from 17,756 ACL papers. The average precision was 91%, which was calculated manually. [Limitations] The parameters of models were pre-defined and modified by human. More research is needed to evaluate the performance of this method with texts from other domains. [Conclusions] The proposed model effectively addresses the issue of semantic drifting. It could extract knowledge units precisely, which is an effective solution for the big data acquisition process of intelligence analysis.",

keywords = "Bootstrapping, Deep Learning, Knowledge Unit Extraction, LSTM-CRF, Named Entity Recognition",

author = "Li Yu and Li Qian and Changlei Fu and Huaming Zhao",

year = "2019",

month = jan,

doi = "10.11925/infotech.2096-3467.2018.1352",

language = "Chinese",

volume = "3",

pages = "38--45",

journal = "Data Analysis and Knowledge Discovery",

issn = "2096-3467",

publisher = "Chinese Academy of Sciences",

number = "1",

}

TY - JOUR

T1 - 基于深度学习的文本中细粒度知识元抽取方法研究

AU - Yu, Li

AU - Qian, Li

AU - Fu, Changlei

AU - Zhao, Huaming

PY - 2019/1

Y1 - 2019/1

N2 - [Objective] This paper tries to extract fine-grained knowledge units from texts with a deep learning model based on the modified bootstrapping method. [Methods] First, we built the lexicon for each type of knowledge unit with the help of search engine and keywords from Elsevier. Second, we created a large annotated corpus based on the bootstrapping method. Third, we controlled the quality of annotation with the estimation models of patterns and knowledge units. Finally, we trained the proposed LSTM-CRF model with the annotated corpus, and extracted new knowledge units from texts. [Results] We retrieved four types of knowledge units (study scope, research method, experimental data, as well as evaluation criteria and their values) from 17,756 ACL papers. The average precision was 91%, which was calculated manually. [Limitations] The parameters of models were pre-defined and modified by human. More research is needed to evaluate the performance of this method with texts from other domains. [Conclusions] The proposed model effectively addresses the issue of semantic drifting. It could extract knowledge units precisely, which is an effective solution for the big data acquisition process of intelligence analysis.

AB - [Objective] This paper tries to extract fine-grained knowledge units from texts with a deep learning model based on the modified bootstrapping method. [Methods] First, we built the lexicon for each type of knowledge unit with the help of search engine and keywords from Elsevier. Second, we created a large annotated corpus based on the bootstrapping method. Third, we controlled the quality of annotation with the estimation models of patterns and knowledge units. Finally, we trained the proposed LSTM-CRF model with the annotated corpus, and extracted new knowledge units from texts. [Results] We retrieved four types of knowledge units (study scope, research method, experimental data, as well as evaluation criteria and their values) from 17,756 ACL papers. The average precision was 91%, which was calculated manually. [Limitations] The parameters of models were pre-defined and modified by human. More research is needed to evaluate the performance of this method with texts from other domains. [Conclusions] The proposed model effectively addresses the issue of semantic drifting. It could extract knowledge units precisely, which is an effective solution for the big data acquisition process of intelligence analysis.

KW - Bootstrapping

KW - Deep Learning

KW - Knowledge Unit Extraction

KW - LSTM-CRF

KW - Named Entity Recognition

UR - http://www.scopus.com/inward/record.url?scp=85099079511&partnerID=8YFLogxK

U2 - 10.11925/infotech.2096-3467.2018.1352

DO - 10.11925/infotech.2096-3467.2018.1352

M3 - Article

AN - SCOPUS:85099079511

SN - 2096-3467

VL - 3

SP - 38

EP - 45

JO - Data Analysis and Knowledge Discovery

JF - Data Analysis and Knowledge Discovery

IS - 1

ER -
