TY - JOUR
T1 - Research on a Deep Learning-Based Method for Extracting Fine-Grained Knowledge Units from Text
AU - Yu, Li
AU - Qian, Li
AU - Fu, Changlei
AU - Zhao, Huaming
N1 - Publisher Copyright: © 2019 Chinese Academy of Sciences. All rights reserved.
PY - 2019/1
Y1 - 2019/1
N2 - [Objective] This paper extracts fine-grained knowledge units from texts using a deep learning model built on a modified bootstrapping method. [Methods] First, we built a lexicon for each type of knowledge unit with the help of a search engine and keywords from Elsevier. Second, we created a large annotated corpus using the bootstrapping method. Third, we controlled annotation quality with estimation models for patterns and knowledge units. Finally, we trained the proposed LSTM-CRF model on the annotated corpus and used it to extract new knowledge units from texts. [Results] We extracted four types of knowledge units (study scope, research method, experimental data, and evaluation criteria with their values) from 17,756 ACL papers. The average precision, assessed manually, was 91%. [Limitations] The model parameters were predefined and adjusted manually. More research is needed to evaluate the method on texts from other domains. [Conclusions] The proposed model effectively addresses the problem of semantic drift. It extracts knowledge units with high precision, making it an effective solution for big data acquisition in intelligence analysis.
KW - Bootstrapping
KW - Deep Learning
KW - Knowledge Unit Extraction
KW - LSTM-CRF
KW - Named Entity Recognition
UR - http://www.scopus.com/inward/record.url?scp=85099079511&partnerID=8YFLogxK
U2 - 10.11925/infotech.2096-3467.2018.1352
DO - 10.11925/infotech.2096-3467.2018.1352
M3 - Article
AN - SCOPUS:85099079511
SN - 2096-3467
VL - 3
SP - 38
EP - 45
JO - Data Analysis and Knowledge Discovery
JF - Data Analysis and Knowledge Discovery
IS - 1
ER -