TY - JOUR
T1 - APIE
T2 - An information extraction module designed based on the pipeline method
AU - Jiang, Xu
AU - Cheng, Yurong
AU - Zhang, Siyi
AU - Wang, Juan
AU - Ma, Baoquan
N1 - Publisher Copyright:
© 2023 The Author(s)
PY - 2024/3
Y1 - 2024/3
N2 - Information extraction (IE) aims to discover and extract valuable information from unstructured text. This problem can be decomposed into two subtasks: named entity recognition (NER) and relation extraction (RE). Although the IE problem has been studied for years, most work efforts focused on jointly modeling these two subtasks, either by casting them into a structured prediction framework or by performing multitask learning through shared representations. However, since the contextual representations of entity and relation models inherently capture different feature information, sharing a single encoder to capture the information required by both subtasks in the same space would harm the accuracy of the model. Recent research (Zhong and Chen, 2020) has also proved that using two separate encoders for NER and RE tasks respectively through pipeline method are effective, with the model surpassing all previous joint models in accuracy. Thus, in this paper, we design An Pipeline method Information Extraction module called APIE, APIE combines the advantages of both pipeline methods and joint methods, demonstrating higher accuracy and powerful reasoning abilities. Specifically, we design a multi-level feature NER model based on attention mechanism and a document-level RE model based on local context pooling. To demonstrate the effectiveness of our proposed approach, we conducted tests on multiple datasets. Extensive experimental results have shown that our proposed model outperforms state-of-the-art methods and improves both accuracy and reasoning abilities.
AB - Information extraction (IE) aims to discover and extract valuable information from unstructured text. This problem can be decomposed into two subtasks: named entity recognition (NER) and relation extraction (RE). Although the IE problem has been studied for years, most work efforts focused on jointly modeling these two subtasks, either by casting them into a structured prediction framework or by performing multitask learning through shared representations. However, since the contextual representations of entity and relation models inherently capture different feature information, sharing a single encoder to capture the information required by both subtasks in the same space would harm the accuracy of the model. Recent research (Zhong and Chen, 2020) has also proved that using two separate encoders for NER and RE tasks respectively through pipeline method are effective, with the model surpassing all previous joint models in accuracy. Thus, in this paper, we design An Pipeline method Information Extraction module called APIE, APIE combines the advantages of both pipeline methods and joint methods, demonstrating higher accuracy and powerful reasoning abilities. Specifically, we design a multi-level feature NER model based on attention mechanism and a document-level RE model based on local context pooling. To demonstrate the effectiveness of our proposed approach, we conducted tests on multiple datasets. Extensive experimental results have shown that our proposed model outperforms state-of-the-art methods and improves both accuracy and reasoning abilities.
KW - Information extraction
KW - Knowledge graph
KW - Named Entity Recognition
KW - Relation extraction
KW - Representation learning
UR - http://www.scopus.com/inward/record.url?scp=85179000461&partnerID=8YFLogxK
U2 - 10.1016/j.array.2023.100331
DO - 10.1016/j.array.2023.100331
M3 - Article
AN - SCOPUS:85179000461
SN - 2590-0056
VL - 21
JO - Array
JF - Array
M1 - 100331
ER -