APIE: An information extraction module designed based on the pipeline method

Xu Jiang; Yurong Cheng; Siyi Zhang; Juan Wang; Baoquan Ma

doi:10.1016/j.array.2023.100331

Xu Jiang, Yurong Cheng^*, Siyi Zhang, Juan Wang, Baoquan Ma

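^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract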

Information extraction (IE) aims to discover and extract valuable information from unstructured text. This problem can be decomposed into two subtasks: named entity recognition (NER) and relation extraction (RE). Although the IE problem has been studied for years, most work has focused on jointly modeling these two subtasks, either by casting them into a structured prediction framework or by performing multitask learning through shared representations. However, since the contextual representations of entity and relation models inherently capture different feature information, sharing a single encoder to capture the information required by both subtasks in the same space would harm the accuracy of the model. Recent research (Zhong and Chen, 2020) has also shown that using two separate encoders for the NER and RE tasks in a pipeline is effective, with the model surpassing all previous joint models in accuracy. Thus, in this paper, we design a pipeline-method information extraction module called APIE. APIE combines the advantages of both pipeline methods and joint methods, demonstrating higher accuracy and powerful reasoning abilities. Specifically, we design a multi-level feature NER model based on an attention mechanism and a document-level RE model based on local context pooling. To demonstrate the effectiveness of our proposed approach, we conducted tests on multiple datasets. Extensive experimental results show that our proposed model outperforms state-of-the-art methods and improves both accuracy and reasoning abilities.

Original language	English
Article number	100331
Journal	Array
Volume	21
DOI	https://doi.org/10.1016/j.array.2023.100331
Publication status	Published - Mar 2024

Cite this

Jiang, X., Cheng, Y., Zhang, S., Wang, J., & Ma, B. (2024). APIE: An information extraction module designed based on the pipeline method. Array, 21, Article 100331. https://doi.org/10.1016/j.array.2023.100331

@article{e091e442260644848bffd8eebaddfddc,

title = "APIE: An information extraction module designed based on the pipeline method",

abstract = "Information extraction (IE) aims to discover and extract valuable information from unstructured text. This problem can be decomposed into two subtasks: named entity recognition (NER) and relation extraction (RE). Although the IE problem has been studied for years, most work has focused on jointly modeling these two subtasks, either by casting them into a structured prediction framework or by performing multitask learning through shared representations. However, since the contextual representations of entity and relation models inherently capture different feature information, sharing a single encoder to capture the information required by both subtasks in the same space would harm the accuracy of the model. Recent research (Zhong and Chen, 2020) has also shown that using two separate encoders for the NER and RE tasks in a pipeline is effective, with the model surpassing all previous joint models in accuracy. Thus, in this paper, we design a pipeline-method information extraction module called APIE. APIE combines the advantages of both pipeline methods and joint methods, demonstrating higher accuracy and powerful reasoning abilities. Specifically, we design a multi-level feature NER model based on an attention mechanism and a document-level RE model based on local context pooling. To demonstrate the effectiveness of our proposed approach, we conducted tests on multiple datasets. Extensive experimental results show that our proposed model outperforms state-of-the-art methods and improves both accuracy and reasoning abilities.",

keywords = "Information extraction, Knowledge graph, Named Entity Recognition, Relation extraction, Representation learning",

author = "Xu Jiang and Yurong Cheng and Siyi Zhang and Juan Wang and Baoquan Ma",

note = "Publisher Copyright: {\textcopyright} 2023 The Author(s)",

year = "2024",

month = mar,

doi = "10.1016/j.array.2023.100331",

language = "English",

volume = "21",

journal = "Array",

issn = "2590-0056",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - APIE

T2 - An information extraction module designed based on the pipeline method

AU - Jiang, Xu

AU - Cheng, Yurong

AU - Zhang, Siyi

AU - Wang, Juan

AU - Ma, Baoquan

PY - 2024/3

Y1 - 2024/3

N2 - Information extraction (IE) aims to discover and extract valuable information from unstructured text. This problem can be decomposed into two subtasks: named entity recognition (NER) and relation extraction (RE). Although the IE problem has been studied for years, most work has focused on jointly modeling these two subtasks, either by casting them into a structured prediction framework or by performing multitask learning through shared representations. However, since the contextual representations of entity and relation models inherently capture different feature information, sharing a single encoder to capture the information required by both subtasks in the same space would harm the accuracy of the model. Recent research (Zhong and Chen, 2020) has also shown that using two separate encoders for the NER and RE tasks in a pipeline is effective, with the model surpassing all previous joint models in accuracy. Thus, in this paper, we design a pipeline-method information extraction module called APIE. APIE combines the advantages of both pipeline methods and joint methods, demonstrating higher accuracy and powerful reasoning abilities. Specifically, we design a multi-level feature NER model based on an attention mechanism and a document-level RE model based on local context pooling. To demonstrate the effectiveness of our proposed approach, we conducted tests on multiple datasets. Extensive experimental results show that our proposed model outperforms state-of-the-art methods and improves both accuracy and reasoning abilities.

KW - Information extraction

KW - Knowledge graph

KW - Named Entity Recognition

KW - Relation extraction

KW - Representation learning

UR - http://www.scopus.com/inward/record.url?scp=85179000461&partnerID=8YFLogxK

U2 - 10.1016/j.array.2023.100331

DO - 10.1016/j.array.2023.100331

M3 - Article

AN - SCOPUS:85179000461

SN - 2590-0056

VL - 21

JO - Array

JF - Array

M1 - 100331

ER -
