Leveraging Concept-Enhanced Pre-Training Model and Masked-Entity Language Model for Named Entity Disambiguation

Zizheng Ji^*; Lin Dai; Jin Pang; Tingting Shen

doi:10.1109/ACCESS.2020.2994247

^* Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › Peer-review

6 Citations (Scopus)

Abstract

Named Entity Disambiguation (NED) is the task of resolving multiple named entity mentions in an input text sequence to their correct referents in a knowledge graph. We tackle the NED problem by introducing two novel objectives within a pre-training framework, and propose a novel pre-training NED model. Specifically, the proposed model consists of: (i) concept-enhanced pre-training, which aims to identify valid lexical semantic relations using concept semantic constraints derived from the external resource Probase; and (ii) a masked entity language model, which trains contextualized embeddings by predicting randomly masked entities from the words and non-masked entities in the given input text. The proposed model can therefore combine the advantage of pre-training for generating contextualized embeddings with the strengths of lexical knowledge (here, concept knowledge in particular) for understanding language semantics. We conduct experiments on the CoNLL and TAC datasets, as well as on various datasets provided by the GERBIL platform. The experimental results demonstrate that the proposed model achieves significantly higher performance than previous models.
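The masked entity language model objective in (ii) can be illustrated with a minimal sketch of its input-corruption step. This is not the authors' implementation: the function name `mask_entities`, the `[MASK_ENT]` placeholder, the default 15% masking rate, and the toy entity IDs are all hypothetical choices made for illustration. The sketch only shows how randomly chosen entity positions become prediction targets while the words and the remaining (non-masked) entities stay visible as context.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical placeholder token for a masked entity slot (assumption).
MASK_ENT = "[MASK_ENT]"


def mask_entities(
    entities: List[str],
    mask_rate: float = 0.15,  # assumed rate, not taken from the paper
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Randomly mask entity IDs for a masked-entity LM objective (sketch).

    Returns the corrupted entity sequence (model input) and a label list in
    which masked positions hold the original entity ID and all other
    positions hold None, meaning no prediction loss is computed there.
    """
    rng = rng or random.Random()
    if not entities:
        return [], []
    # Mask at least one entity so every example yields a training target.
    n_mask = max(1, round(len(entities) * mask_rate))
    masked_idx = set(rng.sample(range(len(entities)), n_mask))
    inputs = [MASK_ENT if i in masked_idx else e for i, e in enumerate(entities)]
    labels = [e if i in masked_idx else None for i, e in enumerate(entities)]
    return inputs, labels


if __name__ == "__main__":
    # Toy example: word tokens are passed to the encoder unchanged;
    # only the entity sequence is corrupted.
    words = ["Jordan", "played", "for", "Chicago", "."]
    entities = ["Michael_Jordan", "Chicago_Bulls"]
    inputs, labels = mask_entities(entities, mask_rate=0.5, rng=random.Random(0))
    print("words:   ", words)
    print("entities:", inputs)
    print("targets: ", labels)
```

In a full model the encoder would read the words together with the corrupted entity sequence and be trained to recover the entity IDs stored in `labels`; how the paper's model encodes and combines these inputs is not specified here.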

Original language	English
Article number	9091850
Pages (from-to)	100469-100484
Number of pages	16
Journal	IEEE Access
Volume	8
DOI	https://doi.org/10.1109/ACCESS.2020.2994247
Publication status	Published - 2020

Cite this

@article{ff0794438a5c43f0864255f43bd36239,

title = "Leveraging Concept-Enhanced Pre-Training Model and Masked-Entity Language Model for Named Entity Disambiguation",

abstract = "Named Entity Disambiguation (NED) refers to the task of resolving multiple named entity mentions in an input-text sequence to their correct references in a knowledge graph. We tackle NED problem by leveraging two novel objectives for pre-training framework, and propose a novel pre-training NED model. Especially, the proposed pre-training NED model consists of: (i) concept-enhanced pre-training, aiming at identifying valid lexical semantic relations with the concept semantic constraints derived from external resource Probase; and (ii) masked entity language model, aiming to train the contextualized embedding by predicting randomly masked entities based on words and non-masked entities in the given input-text. Therefore, the proposed pre-training NED model could merge the advantage of pre-training mechanism for generating contextualized embedding with the superiority of the lexical knowledge (e.g., concept knowledge emphasized here) for understanding language semantic. We conduct experiments on the CoNLL dataset and TAC dataset, and various datasets provided by GERBIL platform. The experimental results demonstrate that the proposed model achieves significantly higher performance than previous models.",

keywords = "Named entity disambiguation, lexical knowledge, pre-training",

author = "Zizheng Ji and Lin Dai and Jin Pang and Tingting Shen",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE.",

year = "2020",

doi = "10.1109/ACCESS.2020.2994247",

language = "English",

volume = "8",

pages = "100469--100484",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Leveraging Concept-Enhanced Pre-Training Model and Masked-Entity Language Model for Named Entity Disambiguation

AU - Ji, Zizheng

AU - Dai, Lin

AU - Pang, Jin

AU - Shen, Tingting

PY - 2020

Y1 - 2020

N2 - Named Entity Disambiguation (NED) refers to the task of resolving multiple named entity mentions in an input-text sequence to their correct references in a knowledge graph. We tackle NED problem by leveraging two novel objectives for pre-training framework, and propose a novel pre-training NED model. Especially, the proposed pre-training NED model consists of: (i) concept-enhanced pre-training, aiming at identifying valid lexical semantic relations with the concept semantic constraints derived from external resource Probase; and (ii) masked entity language model, aiming to train the contextualized embedding by predicting randomly masked entities based on words and non-masked entities in the given input-text. Therefore, the proposed pre-training NED model could merge the advantage of pre-training mechanism for generating contextualized embedding with the superiority of the lexical knowledge (e.g., concept knowledge emphasized here) for understanding language semantic. We conduct experiments on the CoNLL dataset and TAC dataset, and various datasets provided by GERBIL platform. The experimental results demonstrate that the proposed model achieves significantly higher performance than previous models.

AB - Named Entity Disambiguation (NED) refers to the task of resolving multiple named entity mentions in an input-text sequence to their correct references in a knowledge graph. We tackle NED problem by leveraging two novel objectives for pre-training framework, and propose a novel pre-training NED model. Especially, the proposed pre-training NED model consists of: (i) concept-enhanced pre-training, aiming at identifying valid lexical semantic relations with the concept semantic constraints derived from external resource Probase; and (ii) masked entity language model, aiming to train the contextualized embedding by predicting randomly masked entities based on words and non-masked entities in the given input-text. Therefore, the proposed pre-training NED model could merge the advantage of pre-training mechanism for generating contextualized embedding with the superiority of the lexical knowledge (e.g., concept knowledge emphasized here) for understanding language semantic. We conduct experiments on the CoNLL dataset and TAC dataset, and various datasets provided by GERBIL platform. The experimental results demonstrate that the proposed model achieves significantly higher performance than previous models.

KW - Named entity disambiguation

KW - lexical knowledge

KW - pre-training

UR - http://www.scopus.com/inward/record.url?scp=85086635606&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2020.2994247

DO - 10.1109/ACCESS.2020.2994247

M3 - Article

AN - SCOPUS:85086635606

SN - 2169-3536

VL - 8

SP - 100469

EP - 100484

JO - IEEE Access

JF - IEEE Access

M1 - 9091850

ER -
