Disambiguating author names with embedding heterogeneous information and attentive RNN clustering parameters

Wang Ruolin; Niu Zhendong; Lin Qika; Zhu Yifan; Qiu Ping; Lu Hao; Liu Donglei

doi:10.11925/infotech.2096-3467.2021.0253

Wang Ruolin, Niu Zhendong^*, Lin Qika, Zhu Yifan, Qiu Ping, Lu Hao, Liu Donglei

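^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › Peer-reviewed

3 Citations (Scopus)

Abstract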

[Objective] This paper proposes a name disambiguation method for scientific literature that distinguishes scholars who share the same name. Existing solutions rely on document feature extraction or on the relationships between documents and co-authors, and so lose higher-order attributes. [Methods] First, we established a unified feature extraction framework, the Paper Embedding Network (PaperEmbNet), which combines content and relationship information to build an academic heterogeneous information network for each author. Then, we designed a Clustering Parameters Method (AR4CPM), based on an attentive recurrent neural network, to estimate the number of clusters directly. Finally, we used hierarchical agglomerative clustering (HAC) to disambiguate author names, with the predicted number as the preset parameter. [Results] On the AMiner-AND dataset, the macro-F1 score of the proposed model was up to 4.75% higher than that of the second-best model, and the average training time was 5-10 minutes shorter than that of the existing baselines. [Limitations] The performance of the proposed method in multilingual environments remains to be evaluated. [Conclusions] The proposed approach can effectively perform name disambiguation tasks.
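The final step described in the abstract, agglomerative clustering with the cluster count fixed in advance, can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D embeddings are made-up values, `predicted_k` is only a stand-in for the number a model such as AR4CPM would predict, and average linkage with Euclidean distance is an assumption, since the abstract does not state which linkage or metric was used.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster pairs of points."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def hac(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average-linkage distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        # combinations() yields x < y, so deleting index y leaves x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical paper embeddings for one ambiguous author name.
embeddings = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (9.0, 0.0)]
predicted_k = 3  # assumed value; stands in for a predicted cluster count
result = hac(embeddings, predicted_k)
print(result)
```

With these toy values, papers 0 and 1 form one cluster, papers 2 and 3 another, and paper 4 stays alone; each cluster would correspond to one distinct real-world author.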

Original language	English
Pages (from-to)	13-24
Number of pages	12
Journal	Data Analysis and Knowledge Discovery
Volume	5
Issue	8
DOI	https://doi.org/10.11925/infotech.2096-3467.2021.0253
Publication status	Published - 2021

Cite this

@article{dcbc397a0b19407bbd65d79c073428c5,
title = "Disambiguating author names with embedding heterogeneous information and attentive {RNN} clustering parameters",
abstract = "[Objective] This paper proposes a name disambiguation method for scientific literature that distinguishes scholars who share the same name. Existing solutions rely on document feature extraction or on the relationships between documents and co-authors, and so lose higher-order attributes. [Methods] First, we established a unified feature extraction framework, the Paper Embedding Network (PaperEmbNet), which combines content and relationship information to build an academic heterogeneous information network for each author. Then, we designed a Clustering Parameters Method (AR4CPM), based on an attentive recurrent neural network, to estimate the number of clusters directly. Finally, we used hierarchical agglomerative clustering (HAC) to disambiguate author names, with the predicted number as the preset parameter. [Results] On the AMiner-AND dataset, the macro-F1 score of the proposed model was up to 4.75% higher than that of the second-best model, and the average training time was 5-10 minutes shorter than that of the existing baselines. [Limitations] The performance of the proposed method in multilingual environments remains to be evaluated. [Conclusions] The proposed approach can effectively perform name disambiguation tasks.",
keywords = "Academic Heterogeneous Information Network, Clustering, Graph Embedding, Name Disambiguation",
author = "Wang Ruolin and Niu Zhendong and Lin Qika and Zhu Yifan and Qiu Ping and Lu Hao and Liu Donglei",
year = "2021",
doi = "10.11925/infotech.2096-3467.2021.0253",
language = "English",
volume = "5",
pages = "13--24",
journal = "Data Analysis and Knowledge Discovery",
issn = "2096-3467",
publisher = "Chinese Academy of Sciences",
number = "8",
}

TY  - JOUR
T1  - Disambiguating author names with embedding heterogeneous information and attentive RNN clustering parameters
AU  - Wang, Ruolin
AU  - Niu, Zhendong
AU  - Lin, Qika
AU  - Zhu, Yifan
AU  - Qiu, Ping
AU  - Lu, Hao
AU  - Liu, Donglei
PY  - 2021
Y1  - 2021
AB  - [Objective] This paper proposes a name disambiguation method for scientific literature that distinguishes scholars who share the same name. Existing solutions rely on document feature extraction or on the relationships between documents and co-authors, and so lose higher-order attributes. [Methods] First, we established a unified feature extraction framework, the Paper Embedding Network (PaperEmbNet), which combines content and relationship information to build an academic heterogeneous information network for each author. Then, we designed a Clustering Parameters Method (AR4CPM), based on an attentive recurrent neural network, to estimate the number of clusters directly. Finally, we used hierarchical agglomerative clustering (HAC) to disambiguate author names, with the predicted number as the preset parameter. [Results] On the AMiner-AND dataset, the macro-F1 score of the proposed model was up to 4.75% higher than that of the second-best model, and the average training time was 5-10 minutes shorter than that of the existing baselines. [Limitations] The performance of the proposed method in multilingual environments remains to be evaluated. [Conclusions] The proposed approach can effectively perform name disambiguation tasks.
KW  - Academic Heterogeneous Information Network
KW  - Clustering
KW  - Graph Embedding
KW  - Name Disambiguation
UR  - http://www.scopus.com/inward/record.url?scp=85120372368&partnerID=8YFLogxK
U2  - 10.11925/infotech.2096-3467.2021.0253
DO  - 10.11925/infotech.2096-3467.2021.0253
M3  - Article
AN  - SCOPUS:85120372368
SN  - 2096-3467
VL  - 5
SP  - 13
EP  - 24
JO  - Data Analysis and Knowledge Discovery
JF  - Data Analysis and Knowledge Discovery
IS  - 8
ER  -
