Disambiguating author names with embedding heterogeneous information and attentive RNN clustering parameters

Wang Ruolin; Niu Zhendong; Lin Qika; Zhu Yifan; Qiu Ping; Lu Hao; Liu Donglei

doi:10.11925/infotech.2096-3467.2021.0253

Wang Ruolin, Niu Zhendong^*, Lin Qika, Zhu Yifan, Qiu Ping, Lu Hao, Liu Donglei

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

[Objective] This paper proposes a name disambiguation method for scientific literature that distinguishes scholars who share the same name. Existing solutions rely on document feature extraction or on relationships between documents and co-authors, and therefore lose higher-order attributes. [Methods] First, we established a unified feature extraction framework, the Paper Embedding Network (PaperEmbNet), which combines content and relationships to build an academic heterogeneous information network for each author. Then, we designed a Clustering Parameters Method (AR4CPM) based on an attentive recurrent neural network to estimate the number of clusters directly. Finally, we used the hierarchical agglomerative clustering (HAC) algorithm to disambiguate author names, with the predicted number as the preset parameter. [Results] We evaluated the proposed model on the AMiner-AND dataset and found that the macro-F1 score was up to 4.75% higher than that of the second-best model, and the average training time was 5-10 minutes shorter than that of the existing baselines. [Limitations] The performance of the proposed method in multilingual environments remains to be evaluated. [Conclusions] The proposed approach can effectively perform name disambiguation tasks.
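
The final step of the abstract, clustering one author's papers with HAC once the number of clusters is known, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy two-dimensional vectors stand in for PaperEmbNet embeddings, the cluster count is passed in by hand rather than predicted by AR4CPM, and the choice of cosine distance with average linkage is an assumption made only for illustration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def average_linkage(cluster_a, cluster_b, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(cosine_distance(vectors[i], vectors[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def hac(vectors, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain.

    vectors:    paper embeddings (hypothetical stand-ins, not PaperEmbNet output).
    n_clusters: preset cluster count (assumed given; the paper predicts it).
    Returns one cluster label per paper.
    """
    if not 1 <= n_clusters <= len(vectors):
        raise ValueError("n_clusters must be between 1 and len(vectors)")
    clusters = [[i] for i in range(len(vectors))]  # start with singletons
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[a].extend(clusters[b])  # a < b, so index a stays valid
        del clusters[b]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy embeddings for six papers published under one ambiguous name.
papers = [
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],  # papers pointing one way
    [0.1, 1.0], [0.0, 0.9], [0.2, 1.0],  # papers pointing another way
]
labels = hac(papers, n_clusters=2)
print(labels)
```

Each distinct label corresponds to one hypothesized real-world author. The sketch recomputes linkage distances on every merge, which is fine for a handful of papers but would need a cached distance matrix for realistic publication lists.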

Original language	English
Pages (from-to)	13-24
Number of pages	12
Journal	Data Analysis and Knowledge Discovery
Volume	5
Issue number	8
DOIs	https://doi.org/10.11925/infotech.2096-3467.2021.0253
Publication status	Published - 2021

Keywords

Academic Heterogeneous Information Network
Clustering
Graph Embedding
Name Disambiguation

Cite this

Wang, R., Niu, Z., Lin, Q., Zhu, Y., Qiu, P., Lu, H., & Liu, D. (2021). Disambiguating author names with embedding heterogeneous information and attentive RNN clustering parameters. Data Analysis and Knowledge Discovery, 5(8), 13-24. https://doi.org/10.11925/infotech.2096-3467.2021.0253

@article{dcbc397a0b19407bbd65d79c073428c5,

title = "Disambiguating author names with embedding heterogeneous information and attentive {RNN} clustering parameters",

abstract = "[Objective] This paper proposes a name disambiguation method for scientific literature that distinguishes scholars who share the same name. Existing solutions rely on document feature extraction or on relationships between documents and co-authors, and therefore lose higher-order attributes. [Methods] First, we established a unified feature extraction framework, the Paper Embedding Network (PaperEmbNet), which combines content and relationships to build an academic heterogeneous information network for each author. Then, we designed a Clustering Parameters Method (AR4CPM) based on an attentive recurrent neural network to estimate the number of clusters directly. Finally, we used the hierarchical agglomerative clustering (HAC) algorithm to disambiguate author names, with the predicted number as the preset parameter. [Results] We evaluated the proposed model on the AMiner-AND dataset and found that the macro-F1 score was up to 4.75\% higher than that of the second-best model, and the average training time was 5-10 minutes shorter than that of the existing baselines. [Limitations] The performance of the proposed method in multilingual environments remains to be evaluated. [Conclusions] The proposed approach can effectively perform name disambiguation tasks.",

keywords = "Academic Heterogeneous Information Network, Clustering, Graph Embedding, Name Disambiguation",

author = "Wang, Ruolin and Niu, Zhendong and Lin, Qika and Zhu, Yifan and Qiu, Ping and Lu, Hao and Liu, Donglei",

year = "2021",

doi = "10.11925/infotech.2096-3467.2021.0253",

language = "English",

volume = "5",

pages = "13--24",

journal = "Data Analysis and Knowledge Discovery",

issn = "2096-3467",

publisher = "Chinese Academy of Sciences",

number = "8",

}

TY - JOUR

T1 - Disambiguating author names with embedding heterogeneous information and attentive RNN clustering parameters

AU - Wang, Ruolin

AU - Niu, Zhendong

AU - Lin, Qika

AU - Zhu, Yifan

AU - Qiu, Ping

AU - Lu, Hao

AU - Liu, Donglei

PY - 2021

Y1 - 2021

N2 - [Objective] This paper proposes a name disambiguation method for scientific literature that distinguishes scholars who share the same name. Existing solutions rely on document feature extraction or on relationships between documents and co-authors, and therefore lose higher-order attributes. [Methods] First, we established a unified feature extraction framework, the Paper Embedding Network (PaperEmbNet), which combines content and relationships to build an academic heterogeneous information network for each author. Then, we designed a Clustering Parameters Method (AR4CPM) based on an attentive recurrent neural network to estimate the number of clusters directly. Finally, we used the hierarchical agglomerative clustering (HAC) algorithm to disambiguate author names, with the predicted number as the preset parameter. [Results] We evaluated the proposed model on the AMiner-AND dataset and found that the macro-F1 score was up to 4.75% higher than that of the second-best model, and the average training time was 5-10 minutes shorter than that of the existing baselines. [Limitations] The performance of the proposed method in multilingual environments remains to be evaluated. [Conclusions] The proposed approach can effectively perform name disambiguation tasks.

KW - Academic Heterogeneous Information Network

KW - Clustering

KW - Graph Embedding

KW - Name Disambiguation

UR - http://www.scopus.com/inward/record.url?scp=85120372368&partnerID=8YFLogxK

U2 - 10.11925/infotech.2096-3467.2021.0253

DO - 10.11925/infotech.2096-3467.2021.0253

M3 - Article

AN - SCOPUS:85120372368

SN - 2096-3467

VL - 5

SP - 13

EP - 24

JO - Data Analysis and Knowledge Discovery

JF - Data Analysis and Knowledge Discovery

IS - 8

ER -
