Disambiguating author names with embedding heterogeneous information and attentive RNN clustering parameters

Wang Ruolin, Niu Zhendong*, Lin Qika, Zhu Yifan, Qiu Ping, Lu Hao, Liu Donglei

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

[Objective] This paper proposes a name disambiguation method for scientific literature, aiming to distinguish scholars who share the same name. Existing solutions rely on document feature extraction or on the relationships between documents and co-authors, and thus lose higher-order attributes. [Methods] First, we established a unified feature extraction framework, the Paper Embedding Network (PaperEmbNet), which combines content and relationship information to build an academic heterogeneous information network for each author name. Then, we designed an Attentive Recurrent Neural Network-based Clustering Parameters Method (AR4CPM) to estimate the number of clusters directly. Finally, we used the hierarchical agglomerative clustering (HAC) algorithm to disambiguate author names, with the predicted number as the preset parameter. [Results] We evaluated the proposed model on the AMiner-AND dataset: its macro-F1 score was up to 4.75% higher than that of the second-best model, and its average training time was 5-10 minutes shorter than that of existing baselines. [Limitations] The performance of the proposed method still needs to be evaluated in multilingual environments. [Conclusions] The proposed approach can effectively perform name disambiguation tasks.
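For a concrete picture of the pipeline the abstract outlines (paper embedding, cluster-number prediction, then HAC), the Python sketch below chains the three stages using scikit-learn's AgglomerativeClustering. The embed_papers and predict_cluster_count callables are hypothetical stand-ins for the paper's PaperEmbNet and AR4CPM components, which are not reproduced here; this is an illustrative sketch under those assumptions, not the authors' implementation.

    # Minimal sketch of the disambiguation pipeline described in the abstract.
    # embed_papers and predict_cluster_count are hypothetical placeholders for
    # the paper's PaperEmbNet and AR4CPM components.
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    def disambiguate(papers, embed_papers, predict_cluster_count):
        """Group the papers of one ambiguous author name into per-person clusters.

        papers: list of paper records sharing the same author name.
        embed_papers: callable returning an (n_papers, dim) embedding matrix
                      (stand-in for PaperEmbNet).
        predict_cluster_count: callable returning the estimated number of
                      distinct authors k (stand-in for AR4CPM).
        """
        X = np.asarray(embed_papers(papers))   # embeddings of each paper
        k = int(predict_cluster_count(X))      # predicted number of real authors
        # HAC with the predicted k as the preset parameter, as in the abstract.
        hac = AgglomerativeClustering(n_clusters=k, linkage="average")
        labels = hac.fit_predict(X)            # labels[i] = person cluster of paper i
        return labels

A downstream evaluation would then compare these cluster labels against ground-truth author identities, e.g. with pairwise or macro-F1 scores as reported in the paper.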

Original language: English
Pages (from-to): 13-24
Number of pages: 12
Journal: Data Analysis and Knowledge Discovery
Volume: 5
Issue number: 8
DOIs
Publication status: Published - 2021

Keywords

  • Academic Heterogeneous Information Network
  • Clustering
  • Graph Embedding
  • Name Disambiguation
