Abstract
[Objective] This paper proposes a name disambiguation method for scientific literature, aiming to distinguish scholars with the same name. The existing solutions utilizes document feature extraction or relationship between documents and co-authors, which loses higher-order attributes. [Methods] First, we established a unified feature extraction framework of Paper Embedding Network (PaperEmbNet), which combined content and relationship to build an academic heterogeneous information network for each author. Then, we designed a Clustering Parameters Method (AR4CPM) based on the Attentive Recurrent Neural Network to estimate the clustering number directly. Finally, we used the Hierarchical agglomerative clustering algorithm (HAC) to disambiguate author names with the predicted number as the preset parameter. [Results] We examined the proposed model with the AMiner-AND dataset and found the macro-F1 score was up to 4.75% higher than the suboptimal model, and the average training time was 5-10 minutes shorter than the existing baselines. [Limitations] We need to evaluate the performance of the proposed method with multilingual environment. [Conclusions] The proposed approach could effectively conduct the name disambiguation tasks.
Original language | English |
---|---|
Pages (from-to) | 13-24 |
Number of pages | 12 |
Journal | Data Analysis and Knowledge Discovery |
Volume | 5 |
Issue number | 8 |
DOIs | |
Publication status | Published - 2021 |
Keywords
- Academic Heterogeneous Information Network
- Clustering
- Graph Embedding
- Name Disambiguation