Does deep learning help topic extraction? A kernel k-means clustering method with word embedding

Yi Zhang; Jie Lu; Feng Liu; Qian Liu; Alan Porter; Hongshu Chen; Guangquan Zhang

doi:10.1016/j.joi.2018.09.004

Corresponding author: Hongshu Chen

Research output: Contribution to journal › Article › peer-review

101 Citations (Scopus)

Abstract

Topic extraction presents challenges for the bibliometric community, and its performance still depends on human intervention and its practical areas. This paper proposes a novel kernel k-means clustering method incorporated with a word embedding model to create a solution that effectively extracts topics from bibliometric data. The experimental results of a comparison of this method with four clustering baselines (i.e., k-means, fuzzy c-means, principal component analysis, and topic models) on two bibliometric datasets demonstrate its effectiveness across either a relatively broad range of disciplines or a given domain. An empirical study on bibliometric topic extraction from articles published by three top-tier bibliometric journals between 2000 and 2017, supported by expert knowledge-based evaluations, provides supplemental evidence of the method's ability on topic extraction. Additionally, this empirical analysis reveals insights into both overlapping and diverse research interests among the three journals that would benefit journal publishers, editorial boards, and research communities.

Original language	English
Pages (from-to)	1099-1117
Number of pages	19
Journal	Journal of Informetrics
Volume	12
Issue number	4
DOIs	https://doi.org/10.1016/j.joi.2018.09.004
Publication status	Published - Nov 2018
Externally published	Yes

Keywords

Bibliometrics
Cluster analysis
Text mining
Topic analysis

Cite this

@article{5e5a3464dc394868ac1f4f513a3e04df,

title = "Does deep learning help topic extraction? A kernel k-means clustering method with word embedding",

abstract = "Topic extraction presents challenges for the bibliometric community, and its performance still depends on human intervention and its practical areas. This paper proposes a novel kernel k-means clustering method incorporated with a word embedding model to create a solution that effectively extracts topics from bibliometric data. The experimental results of a comparison of this method with four clustering baselines (i.e., k-means, fuzzy c-means, principal component analysis, and topic models) on two bibliometric datasets demonstrate its effectiveness across either a relatively broad range of disciplines or a given domain. An empirical study on bibliometric topic extraction from articles published by three top-tier bibliometric journals between 2000 and 2017, supported by expert knowledge-based evaluations, provides supplemental evidence of the method's ability on topic extraction. Additionally, this empirical analysis reveals insights into both overlapping and diverse research interests among the three journals that would benefit journal publishers, editorial boards, and research communities.",

keywords = "Bibliometrics, Cluster analysis, Text mining, Topic analysis",

author = "Yi Zhang and Jie Lu and Feng Liu and Qian Liu and Alan Porter and Hongshu Chen and Guangquan Zhang",

year = "2018",

month = nov,

doi = "10.1016/j.joi.2018.09.004",

language = "English",

volume = "12",

pages = "1099--1117",

journal = "Journal of Informetrics",

issn = "1751-1577",

publisher = "Elsevier B.V.",

number = "4",

}

TY - JOUR

T1 - Does deep learning help topic extraction? A kernel k-means clustering method with word embedding

AU - Zhang, Yi

AU - Lu, Jie

AU - Liu, Feng

AU - Liu, Qian

AU - Porter, Alan

AU - Chen, Hongshu

AU - Zhang, Guangquan

PY - 2018/11

Y1 - 2018/11

N2 - Topic extraction presents challenges for the bibliometric community, and its performance still depends on human intervention and its practical areas. This paper proposes a novel kernel k-means clustering method incorporated with a word embedding model to create a solution that effectively extracts topics from bibliometric data. The experimental results of a comparison of this method with four clustering baselines (i.e., k-means, fuzzy c-means, principal component analysis, and topic models) on two bibliometric datasets demonstrate its effectiveness across either a relatively broad range of disciplines or a given domain. An empirical study on bibliometric topic extraction from articles published by three top-tier bibliometric journals between 2000 and 2017, supported by expert knowledge-based evaluations, provides supplemental evidence of the method's ability on topic extraction. Additionally, this empirical analysis reveals insights into both overlapping and diverse research interests among the three journals that would benefit journal publishers, editorial boards, and research communities.

AB - Topic extraction presents challenges for the bibliometric community, and its performance still depends on human intervention and its practical areas. This paper proposes a novel kernel k-means clustering method incorporated with a word embedding model to create a solution that effectively extracts topics from bibliometric data. The experimental results of a comparison of this method with four clustering baselines (i.e., k-means, fuzzy c-means, principal component analysis, and topic models) on two bibliometric datasets demonstrate its effectiveness across either a relatively broad range of disciplines or a given domain. An empirical study on bibliometric topic extraction from articles published by three top-tier bibliometric journals between 2000 and 2017, supported by expert knowledge-based evaluations, provides supplemental evidence of the method's ability on topic extraction. Additionally, this empirical analysis reveals insights into both overlapping and diverse research interests among the three journals that would benefit journal publishers, editorial boards, and research communities.

KW - Bibliometrics

KW - Cluster analysis

KW - Text mining

KW - Topic analysis

UR - http://www.scopus.com/inward/record.url?scp=85053768711&partnerID=8YFLogxK

U2 - 10.1016/j.joi.2018.09.004

DO - 10.1016/j.joi.2018.09.004

M3 - Article

AN - SCOPUS:85053768711

SN - 1751-1577

VL - 12

SP - 1099

EP - 1117

JO - Journal of Informetrics

JF - Journal of Informetrics

IS - 4

ER -
