TY - JOUR
T1 - SDRank
T2 - A shallow-to-deep ranking framework for enhanced unsupervised keyphrase extraction
AU - Xu, Chun
AU - Mao, Xian Ling
AU - Che, Tian Yi
AU - Mao, Hong Li
AU - Huang, Heyan
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2025/5/5
Y1 - 2025/5/5
N2 - Keyphrase Extraction (KE) aims to automatically identify a concise set of single-word or multi-word expressions that summarize the core content of a document. Most state-of-the-art models estimate the importance of candidate phrases by modeling deep semantic relationships between them and their documents and/or among themselves. However, these models struggle to effectively select potential candidates in long documents due to the large number of noisy candidates that distort semantic relevance. To address this issue, we introduce a novel ranking framework called SDRank, which jointly models the shallow and deep semantic information of candidates. The core principle of SDRank is to align the shallow and deep semantics of candidates to enhance their overall relevance. Specifically, SDRank first calculates the shallow semantic relevance of a candidate by analyzing word-overlap similarity between the candidate and other candidates. Next, deep semantic relevance is calculated by context similarity, as with most existing methods. Finally, SDRank combines these two relevance measures with positional information to rank candidates. We extensively test SDRank across three benchmark datasets — Inspec, SemEval 2010, and DUC 2001 — which vary in document length and domain. The results demonstrate that SDRank consistently outperforms robust unsupervised models, highlighting the benefits of fusing diverse semantic relationships in unsupervised keyphrase extraction (UKE). Additionally, when applied to three strong baselines, SDRank shows significant performance improvements on two long-document datasets, demonstrating the model's adaptability.
KW - Deep relevance
KW - Degree centrality
KW - Shallow relevance
KW - Unsupervised keyphrase extraction
KW - Word overlapping
UR - http://www.scopus.com/inward/record.url?scp=85217680322&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2025.126748
DO - 10.1016/j.eswa.2025.126748
M3 - Article
AN - SCOPUS:85217680322
SN - 0957-4174
VL - 272
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 126748
ER -