SDRank: A shallow-to-deep ranking framework for enhanced unsupervised keyphrase extraction

Chun Xu, Xian Ling Mao*, Tian Yi Che, Hong Li Mao, Heyan Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Keyphrase Extraction (KE) aims to automatically identify a concise set of single-word or multi-word expressions that summarize the core content of a document. Most state-of-the-art models estimate the importance of candidate phrases by modeling deep semantic relationships between them and their documents and/or among themselves. However, these models struggle to effectively select potential candidates in long documents due to the large number of noisy candidates that distort semantic relevance. To address this issue, we introduce a novel ranking framework called SDRank, which jointly models the shallow and deep semantic information of candidates. The core principle of SDRank is to align the shallow and deep semantics of candidates to enhance their overall relevance. Specifically, SDRank first calculates the shallow semantic relevance of a candidate by analyzing word-overlap similarity between the candidate and other candidates. Next, deep semantic relevance is calculated by context similarity, as with most existing methods. Finally, SDRank combines these two relevance measures with positional information to rank candidates. We extensively test SDRank across three benchmark datasets — Inspec, SemEval 2010, and DUC 2001 — which vary in document length and domain. The results demonstrate that SDRank consistently outperforms robust unsupervised models, highlighting the benefits of fusing diverse semantic relationships in unsupervised keyphrase extraction (UKE). Additionally, when applied to three strong baselines, SDRank shows significant performance improvements on two long-document datasets, demonstrating the model's adaptability.
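
As an illustration only, the three-step ranking described above can be sketched as follows. The abstract does not give the formulas, so everything specific here is an assumption rather than the authors' method: Jaccard overlap as the word-overlap measure, degree centrality as the sum of overlaps with all other candidates, the mixing weight `alpha`, and the position weight `1 + 1/(1 + first_pos)`. Deep relevance is taken as a precomputed score per candidate (for example, an embedding similarity to the document), and candidates are assumed to be unique strings.

```python
from typing import Dict, List, Tuple


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two phrases (assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def shallow_relevance(candidates: List[str]) -> Dict[str, float]:
    """Degree centrality: total overlap of each candidate with all the others."""
    return {
        c: sum(word_overlap(c, o) for j, o in enumerate(candidates) if j != i)
        for i, c in enumerate(candidates)
    }


def rank(
    candidates: List[str],
    deep: Dict[str, float],      # precomputed deep relevance per candidate
    first_pos: Dict[str, int],   # index of each candidate's first occurrence
    alpha: float = 0.5,          # hypothetical shallow/deep mixing weight
) -> List[Tuple[str, float]]:
    """Combine shallow relevance, deep relevance, and position into one score."""
    if not candidates:
        return []
    shallow = shallow_relevance(candidates)
    max_shallow = max(shallow.values()) or 1.0  # avoid division by zero
    scored = []
    for c in candidates:
        s = shallow[c] / max_shallow               # normalise to [0, 1]
        pos_weight = 1.0 + 1.0 / (1 + first_pos[c])  # earlier phrases weigh more
        scored.append((c, (alpha * s + (1 - alpha) * deep[c]) * pos_weight))
    return sorted(scored, key=lambda item: item[1], reverse=True)


candidates = [
    "keyphrase extraction",
    "unsupervised keyphrase extraction",
    "ranking framework",
    "long documents",
]
deep = {c: 0.5 for c in candidates}       # placeholder deep scores
first_pos = {c: 0 for c in candidates}    # placeholder positions
ranked = rank(candidates, deep, first_pos)
```

With equal deep scores and positions, the two candidates that share words rise to the top, which is the effect the shallow signal is meant to add on top of context similarity.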

Original language: English
Article number: 126748
Journal: Expert Systems with Applications
Volume: 272
Publication status: Published - 5 May 2025

Keywords

  • Deep relevance
  • Degree centrality
  • Shallow relevance
  • Unsupervised keyphrase extraction
  • Word overlapping
