TY - JOUR
T1 - HCUKE
T2 - A Hierarchical Context-aware approach for Unsupervised Keyphrase Extraction
AU - Xu, Chun
AU - Mao, Xian Ling
AU - Xin, Cheng Xin
AU - Shang, Yu Ming
AU - Che, Tian Yi
AU - Mao, Hong Li
AU - Huang, Heyan
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2024/11/25
Y1 - 2024/11/25
AB - Keyphrase Extraction (KE) aims to identify a concise set of words or phrases that effectively summarizes the core ideas of a document. Recent embedding-based models have achieved state-of-the-art performance by jointly modeling local and global contexts in Unsupervised Keyphrase Extraction (UKE). However, these models often ignore either sentence- or document-level contexts, leading directly to weak or incorrect global significance. Furthermore, they rely heavily on local significance, making them vulnerable to noisy data, particularly in long documents, resulting in unstable and suboptimal performance. Intuitively, hierarchical contexts enable a more accurate understanding of the candidates, thereby enhancing their global relevance. Inspired by this, we propose a novel Hierarchical Context-aware Unsupervised Keyphrase Extraction method called HCUKE. Specifically, HCUKE comprises three core modules: (i) a hierarchical context-based global significance measure module that incrementally learns global semantic information from a three-level hierarchical structure; (ii) a phrase-level local significance measure module that captures local semantic information by modeling the context interaction among candidates; and (iii) a candidate ranking module that integrates the measure scores with positional weights to compute a final ranking score. Extensive experiments on three benchmark datasets demonstrate that the proposed method significantly outperforms state-of-the-art baselines.
KW - Contextual embedding
KW - Global significance
KW - Hierarchical context
KW - Unsupervised Keyphrase Extraction
UR - http://www.scopus.com/inward/record.url?scp=85203659857&partnerID=8YFLogxK
DO - 10.1016/j.knosys.2024.112511
M3 - Article
AN - SCOPUS:85203659857
SN - 0950-7051
VL - 304
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 112511
ER -