Abstract
Keyword extraction is a fundamental problem in natural language processing. This paper proposes PhraseVecRank, a keyphrase extraction algorithm based on phrase embedding. First, a phrase embedding model based on LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network) is designed to address the semantic representation of complex phrases. PhraseVecRank then uses the phrase embeddings to compute a theme weight for each candidate phrase, and combines the semantic similarity between candidate phrase embeddings with co-occurrence information to compute edge weights; ranking candidates with these theme weights improves keyphrase extraction. Experimental results show that PhraseVecRank effectively extracts keyphrases that cover the topic information of a text, and that the proposed phrase embedding models better represent the semantic information of phrases.
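The abstract does not give the exact formulas, so the following is only a minimal sketch of a theme-weighted graph ranking in that spirit, not the authors' implementation. It assumes that phrase embeddings are already available (the LSTM/CNN model is not reproduced), that the document theme vector is the mean of the candidate embeddings, that the edge weight is the product of cosine similarity and co-occurrence count, and that ranking is a PageRank-style iteration biased by the theme weights. The function name `theme_weighted_rank` and all parameters are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def theme_weighted_rank(embeddings, cooccurrence, damping=0.85, iterations=50):
    """Rank candidate phrases; returns (phrase, score) pairs, best first."""
    phrases = sorted(embeddings)
    dim = len(embeddings[phrases[0]])

    # Assumption: the document theme vector is the mean candidate embedding.
    doc_vec = [sum(embeddings[p][k] for p in phrases) / len(phrases)
               for k in range(dim)]

    # Theme weight: non-negative similarity to the theme vector, normalised.
    raw = {p: max(cosine(embeddings[p], doc_vec), 0.0) for p in phrases}
    total = sum(raw.values()) or 1.0
    theme = {p: raw[p] / total for p in phrases}

    # Assumption: edge weight = semantic similarity * co-occurrence count.
    weight = {p: {} for p in phrases}
    for (a, b), count in cooccurrence.items():
        w = max(cosine(embeddings[a], embeddings[b]), 0.0) * count
        if w > 0:
            weight[a][b] = w
            weight[b][a] = w
    out_sum = {p: sum(weight[p].values()) for p in phrases}

    # PageRank-style iteration whose restart distribution is the theme weight.
    score = dict(theme)
    for _ in range(iterations):
        updated = {}
        for p in phrases:
            incoming = sum(score[q] * weight[q][p] / out_sum[q]
                           for q in weight[p])
            updated[p] = (1 - damping) * theme[p] + damping * incoming
        score = updated
    return sorted(score.items(), key=lambda item: item[1], reverse=True)


# Toy input: 2-dimensional embeddings and co-occurrence counts (made up).
embeddings = {
    "neural network": [0.9, 0.1],
    "deep learning": [0.8, 0.2],
    "coffee break": [0.1, 0.9],
}
cooccurrence = {
    ("neural network", "deep learning"): 3,
    ("deep learning", "coffee break"): 1,
}
ranking = theme_weighted_rank(embeddings, cooccurrence)
```

In this toy input, the phrase that is both close to the theme vector and strongly connected ranks first, while the off-topic phrase ranks last. Phrases with no edges would lose score mass in this sketch; a fuller implementation would need to handle such dangling nodes.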
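Translated title of the contribution | The Theme-Weighted Keyphrase Extraction Algorithm Based on Phrase Embedding |
---|---|
Original language | Chinese (Traditional) |
Pages (from-to) | 1682-1690 |
Number of pages | 9 |
Journal | Tien Tzu Hsueh Pao/Acta Electronica Sinica |
Volume | 49 |
Issue number | 9 |
DOI | |
Publication status | Published - Sep 2021 |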
Keywords
- Auto-encoder
- Keyphrases extraction
- Phrase embedding
- Theme-weighted