Abstract
Keyword extraction is a fundamental problem in natural language processing. This paper proposes PhraseVecRank, a keyphrase extraction algorithm based on phrase embedding. First, a phrase vector construction model based on LSTM (Long Short-Term Memory) and CNN (Convolutional Neural Network) is designed to produce semantic representations of complex phrases. PhraseVecRank then uses the phrase embeddings to compute a theme weight for each candidate phrase. It computes edge weights by combining the semantic similarity between candidate phrase embeddings with co-occurrence information. Ranking the candidates with these theme weights improves keyphrase extraction. Experimental results show that PhraseVecRank effectively extracts keyphrases that cover the topical information of a text, and that the proposed phrase embedding models better represent the semantic information of phrases.
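The abstract does not give PhraseVecRank's exact formulas, so the following is only a minimal sketch of the ranking stage under stated assumptions. It does not reproduce the LSTM/CNN phrase encoder; embeddings are simply taken as given. The document "theme" vector is assumed to be the mean candidate embedding. A candidate's theme weight is assumed to be its cosine similarity to that vector. Each edge weight is assumed to be the non-negative cosine similarity multiplied by the co-occurrence count. The topic-weighted ranking is modeled as a personalized PageRank biased by the theme weights. The function names `cosine` and `theme_weighted_rank` are hypothetical, as are the damping factor and the other parameters.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def theme_weighted_rank(phrases, vectors, cooccur, d=0.85, iters=100, tol=1e-10):
    """Score candidate phrases with a theme-biased (personalized) PageRank.

    Hypothetical sketch, not the paper's exact method.
    phrases: candidate phrase strings
    vectors: phrase -> embedding (list of floats), assumed precomputed
    cooccur: frozenset({a, b}) -> co-occurrence count within some window
    Returns (phrase, score) pairs sorted by descending score.
    """
    n = len(phrases)
    dim = len(vectors[phrases[0]])
    # Assumption: the document theme vector is the mean candidate embedding.
    theme = [sum(vectors[p][k] for p in phrases) / n for k in range(dim)]
    # Theme weight: similarity to the theme, clipped positive, normalized to sum to 1.
    raw = [max(cosine(vectors[p], theme), 1e-6) for p in phrases]
    total = sum(raw)
    bias = [r / total for r in raw]

    # Assumed edge weight: non-negative semantic similarity times co-occurrence count.
    w = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(phrases):
        for j, b in enumerate(phrases):
            count = cooccur.get(frozenset((a, b)), 0) if i != j else 0
            if count:
                w[i][j] = max(cosine(vectors[a], vectors[b]), 0.0) * count
    out = [sum(row) for row in w]

    score = bias[:]
    for _ in range(iters):
        # Score on nodes with no outgoing weight is redistributed by theme weight.
        dangling = sum(score[i] for i in range(n) if out[i] == 0)
        new = []
        for j in range(n):
            flow = sum(score[i] * w[i][j] / out[i] for i in range(n) if out[i])
            new.append((1 - d) * bias[j] + d * (flow + dangling * bias[j]))
        converged = max(abs(x - y) for x, y in zip(new, score)) < tol
        score = new
        if converged:
            break
    return sorted(zip(phrases, score), key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real phrase vectors would come from a trained encoder.
    vecs = {
        "neural network": [1.0, 0.1],
        "deep learning": [0.9, 0.2],
        "keyphrase extraction": [0.8, 0.3],
        "weather": [-0.2, 1.0],
    }
    co = {
        frozenset(("neural network", "deep learning")): 3,
        frozenset(("deep learning", "keyphrase extraction")): 2,
        frozenset(("keyphrase extraction", "weather")): 1,
    }
    for phrase, s in theme_weighted_rank(list(vecs), vecs, co):
        print(f"{phrase}: {s:.3f}")
```

The theme weights enter twice in this sketch: through the random-jump term `(1 - d) * bias[j]` and through the redistribution of dangling-node scores. Off-topic candidates such as "weather" therefore stay low even when they co-occur with central phrases. The multiplicative edge weight is one plausible way to "use similarity and co-occurrence together", not necessarily the combination used in the paper.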
Translated title of the contribution | The Theme-Weighted Keyphrase Extraction Algorithm Based on Phrase Embedding |
---|---|
Original language | Chinese (Traditional) |
Pages (from-to) | 1682-1690 |
Number of pages | 9 |
Journal | Tien Tzu Hsueh Pao/Acta Electronica Sinica |
Volume | 49 |
Issue number | 9 |
Publication status | Published - Sept 2021 |