Jointly Learning Topics in Sentence Embedding for Document Summarization

Yang Gao*, Yue Xu, Heyan Huang, Qian Liu, Linjing Wei, Luyang Liu

*Corresponding author for this work

Research output: Contribution to journal, Article, Peer-reviewed

25 citations (Scopus)

Abstract

Summarization systems for various applications, such as opinion mining, online news services, and question answering, have attracted increasing attention in recent years. These tasks are complex, and a classic bag-of-words representation does not adequately meet the needs of applications that rely on sentence extraction. In this paper, we focus on representing sentences as continuous vectors as a basis for measuring the relevance between user needs and candidate sentences in source documents. Embedding models based on distributed vector representations are widely used in the summarization community because cosine similarity makes it simple to compute relevance between two sentences, or between a sentence or query and a document. However, vector-based embedding models typically do not account for sentence salience, which is an essential part of document summarization. To incorporate sentence salience, we developed a model, called CCTSenEmb, that learns latent discriminative Gaussian topics in the embedding space, and we extended this framework by seamlessly incorporating both topic and sentence embeddings into one summarization system. To promote semantic coherence between sentences within the prediction-based framework for sentence embedding, CCTSenEmb further considers the associations between neighboring sentences. As a result, this novel sentence embedding framework combines sentence representations, word-based content, and topic assignments to predict the representation of the next sentence. A series of experiments on the DUC datasets validates CCTSenEmb's efficacy for document summarization in both a query-focused, extraction-based setting and an unsupervised, ILP-based setting.
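The abstract notes that embedding models reduce relevance scoring to a cosine similarity between vectors. The following is a minimal sketch of that generic step only, assuming sentences and the query have already been embedded; it is not the CCTSenEmb model and does not model topics or salience. The function names (cosine, rank_sentences, select_top_k) and the toy three-dimensional vectors are hypothetical and chosen purely for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sentences(
    query_vec: Sequence[float], sentence_vecs: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every candidate sentence against the query and sort by descending relevance."""
    scores = [(sid, cosine(query_vec, vec)) for sid, vec in sentence_vecs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def select_top_k(
    query_vec: Sequence[float], sentence_vecs: Dict[str, Sequence[float]], k: int
) -> List[str]:
    """Hypothetical extractive step: keep the ids of the k most query-relevant sentences."""
    return [sid for sid, _ in rank_sentences(query_vec, sentence_vecs)[:k]]


if __name__ == "__main__":
    # Toy embeddings (assumed values, not learned by any model).
    query = [1.0, 0.0, 1.0]
    sentences = {"s1": [1.0, 0.0, 1.0], "s2": [0.0, 1.0, 0.0], "s3": [1.0, 1.0, 0.0]}
    print(select_top_k(query, sentences, 2))
```

With these toy vectors, s1 points in the same direction as the query (similarity about 1.0), s3 shares one dimension (about 0.5), and s2 is orthogonal (0.0), so the two selected sentences are s1 and s3. A purely similarity-based ranking like this ignores sentence salience, which is the gap the abstract says CCTSenEmb's topic modeling is meant to address.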

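Original language: English
Article number: 8611098
Pages (from-to): 688-699
Number of pages: 12
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 32
Issue number: 4
DOI
Publication status: Published - 1 Apr 2020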
