Jointly Learning Topics in Sentence Embedding for Document Summarization

  • Yang Gao*
  • Yue Xu
  • Heyan Huang
  • Qian Liu
  • Linjing Wei
  • Luyang Liu
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • Queensland University of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Summarization systems for various applications, such as opinion mining, online news services, and question answering, have attracted increasing attention in recent years. These tasks are complicated, and the classic bag-of-words representation does not adequately meet the needs of applications that rely on sentence extraction. In this paper, we focus on representing sentences as continuous vectors as a basis for measuring relevance between user needs and candidate sentences in source documents. Embedding models based on distributed vector representations are often used in the summarization community because, through cosine similarity, they simplify measuring relevance between two sentences or between a sentence or query and a document. However, vector-based embedding models do not typically account for the salience of a sentence, which is essential to document summarization. To incorporate sentence salience, we developed a model, called CCTSenEmb, that learns latent discriminative Gaussian topics in the embedding space, and we extended the new framework by seamlessly incorporating both topic and sentence embedding into one summarization system. To facilitate semantic coherence between sentences in the framework of prediction-based tasks for sentence embedding, CCTSenEmb further considers the associations between neighboring sentences. As a result, this novel sentence embedding framework combines sentence representations, word-based content, and topic assignments to predict the representation of the next sentence. A series of experiments with the DUC datasets validates CCTSenEmb's efficacy in document summarization in a query-focused extraction-based setting and an unsupervised ILP-based setting.
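
The abstract's point about cosine similarity can be illustrated with a minimal sketch. This is not the CCTSenEmb model: the function names (cosine_similarity, rank_by_relevance) and the three-dimensional vectors are hypothetical and chosen only to show how candidate sentences might be ranked against a query once sentence vectors are available.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def rank_by_relevance(query_vec, sentence_vecs):
    """Return (indices sorted from most to least similar to the query, raw scores)."""
    scores = [cosine_similarity(query_vec, s) for s in sentence_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy vectors, made up for illustration; real sentence embeddings
# would come from a trained model and have many more dimensions.
query = [1.0, 0.0, 1.0]
sentences = [
    [0.9, 0.1, 1.1],     # points in nearly the same direction as the query
    [0.0, 1.0, 0.0],     # orthogonal to the query
    [-1.0, 0.0, -1.0],   # points in the opposite direction
]

order, scores = rank_by_relevance(query, sentences)
print(order)
```

A ranking like this captures relevance to the query only. It says nothing about how salient a sentence is within its document, which is the gap the abstract says the topic component is meant to address.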

Original language: English
Article number: 8611098
Pages (from-to): 688-699
Number of pages: 12
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 32
Issue number: 4
Publication status: Published - 1 Apr 2020

Keywords

  • Gaussian topics
  • Sentence embedding
  • salience
  • relevance
  • summarization
