TY - JOUR
T1 - Jointly Learning Topics in Sentence Embedding for Document Summarization
AU - Gao, Yang
AU - Xu, Yue
AU - Huang, Heyan
AU - Liu, Qian
AU - Wei, Linjing
AU - Liu, Luyang
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2020/4/1
Y1 - 2020/4/1
N2 - Summarization systems for various applications, such as opinion mining, online news services, and answering questions, have attracted increasing attention in recent years. These tasks are complicated, and a classic representation using bag-of-words does not adequately meet the comprehensive needs of applications that rely on sentence extraction. In this paper, we focus on representing sentences as continuous vectors as a basis for measuring relevance between user needs and candidate sentences in source documents. Embedding models based on distributed vector representations are often used in the summarization community because, through cosine similarity, they simplify sentence relevance when comparing two sentences or a sentence/query and a document. However, the vector-based embedding models do not typically account for the salience of a sentence, and this is a very necessary part of document summarization. To incorporate sentence salience, we developed a model, called CCTSenEmb, that learns latent discriminative Gaussian topics in the embedding space and extended the new framework by seamlessly incorporating both topic and sentence embedding into one summarization system. To facilitate the semantic coherence between sentences in the framework of prediction-based tasks for sentence embedding, the CCTSenEmb further considers the associations between neighboring sentences. As a result, this novel sentence embedding framework combines sentence representations, word-based content, and topic assignments to predict the representation of the next sentence. A series of experiments with the DUC datasets validate CCTSenEmb's efficacy in document summarization in a query-focused extraction-based setting and an unsupervised ILP-based setting.
AB - Summarization systems for various applications, such as opinion mining, online news services, and answering questions, have attracted increasing attention in recent years. These tasks are complicated, and a classic representation using bag-of-words does not adequately meet the comprehensive needs of applications that rely on sentence extraction. In this paper, we focus on representing sentences as continuous vectors as a basis for measuring relevance between user needs and candidate sentences in source documents. Embedding models based on distributed vector representations are often used in the summarization community because, through cosine similarity, they simplify sentence relevance when comparing two sentences or a sentence/query and a document. However, the vector-based embedding models do not typically account for the salience of a sentence, and this is a very necessary part of document summarization. To incorporate sentence salience, we developed a model, called CCTSenEmb, that learns latent discriminative Gaussian topics in the embedding space and extended the new framework by seamlessly incorporating both topic and sentence embedding into one summarization system. To facilitate the semantic coherence between sentences in the framework of prediction-based tasks for sentence embedding, the CCTSenEmb further considers the associations between neighboring sentences. As a result, this novel sentence embedding framework combines sentence representations, word-based content, and topic assignments to predict the representation of the next sentence. A series of experiments with the DUC datasets validate CCTSenEmb's efficacy in document summarization in a query-focused extraction-based setting and an unsupervised ILP-based setting.
KW - Gaussian topics
KW - Sentence embedding
KW - and salience
KW - relevance
KW - summarization
UR - http://www.scopus.com/inward/record.url?scp=85081652165&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2019.2892430
DO - 10.1109/TKDE.2019.2892430
M3 - Article
AN - SCOPUS:85081652165
SN - 1041-4347
VL - 32
SP - 688
EP - 699
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 4
M1 - 8611098
ER -