Incorporating entities in news topic modeling

Linmei Hu, Juanzi Li, Zhihui Li, Chao Shao, Zhixing Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Citations (Scopus)

Abstract

News articles convey information by focusing on named entities, i.e., the who, when, and where of the news. However, extracting the relationships among entities, words, and topics from a large collection of news articles is nontrivial. Topic models such as Latent Dirichlet Allocation have been widely applied to mine hidden topics in text analysis and have achieved considerable performance, but they cannot explicitly capture the relationships between words and entities. In this paper, we propose a generative model, the Entity-Centered Topic Model (ECTM), which summarizes the correlations among entities, words, and topics by treating an entity topic as a mixture of word topics. Experiments on real news data sets show that our model achieves lower perplexity and better entity clustering than the state-of-the-art entity topic model CorrLDA2. We also present an analysis of the results of ECTM and further compare it with CorrLDA2.

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - Second CCF Conference, NLPCC 2013, Proceedings
Publisher: Springer Verlag
Pages: 139-150
Number of pages: 12
ISBN (Print): 9783642416439
DOIs
Publication status: Published - 2013
Externally published: Yes
Event: 2nd CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2013 - Chongqing, China
Duration: 15 Nov 2013 - 19 Nov 2013

Publication series

Name: Communications in Computer and Information Science
Volume: 400
ISSN (Print): 1865-0929

Conference

Conference: 2nd CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2013
Country/Territory: China
City: Chongqing
Period: 15/11/13 - 19/11/13

Keywords

  • Generative entity topic models
  • Named entity
  • News
