TY - GEN
T1 - Incorporating entities in news topic modeling
AU - Hu, Linmei
AU - Li, Juanzi
AU - Li, Zhihui
AU - Shao, Chao
AU - Li, Zhixing
PY - 2013
Y1 - 2013
AB - News articles convey information by focusing on named entities, such as the who, when, and where of an event. However, extracting the relationships among entities, words, and topics from a large collection of news articles is nontrivial. Topic models such as Latent Dirichlet Allocation (LDA) have been widely applied to mine hidden topics in text and have achieved considerable performance, but they cannot explicitly show the relationship between words and entities. In this paper, we propose a generative model, the Entity-Centered Topic Model (ECTM), which summarizes the correlation among entities, words, and topics by treating an entity topic as a mixture of word topics. Experiments on real news data sets show that our model achieves lower perplexity and better entity clustering than the state-of-the-art entity topic model CorrLDA2. We also analyze the results of ECTM and compare it further with CorrLDA2.
KW - Generative entity topic models
KW - Named entity
KW - News
UR - http://www.scopus.com/inward/record.url?scp=84901492173&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-41644-6_14
DO - 10.1007/978-3-642-41644-6_14
M3 - Conference contribution
AN - SCOPUS:84901492173
SN - 9783642416439
T3 - Communications in Computer and Information Science
SP - 139
EP - 150
BT - Natural Language Processing and Chinese Computing - Second CCF Conference, NLPCC 2013, Proceedings
PB - Springer Verlag
T2 - 2nd CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2013
Y2 - 15 November 2013 through 19 November 2013
ER -