TY - GEN
T1 - A unified generative model for characterizing microblogs' topics
AU - Zhuang, Kun
AU - Huang, Heyan
AU - Xin, Xin
AU - Wei, Xiaochi
AU - Yang, Xianxiang
AU - Feng, Chong
AU - Fang, Ying
PY - 2013
Y1 - 2013
AB - In this paper, we focus on characterizing microblogs' topics with topic models. Unlike traditional textual media (such as news documents), microblogs pose three modeling challenges: 1) excessive noise; 2) short text; and 3) incomplete content. Previously, these limitations have been investigated separately. Some work filters the noise through a prior classification; some enriches the text with the user's blog history; and some utilizes the social network. However, none of these approaches addresses all three limitations simultaneously. To solve this problem, we combine previous work and propose a unified generative model for characterizing microblogs' topics that addresses all three limitations. A collapsed Gibbs sampling method is derived for estimating the parameters. Through both qualitative and quantitative analysis on Twitter data, we demonstrate that our approach consistently outperforms previous methods by a significant margin.
KW - Latent Dirichlet Allocation
KW - Microblog Analysis
KW - Modeling Topics from Social Network Data
UR - http://www.scopus.com/inward/record.url?scp=84880033498&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-38562-9_59
DO - 10.1007/978-3-642-38562-9_59
M3 - Conference contribution
AN - SCOPUS:84880033498
SN - 9783642385612
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 583
EP - 594
BT - Web-Age Information Management - 14th International Conference, WAIM 2013, Proceedings
PB - Springer Verlag
T2 - 14th International Conference on Web-Age Information Management, WAIM 2013
Y2 - 14 June 2013 through 16 June 2013
ER -