TY - JOUR
T1 - Text Categorization Based on Topic Model
AU - Zhou, Shibin
AU - Li, Kan
AU - Liu, Yushu
N1 - Publisher Copyright: © 2009, the authors.
PY - 2009/12
Y1 - 2009/12
N2 - In the text literature, many topic models have been proposed that represent documents and words as topics or latent topics in order to process text effectively and accurately. In this paper, we propose the Latent Dirichlet Allocation Category Language Model (LDACLM) for text categorization and estimate the model parameters by variational inference. As a variant of the Latent Dirichlet Allocation model, LDACLM treats the documents of a category as a language model and uses variational parameters to obtain maximum a posteriori estimates for terms. In general, experiments show that LDACLM is effective: it outperforms Naïve Bayes with Laplace smoothing and the Rocchio algorithm, but is slightly inferior to SVM for text categorization.
AB - In the text literature, many topic models have been proposed that represent documents and words as topics or latent topics in order to process text effectively and accurately. In this paper, we propose the Latent Dirichlet Allocation Category Language Model (LDACLM) for text categorization and estimate the model parameters by variational inference. As a variant of the Latent Dirichlet Allocation model, LDACLM treats the documents of a category as a language model and uses variational parameters to obtain maximum a posteriori estimates for terms. In general, experiments show that LDACLM is effective: it outperforms Naïve Bayes with Laplace smoothing and the Rocchio algorithm, but is slightly inferior to SVM for text categorization.
KW - Category Language Model
KW - Latent Dirichlet Allocation
KW - Topic Model
KW - Variational Inference
UR - http://www.scopus.com/inward/record.url?scp=85181242860&partnerID=8YFLogxK
U2 - 10.2991/ijcis.2009.2.4.8
DO - 10.2991/ijcis.2009.2.4.8
M3 - Article
AN - SCOPUS:85181242860
SN - 1875-6891
VL - 2
SP - 398
EP - 409
JO - International Journal of Computational Intelligence Systems
JF - International Journal of Computational Intelligence Systems
IS - 4
ER -