Text Categorization Based on Topic Model

Shibin Zhou*, Kan Li, Yushu Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In the text-processing literature, many topic models have been proposed that represent documents and words in terms of topics, or latent topics, in order to process text effectively and accurately. In this paper, we propose the Latent Dirichlet Allocation Category Language Model (LDACLM) for text categorization and estimate the model parameters by variational inference. As a variant of the Latent Dirichlet Allocation model, LDACLM treats the documents of each category as a language model and uses the variational parameters to compute maximum a posteriori estimates of the terms. Overall, experiments show that LDACLM is effective for text categorization: it outperforms Naïve Bayes with Laplace smoothing and the Rocchio algorithm, but is slightly inferior to SVM.
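
For orientation, the Naïve Bayes baseline with Laplace smoothing named in the abstract can be sketched as below. This is not LDACLM and is not taken from the paper: it is a minimal illustration, under the assumption of a multinomial bag-of-words model, of what it means to treat each category as a unigram language model with add-one smoothing. The function names `train_category_models` and `classify` and the toy data are hypothetical.

```python
import math
from collections import Counter


def train_category_models(labeled_docs):
    """Collect per-category term counts from (category, tokens) pairs.

    Returns (term_counts, doc_counts, vocabulary). Each category's
    Counter acts as an unnormalized unigram language model.
    """
    term_counts = {}
    doc_counts = Counter()
    vocabulary = set()
    for category, tokens in labeled_docs:
        term_counts.setdefault(category, Counter()).update(tokens)
        doc_counts[category] += 1
        vocabulary.update(tokens)
    return term_counts, doc_counts, vocabulary


def classify(tokens, term_counts, doc_counts, vocabulary):
    """Return the category with the highest smoothed log-posterior."""
    total_docs = sum(doc_counts.values())
    vocab_size = len(vocabulary)
    best_category, best_score = None, -math.inf
    for category, counts in term_counts.items():
        total_terms = sum(counts.values())
        # Log prior: fraction of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        for term in tokens:
            if term not in vocabulary:
                continue  # assumption: ignore terms never seen in training
            # Laplace (add-one) smoothing of the term probability.
            score += math.log((counts[term] + 1) / (total_terms + vocab_size))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical toy corpus, for illustration only.
toy_docs = [
    ("sports", ["ball", "goal", "team"]),
    ("sports", ["team", "win", "goal"]),
    ("tech", ["code", "bug", "compile"]),
    ("tech", ["code", "test", "bug"]),
]
models = train_category_models(toy_docs)
```

Calling `classify(["goal", "team"], *models)` scores the new document under each category's smoothed model and picks the best one. According to the abstract, LDACLM replaces this kind of smoothed count estimate with term estimates derived from the variational parameters of an LDA-style model.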

Original language: English
Pages (from-to): 398-409
Number of pages: 12
Journal: International Journal of Computational Intelligence Systems
Volume: 2
Issue number: 4
Publication status: Published - Dec 2009

Keywords

  • Category Language Model
  • Latent Dirichlet allocation
  • Topic model
  • Variational Inference
