Text categorization based on topic model

Shibin Zhou; Kan Li; Yushu Liu

doi:10.1007/978-3-540-79721-0_77

Shibin Zhou^*, Kan Li, Yushu Liu

^*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Citations (Scopus)

Abstract

In the literature on text processing, many topic models have been proposed to represent documents and words as topics or latent topics in order to process text effectively and accurately. In this paper, we propose LDACLM, the Latent Dirichlet Allocation Category Language Model, for text categorization, and estimate the model parameters by variational inference. As a variant of the Latent Dirichlet Allocation model, LDACLM regards the documents of each category as a language model and uses variational parameters to obtain maximum a posteriori (MAP) estimates of term probabilities. Experiments show the LDACLM model to be effective for text categorization, outperforming the standard Naive Bayes and Rocchio methods.
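The abstract describes treating the documents of each category as a language model with maximum a posteriori term estimates. The Python sketch below illustrates only that general category-language-model idea. It is not LDACLM, which per the abstract derives its term estimates from LDA variational parameters. Here each category gets a plain unigram model whose term probabilities are MAP estimates under an assumed symmetric Dirichlet prior `alpha`, and a document is assigned to the category with the highest log-likelihood. The helper names `train_category_lms` and `classify`, the prior value, and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def train_category_lms(docs_by_category, alpha=2.0):
    """Hypothetical helper: one unigram language model per category.

    Term probabilities are MAP estimates under a symmetric Dirichlet(alpha)
    prior (an assumption of this sketch, not taken from the paper):
        p(w | c) = (n_wc + alpha - 1) / (N_c + V * (alpha - 1))
    alpha must exceed 1 so that every vocabulary term gets nonzero mass.
    """
    if alpha <= 1:
        raise ValueError("alpha must be > 1 for a smoothed MAP estimate")
    # Shared vocabulary across all training documents.
    vocab = {w for docs in docs_by_category.values() for doc in docs for w in doc}
    models = {}
    for category, docs in docs_by_category.items():
        counts = Counter(w for doc in docs for w in doc)
        denom = sum(counts.values()) + len(vocab) * (alpha - 1)
        models[category] = {w: (counts[w] + alpha - 1) / denom for w in vocab}
    return models


def classify(models, doc):
    """Return the category whose model gives `doc` the highest log-likelihood.

    Terms outside the training vocabulary are skipped (another assumption).
    """
    def log_likelihood(model):
        return sum(math.log(model[w]) for w in doc if w in model)

    return max(models, key=lambda c: log_likelihood(models[c]))


# Toy usage with a made-up two-category corpus of pre-tokenized documents.
train = {
    "sports": [["match", "goal", "team"], ["team", "coach", "goal"]],
    "finance": [["stock", "market", "price"], ["market", "bank", "price"]],
}
models = train_category_lms(train)
print(classify(models, ["goal", "team", "win"]))  # sports
```

Without a topic layer, this scoring rule behaves like a multinomial Naive Bayes classifier with a uniform class prior, roughly the kind of baseline the abstract reports LDACLM outperforming.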

Original language	English
Title of host publication	Rough Sets and Knowledge Technology - Third International Conference, RSKT 2008, Proceedings
Pages	572-579
Number of pages	8
DOIs	https://doi.org/10.1007/978-3-540-79721-0_77
Publication status	Published - 2008
Event	3rd International Conference on Rough Sets and Knowledge Technology, RSKT 2008 - Chengdu, China. Duration: 17 May 2008 → 19 May 2008

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	5009 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	3rd International Conference on Rough Sets and Knowledge Technology, RSKT 2008
Country/Territory	China
City	Chengdu
Period	17/05/08 → 19/05/08

Keywords

Category Language Model
Latent Dirichlet Allocation
Variational Inference
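Latent Dirichlet Allocation and variational inference, listed in the keywords, are the machinery the abstract uses for parameter estimation. For orientation only, the sketch below shows the standard per-document mean-field E-step for plain LDA as described by Blei, Ng and Jordan (2003). It is an assumption that LDACLM builds on updates of this form; the paper's own derivation is not reproduced here. `lda_e_step`, the stdlib-only `digamma` approximation, the iteration limits and the toy topic matrix are all hypothetical.

```python
import math


def digamma(x):
    """Digamma psi(x) for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    # Use psi(x) = psi(x + 1) - 1/x until x is large enough for the series.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def lda_e_step(doc, alpha, beta, iters=50, tol=1e-6):
    """Hypothetical helper: mean-field E-step for plain LDA (not LDACLM).

    doc   : list of word ids
    alpha : Dirichlet prior over topic proportions, one value per topic
    beta  : beta[k][v] = p(word v | topic k)
    Returns (gamma, phi): the variational Dirichlet parameters for the
    document and the per-word topic responsibilities.
    """
    k_topics, n = len(alpha), len(doc)
    phi = [[1.0 / k_topics] * k_topics for _ in doc]
    gamma = [a + n / k_topics for a in alpha]
    for _ in range(iters):
        dig = [digamma(g) for g in gamma]
        m = max(dig)  # subtract the max before exp for numerical stability
        for i, w in enumerate(doc):
            # phi_ik proportional to beta_kw * exp(psi(gamma_k)); the
            # psi(sum of gamma) term cancels during normalization.
            row = [beta[k][w] * math.exp(dig[k] - m) for k in range(k_topics)]
            z = sum(row)
            phi[i] = [r / z for r in row]
        # gamma_k = alpha_k + sum_n phi_nk
        new_gamma = [alpha[k] + sum(phi[i][k] for i in range(n)) for k in range(k_topics)]
        delta = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        gamma = new_gamma
        if delta < tol:
            break
    return gamma, phi


# Toy usage: two made-up topics over a four-word vocabulary.
beta = [[0.45, 0.45, 0.05, 0.05],
        [0.05, 0.05, 0.45, 0.45]]
gamma, phi = lda_e_step([0, 1, 0, 1], alpha=[0.1, 0.1], beta=beta)
print([round(g, 3) for g in gamma])
```

Because each row of `phi` sums to one, the entries of `gamma` always sum to the prior mass plus the document length, which is a quick sanity check on any implementation of this update.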

Cite this

Zhou, S., Li, K., & Liu, Y. (2008). Text categorization based on topic model. In Rough Sets and Knowledge Technology - Third International Conference, RSKT 2008, Proceedings (pp. 572-579). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5009 LNAI). https://doi.org/10.1007/978-3-540-79721-0_77

@inproceedings{93198330396a4d938ee5d66eabdd88b9,

title = "Text categorization based on topic model",

abstract = "In the literature on text processing, many topic models have been proposed to represent documents and words as topics or latent topics in order to process text effectively and accurately. In this paper, we propose LDACLM, the Latent Dirichlet Allocation Category Language Model, for text categorization, and estimate the model parameters by variational inference. As a variant of the Latent Dirichlet Allocation model, LDACLM regards the documents of each category as a language model and uses variational parameters to obtain maximum a posteriori (MAP) estimates of term probabilities. Experiments show the LDACLM model to be effective for text categorization, outperforming the standard Naive Bayes and Rocchio methods.",

keywords = "Category Language Model, Latent Dirichlet Allocation, Variational Inference",

author = "Shibin Zhou and Kan Li and Yushu Liu",

year = "2008",

doi = "10.1007/978-3-540-79721-0_77",

language = "English",

isbn = "3540797203",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

volume = "5009 LNAI",

pages = "572--579",

booktitle = "Rough Sets and Knowledge Technology - Third International Conference, RSKT 2008, Proceedings",

note = "3rd International Conference on Rough Sets and Knowledge Technology, RSKT 2008; Conference date: 17-05-2008 through 19-05-2008",

}

Zhou, S, Li, K & Liu, Y 2008, Text categorization based on topic model. in Rough Sets and Knowledge Technology - Third International Conference, RSKT 2008, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5009 LNAI, pp. 572-579, 3rd International Conference on Rough Sets and Knowledge Technology, RSKT 2008, Chengdu, China, 17/05/08. https://doi.org/10.1007/978-3-540-79721-0_77

Text categorization based on topic model. / Zhou, Shibin; Li, Kan; Liu, Yushu.
Rough Sets and Knowledge Technology - Third International Conference, RSKT 2008, Proceedings. 2008. p. 572-579 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5009 LNAI).

TY  - GEN
T1  - Text categorization based on topic model
AU  - Zhou, Shibin
AU  - Li, Kan
AU  - Liu, Yushu
PY  - 2008
Y1  - 2008
N2  - In the literature on text processing, many topic models have been proposed to represent documents and words as topics or latent topics in order to process text effectively and accurately. In this paper, we propose LDACLM, the Latent Dirichlet Allocation Category Language Model, for text categorization, and estimate the model parameters by variational inference. As a variant of the Latent Dirichlet Allocation model, LDACLM regards the documents of each category as a language model and uses variational parameters to obtain maximum a posteriori (MAP) estimates of term probabilities. Experiments show the LDACLM model to be effective for text categorization, outperforming the standard Naive Bayes and Rocchio methods.
AB  - In the literature on text processing, many topic models have been proposed to represent documents and words as topics or latent topics in order to process text effectively and accurately. In this paper, we propose LDACLM, the Latent Dirichlet Allocation Category Language Model, for text categorization, and estimate the model parameters by variational inference. As a variant of the Latent Dirichlet Allocation model, LDACLM regards the documents of each category as a language model and uses variational parameters to obtain maximum a posteriori (MAP) estimates of term probabilities. Experiments show the LDACLM model to be effective for text categorization, outperforming the standard Naive Bayes and Rocchio methods.
KW  - Category Language Model
KW  - Latent Dirichlet Allocation
KW  - Variational Inference
UR  - http://www.scopus.com/inward/record.url?scp=44649202910&partnerID=8YFLogxK
U2  - 10.1007/978-3-540-79721-0_77
DO  - 10.1007/978-3-540-79721-0_77
M3  - Conference contribution
AN  - SCOPUS:44649202910
SN  - 3540797203
SN  - 9783540797203
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 572
EP  - 579
BT  - Rough Sets and Knowledge Technology - Third International Conference, RSKT 2008, Proceedings
T2  - 3rd International Conference on Rough Sets and Knowledge Technology, RSKT 2008
Y2  - 17 May 2008 through 19 May 2008
ER  -
