A two-stage approach for generating topic models

Yang Gao, Yue Xu, Yuefeng Li, Bin Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

14 Citations (Scopus)

Abstract

Topic modeling has been widely used in information retrieval, text mining, text classification, and related fields. Most existing statistical topic modeling methods, such as LDA and pLSA, represent a topic with a term-based representation, selecting single words from the topic's multinomial word distribution. This has two main shortcomings: first, popular or common words occur frequently across different topics, which makes the topics ambiguous to interpret; second, single words lack the coherent semantic meaning needed to represent topics accurately. To overcome these problems, this paper proposes a two-stage model that combines text mining and pattern mining with statistical modeling to generate more discriminative and semantically rich topic representations. Experiments show that the optimized topic representations generated by the proposed methods outperform the typical statistical topic modeling method LDA in terms of accuracy and certainty.
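The abstract contrasts term-based topic representations (the top single words of each topic's word distribution) with richer pattern-based ones. The Python sketch below is purely illustrative and is not the authors' two-stage method. It assumes topics are given as hypothetical word-to-probability dicts, selects the top-k words per topic, flags words shared across topics (the ambiguity the abstract describes), and mines frequent word sets from hypothetical documents assigned to one topic as a simple stand-in for pattern mining. All function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Illustrative sketch only (not the paper's algorithm). Topics are assumed to be
# hypothetical dicts mapping word -> probability, as an LDA model would estimate.

def top_k_terms(phi, k):
    """Term-based representation: the k most probable single words per topic."""
    return {
        topic: [w for w, _ in sorted(dist.items(), key=lambda x: (-x[1], x[0]))[:k]]
        for topic, dist in phi.items()
    }

def shared_terms(reps):
    """Words that appear in the representation of more than one topic."""
    counts = Counter(w for words in reps.values() for w in set(words))
    return {w for w, c in counts.items() if c > 1}

def frequent_patterns(docs, min_support, max_len=2):
    """Word sets of up to max_len words occurring in >= min_support documents.
    Naive enumeration, used here as a simple stand-in for pattern mining."""
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc))
        for n in range(1, max_len + 1):
            counts.update(combinations(words, n))
    return {p: c for p, c in counts.items() if c >= min_support}

# Hypothetical toy topics: the common word "year" ranks high in both.
phi = {
    "sports": {"game": 0.30, "team": 0.25, "year": 0.20, "score": 0.15, "new": 0.10},
    "finance": {"market": 0.30, "stock": 0.25, "year": 0.20, "new": 0.15, "price": 0.10},
}
reps = top_k_terms(phi, 3)
ambiguous = shared_terms(reps)

# Hypothetical documents assigned to the "sports" topic.
sports_docs = [["game", "team", "score"], ["team", "score", "win"], ["game", "team"]]
patterns = frequent_patterns(sports_docs, min_support=2)
multi_word = {p for p in patterns if len(p) > 1}
```

With this toy data, `shared_terms` returns `{"year"}`, a word that does little to distinguish the two topics, while the multi-word patterns `("game", "team")` and `("score", "team")` carry more context than any single word alone.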

Original language: English
Host publication title: Advances in Knowledge Discovery and Data Mining - 17th Pacific-Asia Conference, PAKDD 2013, Proceedings
Pages: 221-232
Number of pages: 12
Edition: PART 2
Publication status: Published - 2013
Externally published: Yes
Event: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013 - Gold Coast, QLD, Australia
Duration: 14 Apr 2013 → 17 Apr 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 7819 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013
Country/Territory: Australia
City: Gold Coast, QLD
Period: 14/04/13 → 17/04/13
