A two-stage approach for generating topic models

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

15 Citations (Scopus)

Abstract

Topic modeling has been widely used in information retrieval, text mining, text classification, and related fields. Most existing statistical topic modeling methods, such as LDA and pLSA, generate a term-based representation of a topic by selecting single words from the multinomial word distribution over that topic. This has two main shortcomings: first, popular or common words occur very often across different topics, which makes the topics ambiguous; second, single words lack the coherent semantic meaning needed to represent topics accurately. To overcome these problems, we propose a two-stage model that combines text mining and pattern mining with statistical modeling to generate more discriminative and semantically rich topic representations. Experiments show that the optimized topic representations generated by the proposed methods outperform the typical statistical topic modeling method LDA in terms of accuracy and certainty.
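The first shortcoming named in the abstract (common words dominating several topics at once) can be illustrated with a minimal sketch. This is not the paper's two-stage method: it is a simplified, assumed tf-idf-style re-weighting in which each word's probability under a topic is scaled down by the number of topics it appears in. The function name `rerank_topic_words`, the smoothing in the logarithm, and the toy distributions are all hypothetical and chosen only for illustration.

```python
import math


def rerank_topic_words(topics, top_n=3):
    """Re-rank each topic's words with an idf-like weight over topics.

    topics: dict mapping a topic id to a dict of word -> probability.
    Returns a dict mapping each topic id to its top_n re-ranked words.
    This is an illustrative assumption, not the method from the paper.
    """
    n_topics = len(topics)

    # Count in how many topics each word appears.
    topic_freq = {}
    for dist in topics.values():
        for word in dist:
            topic_freq[word] = topic_freq.get(word, 0) + 1

    reranked = {}
    for topic_id, dist in topics.items():
        # Words shared by many topics receive a smaller weight.
        scored = {
            word: prob * math.log((1 + n_topics) / topic_freq[word])
            for word, prob in dist.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scored, key=lambda w: (-scored[w], w))
        reranked[topic_id] = ranked[:top_n]
    return reranked


# Toy word distributions (made-up numbers, for illustration only).
toy_topics = {
    "sports": {"game": 0.30, "team": 0.25, "new": 0.25, "score": 0.20},
    "finance": {"market": 0.30, "new": 0.28, "stock": 0.22, "bank": 0.20},
}
top_words = rerank_topic_words(toy_topics, top_n=3)
print(top_words)
```

In this toy input the word "new" has a high probability in both topics, so the re-weighting pushes it below the words that occur in only one topic. The second shortcoming (single words lacking coherent meaning) is what the abstract addresses with pattern mining, which this sketch does not attempt.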

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 17th Pacific-Asia Conference, PAKDD 2013, Proceedings
Pages: 221-232
Number of pages: 12
Edition: PART 2
Publication status: Published - 2013
Externally published: Yes
Event: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013 - Gold Coast, QLD, Australia
Duration: 14 Apr 2013 - 17 Apr 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 7819 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2013
Country/Territory: Australia
City: Gold Coast, QLD
Period: 14/04/13 - 17/04/13

Keywords

  • Entropy
  • Tf-idf
  • Frequent pattern mining
  • Topic modeling
  • Topic representation
