An integrated probabilistic text clustering model with segment-based and word order evidence

Lin Dai*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Text clustering is an important research topic with many practical applications. Traditional clustering algorithms such as K-means and Probabilistic Latent Semantic Indexing (pLSI) treat each document as a single chunk of text and ignore word order information, which limits their performance. This paper proposes an integrated probabilistic model that explicitly combines evidence from the individual segments within a document with word order information. Based on this model, a text clustering framework is proposed. Experiments on test datasets indicate substantial performance gains over state-of-the-art algorithms.

Original language: English
Title of host publication: Proceedings - 7th International Conference on Information Processing and Management, ICIPM 2011
Pages: 64-70
Number of pages: 7
Publication status: Published - 2011
Event: 7th International Conference on Information Processing and Management, ICIPM 2011 - Jeju Island, Korea, Republic of
Duration: 29 Nov 2011 → 1 Dec 2012

Publication series

Name: Proceedings - 7th International Conference on Information Processing and Management, ICIPM 2011

Conference

Conference: 7th International Conference on Information Processing and Management, ICIPM 2011
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 29/11/11 → 1/12/12
