TY - GEN
T1 - An integrated probabilistic text clustering model with segment-based and word order evidence
AU - Dai, Lin
PY - 2011
Y1 - 2011
N2 - Text clustering is an important research topic with many practical applications. Traditional clustering algorithms such as K-means and Probabilistic Latent Semantic Indexing (pLSI) simply treat each document as a single chunk of text and also ignore important word order information, which limits their performance. This paper proposes an integrated probabilistic model to explicitly combine the evidence from individual segments within a document and the word order information. Based on this model, a text clustering framework is proposed. Experiments on test datasets indicate substantial performance gains over state-of-the-art algorithms.
UR - http://www.scopus.com/inward/record.url?scp=84864241036&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84864241036
SN - 9788988678442
T3 - Proceedings - 7th International Conference on Information Processing and Management, ICIPM 2011
SP - 64
EP - 70
BT - Proceedings - 7th International Conference on Information Processing and Management, ICIPM 2011
T2 - 7th International Conference on Information Processing and Management, ICIPM 2011
Y2 - 29 November 2011 through 1 December 2011
ER -