An integrated probabilistic text clustering model with segment-based and word order evidence

Lin Dai*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-reviewed

Abstract

Text clustering is an important research topic with many practical applications. Traditional clustering algorithms such as K-means and Probabilistic Latent Semantic Indexing (pLSI) simply treat each document as a single chunk of text and also ignore important word order information, which limits their performance. This paper proposes an integrated probabilistic model to explicitly combine the evidence from individual segments within a document and the word order information. Based on this model, a text clustering framework is proposed. Experiments on test datasets indicate substantial performance gains over state-of-the-art algorithms.

Original language: English
Title of host publication: Proceedings - 7th International Conference on Information Processing and Management, ICIPM 2011
Pages: 64-70
Number of pages: 7
Publication status: Published - 2011
Event: 7th International Conference on Information Processing and Management, ICIPM 2011 - Jeju Island, South Korea
Duration: 29 Nov 2011 to 1 Dec 2012

Publication series

Name: Proceedings - 7th International Conference on Information Processing and Management, ICIPM 2011

Conference

Conference: 7th International Conference on Information Processing and Management, ICIPM 2011
Country/Territory: South Korea
City: Jeju Island
Period: 29/11/11 to 1/12/12