A clustering algorithm based on matrix over high dimensional data stream

Guibin Hou*, Ruixia Yao, Jiadong Ren, Changzhen Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

3 Citations (Scopus)

Abstract

Clustering high-dimensional data streams is a difficult and important problem. In this paper, we propose MStream, a new matrix-based clustering algorithm for high-dimensional data streams. MStream combines a synopsis structure, called GC (Grid Cell structure), with a grid matrix technique, and adopts a two-phase framework. In the online component, GCs monitor the one-dimensional data distribution statistics of each dimension independently; sparse GCs, identified by a predefined threshold, are deleted. In the offline component, multi-dimensional clusters are traced from the dense GCs maintained by the online component, and the grid matrix technique is used to generate the final multi-dimensional clusters in the whole data space. Experimental results show that the algorithm scales flexibly and achieves higher clustering quality.
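The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MStream implementation: the abstract does not specify the GC fields or the grid matrix construction, so the class `OneDimGridSketch`, the function `offline_candidate_cells`, the fixed `cell_width`, the `min_support` fraction, and the use of a buffered window of recent points in the offline step are all hypothetical simplifications chosen for illustration.

```python
import math
from collections import Counter


class OneDimGridSketch:
    """Hypothetical online synopsis: one counter of grid cells per dimension."""

    def __init__(self, n_dims, cell_width, min_support):
        self.cell_width = cell_width    # assumed fixed cell width for every dimension
        self.min_support = min_support  # assumed density threshold, as a fraction of points seen
        self.counts = [Counter() for _ in range(n_dims)]
        self.n_points = 0

    def cell_index(self, value):
        # Map a coordinate to the index of the one-dimensional cell containing it.
        return math.floor(value / self.cell_width)

    def insert(self, point):
        # Online step: update each dimension's statistics independently.
        self.n_points += 1
        for dim, value in enumerate(point):
            self.counts[dim][self.cell_index(value)] += 1

    def prune_sparse(self):
        # Delete cells whose count falls below the predefined threshold.
        threshold = self.min_support * self.n_points
        for counter in self.counts:
            for cell in [c for c, n in counter.items() if n < threshold]:
                del counter[cell]

    def dense_cells(self):
        # Return, per dimension, the set of cells that currently meet the threshold.
        threshold = self.min_support * self.n_points
        return [{c for c, n in counter.items() if n >= threshold}
                for counter in self.counts]


def offline_candidate_cells(sketch, window):
    """Hypothetical offline step: keep multi-dimensional cells that are built
    only from dense one-dimensional cells and are themselves dense in `window`."""
    dense = sketch.dense_cells()
    joint = Counter()
    for point in window:
        cell = tuple(sketch.cell_index(v) for v in point)
        if all(idx in dense[dim] for dim, idx in enumerate(cell)):
            joint[cell] += 1
    threshold = sketch.min_support * len(window)
    return {cell: n for cell, n in joint.items() if n >= threshold}


# Two small groups of points plus one outlier, in two dimensions.
points = [(1.1, 1.2), (1.3, 1.4), (1.5, 1.6), (1.7, 1.8),
          (5.1, 5.2), (5.3, 5.4), (5.5, 5.6), (5.7, 5.8),
          (9.0, 3.0)]
sketch = OneDimGridSketch(n_dims=2, cell_width=1.0, min_support=0.2)
for p in points:
    sketch.insert(p)
sketch.prune_sparse()
clusters = offline_candidate_cells(sketch, points)
```

In this toy run, cells 1 and 5 are dense in both dimensions, but only the combinations (1, 1) and (5, 5) contain points. This is why dense one-dimensional cells alone do not determine the multi-dimensional clusters, and why a second, offline step is needed to combine them.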

Original language: English
Title of host publication: Web Information Systems and Mining - International Conference, WISM 2010, Proceedings
Pages: 86-94
Number of pages: 9
Edition: M4D
Publication status: Published - 2010
Event: 2010 International Conference on Web Information Systems and Mining, WISM 2010 - Sanya, China
Duration: 23 Oct 2010 to 24 Oct 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: M4D
Volume: 6318 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2010 International Conference on Web Information Systems and Mining, WISM 2010
Country/Territory: China
City: Sanya
Period: 23/10/10 to 24/10/10
