TY - GEN
T1 - A clustering algorithm based on matrix over high dimensional data stream
AU - Hou, Guibin
AU - Yao, Ruixia
AU - Ren, Jiadong
AU - Hu, Changzhen
PY - 2010
Y1 - 2010
N2 - Clustering high-dimensional data streams is a difficult and important issue. In this paper, we propose MStream, a new matrix-based clustering algorithm for high-dimensional data streams. MStream incorporates a synopsis structure called GC (Grid Cell Structure) and a grid matrix technique. The algorithm adopts a two-phase framework. In the online component, GCs are employed to monitor the one-dimensional statistical data distribution of each dimension independently, and sparse GCs that need to be deleted are identified using a predefined threshold. In the offline component, multi-dimensional clusters can be traced from the dense GCs maintained by the online component, and the grid matrix technique is introduced to generate the final multi-dimensional clusters in the whole data space. Experimental results show that our algorithm has flexible scalability and higher clustering quality.
AB - Clustering high-dimensional data streams is a difficult and important issue. In this paper, we propose MStream, a new matrix-based clustering algorithm for high-dimensional data streams. MStream incorporates a synopsis structure called GC (Grid Cell Structure) and a grid matrix technique. The algorithm adopts a two-phase framework. In the online component, GCs are employed to monitor the one-dimensional statistical data distribution of each dimension independently, and sparse GCs that need to be deleted are identified using a predefined threshold. In the offline component, multi-dimensional clusters can be traced from the dense GCs maintained by the online component, and the grid matrix technique is introduced to generate the final multi-dimensional clusters in the whole data space. Experimental results show that our algorithm has flexible scalability and higher clustering quality.
KW - Clustering
KW - Grid matrix
KW - High-dimensional
UR - http://www.scopus.com/inward/record.url?scp=78649497657&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-16515-3_12
DO - 10.1007/978-3-642-16515-3_12
M3 - Conference contribution
AN - SCOPUS:78649497657
SN - 3642165141
SN - 9783642165146
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 86
EP - 94
BT - Web Information Systems and Mining - International Conference, WISM 2010, Proceedings
T2 - 2010 International Conference on Web Information Systems and Mining, WISM 2010
Y2 - 23 October 2010 through 24 October 2010
ER -