A clustering algorithm based on matrix over high dimensional data stream

Guibin Hou*, Ruixia Yao, Jiadong Ren, Changzhen Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

3 Citations (Scopus)

Abstract

Clustering high-dimensional data streams is a difficult and important problem. In this paper, we propose MStream, a new matrix-based clustering algorithm for high-dimensional data streams. MStream combines a synopsis structure, called GC (Grid Cell Structure), with a grid matrix technique. The algorithm adopts a two-phase framework. In the online component, GCs are used to monitor the one-dimensional statistical distribution of the data in each dimension independently, and sparse GCs that need to be deleted are identified using a predefined threshold. In the offline component, multi-dimensional clusters can be traced from the dense GCs maintained by the online component, and the grid matrix technique is introduced to generate the final multi-dimensional clusters over the whole data space. Experimental results show that our algorithm scales flexibly and achieves higher clustering quality.
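As a rough illustration of the online component described above, the following is a minimal Python sketch, not the authors' implementation. It keeps one-dimensional grid cells per dimension, updates each dimension independently as points arrive, and deletes cells whose count falls below a predefined threshold. Everything beyond that is an assumption: equal-width cells over a known `[low, high)` range per dimension, a GC that stores only a point count, and the names `GridCellSynopsis`, `insert`, `prune_sparse`, and `dense_intervals`, which are hypothetical. The grid matrix step that combines per-dimension dense cells into multi-dimensional clusters is not reproduced, because the abstract does not specify it; `dense_intervals` only groups adjacent surviving cells within one dimension.

```python
class GridCellSynopsis:
    """Hypothetical per-dimension grid-cell (GC) synopsis.

    Assumptions (not from the paper): each dimension's range [low, high)
    is split into `cells_per_dim` equal-width cells, and a GC keeps only
    a point count.
    """

    def __init__(self, lows, highs, cells_per_dim):
        self.lows = lows
        self.highs = highs
        self.k = cells_per_dim
        # One dict per dimension mapping cell index -> count; cells are created lazily.
        self.cells = [dict() for _ in lows]

    def _cell_index(self, d, value):
        width = (self.highs[d] - self.lows[d]) / self.k
        idx = int((value - self.lows[d]) // width)
        return min(max(idx, 0), self.k - 1)  # clamp out-of-range values to edge cells

    def insert(self, point):
        # Online step: each dimension is updated independently of the others.
        for d, value in enumerate(point):
            idx = self._cell_index(d, value)
            self.cells[d][idx] = self.cells[d].get(idx, 0) + 1

    def prune_sparse(self, threshold):
        # Delete GCs whose count is below the predefined threshold; return how many.
        removed = 0
        for per_dim in self.cells:
            sparse = [i for i, count in per_dim.items() if count < threshold]
            for i in sparse:
                del per_dim[i]
            removed += len(sparse)
        return removed

    def dense_intervals(self, d):
        # Partial offline step: merge surviving adjacent cells in dimension d
        # into (first_index, last_index) runs.
        runs = []
        for i in sorted(self.cells[d]):
            if runs and runs[-1][1] == i - 1:
                runs[-1][1] = i
            else:
                runs.append([i, i])
        return [tuple(r) for r in runs]


syn = GridCellSynopsis(lows=[0.0, 0.0], highs=[10.0, 10.0], cells_per_dim=10)
for p in [(1.2, 8.1), (1.5, 8.4), (2.1, 8.9), (2.4, 7.6), (9.5, 2.0)]:
    syn.insert(p)
removed = syn.prune_sparse(threshold=2)
print(removed)                 # 3
print(syn.dense_intervals(0))  # [(1, 2)]
print(syn.dense_intervals(1))  # [(8, 8)]
```

Because dimensions are tracked separately, each insert touches one cell per dimension and memory is bounded by `cells_per_dim` GCs per dimension, which is the kind of saving a per-dimension synopsis is meant to provide on high-dimensional streams.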

Original language: English
Title of host publication: Web Information Systems and Mining - International Conference, WISM 2010, Proceedings
Pages: 86-94
Number of pages: 9
Edition: M4D
Publication status: Published - 2010
Event: 2010 International Conference on Web Information Systems and Mining, WISM 2010 - Sanya, China
Duration: 23 Oct 2010 to 24 Oct 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: M4D
Volume: 6318 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2010 International Conference on Web Information Systems and Mining, WISM 2010
Country/Territory: China
City: Sanya
Period: 23/10/10 to 24/10/10

Keywords

  • Clustering
  • Grid matrix
  • High-dimensional
