Clustering over an evolving data stream based on grid density and correlation

Jiadong Ren*, Binlei Cai, Changzhen Hu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Most existing grid-based clustering algorithms cannot adapt to the evolution of data streams and cannot handle noise points effectively. Furthermore, data points at cluster edges cannot be clustered accurately. In this paper, we present GDC-Stream, a new approach for clustering evolving data streams that is based on grid density and correlation. A new time-based density threshold function is introduced to remove noise points in real time. Moreover, a novel correlation-based technique is adopted to improve clustering accuracy. In the initial stage of the algorithm, the data stream is clustered by grid density. As new data records arrive, a novel pruning strategy is applied to periodically inspect and remove noise points. Meanwhile, the generated clusters are dynamically adjusted, based on grid density and correlation, to capture changes in the data stream. The experimental results show that GDC-Stream achieves better clustering quality and scalability than CluStream. ICIC International
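The abstract does not give the GDC-Stream formulas, so the following is only a minimal sketch of the general idea of grid-density stream clustering with periodic noise pruning, not the authors' algorithm. Everything in it is an assumption made for illustration: the class name `DecayedGrid`, the exponential decay factor, the fixed `sparse_threshold` (standing in for the paper's time-based density threshold function), and the rule that face-adjacent dense cells form one cluster. The correlation-based adjustment of edge points is not modelled.

```python
import math


class DecayedGrid:
    """Hypothetical sketch: grid cells whose density decays over time."""

    def __init__(self, cell_width=1.0, decay=0.98,
                 dense_threshold=3.0, sparse_threshold=0.5):
        self.cell_width = cell_width
        self.decay = decay                        # assumed per-time-unit decay factor
        self.dense_threshold = dense_threshold    # assumed cut-off for "dense" cells
        self.sparse_threshold = sparse_threshold  # assumed cut-off for noise cells
        self.cells = {}                           # cell key -> (density, last update time)

    def cell_of(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def density_at(self, key, now):
        # Density stored at the last update, decayed up to the current time.
        density, last = self.cells[key]
        return density * self.decay ** (now - last)

    def insert(self, point, now):
        # Each arriving record adds 1 to the decayed density of its cell.
        key = self.cell_of(point)
        previous = self.density_at(key, now) if key in self.cells else 0.0
        self.cells[key] = (previous + 1.0, now)

    def prune(self, now):
        # Periodic inspection: drop cells whose decayed density is too low.
        stale = [k for k in self.cells
                 if self.density_at(k, now) < self.sparse_threshold]
        for key in stale:
            del self.cells[key]
        return stale

    def clusters(self, now):
        # Group dense cells that share a face into connected components.
        dense = {k for k in self.cells
                 if self.density_at(k, now) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            component, stack = set(), [start]
            seen.add(start)
            while stack:
                cell = stack.pop()
                component.add(cell)
                for dim in range(len(cell)):
                    for step in (-1, 1):
                        neighbour = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                        if neighbour in dense and neighbour not in seen:
                            seen.add(neighbour)
                            stack.append(neighbour)
            result.append(component)
        return result
```

In this sketch, a caller would invoke `insert(point, now)` for every arriving record, call `prune(now)` at a fixed interval, and call `clusters(now)` whenever a snapshot of the current clustering is needed.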

Original language: English
Pages (from-to): 1603-1609
Number of pages: 7
Journal: ICIC Express Letters
Volume: 4
Issue number: 5
Publication status: Published - Oct 2010

Keywords

  • Clustering
  • Correlation
  • Evolving data streams
  • Grid density
