Clustering over data streams based on grid density and index tree

  • Jiadong Ren*
  • Binlei Cai
  • Changzhen Hu
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

47 Citations (Scopus)

Abstract

Most existing grid-based stream clustering algorithms are inefficient on high-dimensional data streams because of the large number of grid cells, and they cannot handle noise points effectively. In this paper, we propose PKS-Stream, a novel approach for clustering data streams based on grid density and the index tree Pks-tree. The new index structure, Pks-tree, stores only the non-empty grid cells in order to improve storage and indexing efficiency. We also define a novel time-based density threshold function to remove noise points in real time. In the initial stage, the data stream is clustered by grid density on top of the Pks-tree. As new data records arrive, a novel pruning strategy periodically detects and removes noise points, and the generated clusters are adjusted dynamically. Experimental results show that PKS-Stream achieves better clustering quality and scalability.
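The abstract does not specify the Pks-tree layout or the exact density threshold function, so the following is only a minimal sketch of the general grid-density idea under stated assumptions, not the PKS-Stream algorithm itself. A plain dictionary keyed by integer cell coordinates stands in for the Pks-tree (it likewise stores only non-empty cells); each cell's density decays exponentially over time (an assumed, commonly used scheme); and the hypothetical `noise_threshold` grows with a cell's age, so a new cell gets a grace period before it can be pruned as noise. `GridStreamSketch`, `cell_width`, `decay`, `sparse_rate` and `dense_threshold` are all illustrative names.

```python
import math
from collections import deque


class GridStreamSketch:
    """Hypothetical grid-density stream clusterer (illustrative, not PKS-Stream).

    Only non-empty cells are stored, in a dict that stands in for the Pks-tree.
    """

    def __init__(self, cell_width, decay=0.98, sparse_rate=0.05, dense_threshold=3.0):
        self.cell_width = cell_width            # assumed uniform cell side length
        self.decay = decay                      # assumed per-step decay factor, 0 < decay < 1
        self.sparse_rate = sparse_rate          # assumed minimum arrival rate for a non-noise cell
        self.dense_threshold = dense_threshold  # assumed density needed to join a cluster
        self.cells = {}                         # cell key -> [density, last_update, created]
        self.t = 0                              # logical time: one tick per arriving record

    def _key(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def _density_at(self, cell, t):
        # Density decayed from the cell's last update to time t.
        density, last, _ = cell
        return density * self.decay ** (t - last)

    def insert(self, point):
        self.t += 1
        key = self._key(point)
        cell = self.cells.get(key)
        if cell is None:
            self.cells[key] = [1.0, self.t, self.t]  # new non-empty cell
        else:
            cell[0] = self._density_at(cell, self.t) + 1.0
            cell[1] = self.t

    def noise_threshold(self, age):
        # Hypothetical time-based threshold: the decayed density a cell would
        # reach if it received `sparse_rate` records per step for age + 1 steps.
        return self.sparse_rate * (1 - self.decay ** (age + 1)) / (1 - self.decay)

    def prune(self):
        # Remove cells whose current density is below their age-based threshold.
        removed = [k for k, c in self.cells.items()
                   if self._density_at(c, self.t) < self.noise_threshold(self.t - c[2])]
        for k in removed:
            del self.cells[k]
        return removed

    def clusters(self):
        # Group dense cells that share a face (differ by 1 in one coordinate).
        dense = {k for k, c in self.cells.items()
                 if self._density_at(c, self.t) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            group, queue = [], deque([start])
            while queue:
                k = queue.popleft()
                group.append(k)
                for d in range(len(k)):
                    for step in (-1, 1):
                        nb = k[:d] + (k[d] + step,) + k[d + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            queue.append(nb)
            result.append(sorted(group))
        return sorted(result)


s = GridStreamSketch(cell_width=1.0)
s.insert((10.5, 10.5))  # an isolated early record
for i in range(30):
    s.insert((0.2, 0.2) if i % 2 == 0 else (1.2, 0.2))
print(s.prune())     # [(10, 10)]
print(s.clusters())  # [[(0, 0), (1, 0)]]
```

In a streaming loop one would call `prune()` every few hundred arrivals and recompute `clusters()` afterwards, which mirrors, in simplified form, the periodic noise removal and dynamic cluster adjustment the abstract describes. A dictionary keeps the sketch short; it gives the same "store only non-empty cells" property but none of the indexing benefits the paper attributes to the Pks-tree.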

Original language: English
Pages (from-to): 83-93
Number of pages: 11
Journal: Journal of Convergence Information Technology
Volume: 6
Issue number: 1
Publication status: Published - Jan 2011

Keywords

  • Clustering
  • Data streams
  • Grid density
  • Index tree
