Abstract
Most existing grid-based clustering algorithms cannot adapt to the evolution of data streams and do not handle noise points effectively. Furthermore, data points at cluster edges cannot be clustered accurately. In this paper, we present GDC-Stream, a new approach for clustering evolving data streams based on grid density and correlation. A new time-based density threshold function is introduced to remove noise points in real time, and a novel correlation-based technique is adopted to improve clustering accuracy. In the initial stage, the algorithm clusters the data stream by grid density; as new data records arrive, a novel pruning strategy periodically inspects and removes noise points. Meanwhile, the generated clusters are dynamically adjusted based on grid density and correlation to capture changes in the data stream. Experimental results show that GDC-Stream achieves better clustering quality and scalability than CluStream.
ICIC International
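The abstract does not give GDC-Stream's actual density or threshold formulas. As a rough illustration of the general grid-density idea it builds on, the following Python sketch keeps a time-decayed density per grid cell, prunes cells whose density falls below a threshold that grows with cell age, and groups face-adjacent dense cells into clusters. Everything here is an assumption for illustration: the class name `GridDensitySketch`, the parameters `decay`, `c_low`, `c_high` and `n_cells`, and the threshold formula (loosely modeled on grid-based stream clusterers such as D-Stream). The correlation-based adjustment of edge points is not modeled.

```python
import math
from collections import deque


class GridDensitySketch:
    """Hypothetical time-decayed grid-density summary; NOT the GDC-Stream algorithm itself."""

    def __init__(self, cell_width, decay=0.998, c_low=0.8, c_high=3.0, n_cells=1000):
        self.cell_width = cell_width  # side length of each (hyper)cubic grid cell
        self.decay = decay            # lambda in (0, 1): per-time-step weight decay (assumed)
        self.c_low = c_low            # sparse-density ratio (assumed value)
        self.c_high = c_high          # dense-density ratio (assumed value)
        self.n_cells = n_cells        # assumed total number of cells N in the grid space
        self.grids = {}               # cell key -> [density, last_update_time, creation_time]

    def cell_of(self, point):
        """Map a d-dimensional point to the integer coordinates of its grid cell."""
        return tuple(math.floor(x / self.cell_width) for x in point)

    def _decayed(self, density, last, t):
        """Density at time t: D(t) = D(last) * lambda ** (t - last)."""
        return density * self.decay ** (t - last)

    def insert(self, point, t):
        """Decay the cell's density to time t, then add weight 1 for the new record."""
        key = self.cell_of(point)
        if key in self.grids:
            density, last, created = self.grids[key]
            self.grids[key] = [self._decayed(density, last, t) + 1.0, t, created]
        else:
            self.grids[key] = [1.0, t, t]

    def noise_threshold(self, created, t):
        """Time-based threshold that grows with cell age (assumed form, loosely after D-Stream)."""
        lam = self.decay
        return self.c_low * (1 - lam ** (t - created + 1)) / (self.n_cells * (1 - lam))

    def prune(self, t):
        """Delete cells whose decayed density is below the time-based threshold; return their keys."""
        removed = []
        for key, (density, last, created) in list(self.grids.items()):
            if self._decayed(density, last, t) < self.noise_threshold(created, t):
                del self.grids[key]
                removed.append(key)
        return removed

    def clusters(self, t):
        """Group dense cells that share a face into clusters using breadth-first search."""
        dense_min = self.c_high / (self.n_cells * (1 - self.decay))
        dense = {key for key, (density, last, _) in self.grids.items()
                 if self._decayed(density, last, t) >= dense_min}
        seen, result = set(), []
        for start in sorted(dense):
            if start in seen:
                continue
            seen.add(start)
            component, queue = [], deque([start])
            while queue:
                cell = queue.popleft()
                component.append(cell)
                for dim in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            queue.append(nb)
            result.append(sorted(component))
        return result


if __name__ == "__main__":
    sketch = GridDensitySketch(cell_width=1.0)
    sketch.insert((9.9, 9.9), t=0)  # a lone record, likely noise
    for _ in range(10):              # two adjacent busy cells plus one distant busy cell
        sketch.insert((0.2, 0.5), t=1)
        sketch.insert((1.4, 0.3), t=1)
        sketch.insert((5.5, 5.5), t=1)
    print(sketch.clusters(t=1))      # → [[(0, 0), (1, 0)], [(5, 5)]]
    print(sketch.prune(t=1000))      # → [(9, 9)]
```

With these illustrative parameters, the lone record's cell is removed once its decayed density drops below the age-dependent threshold (here by t = 1000), while the busier cells survive. A smaller decay factor would make the sketch forget old records, and hence adapt to stream evolution, faster.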
Original language | English |
---|---|
Pages (from-to) | 1603-1609 |
Number of pages | 7 |
Journal | ICIC Express Letters |
Volume | 4 |
Issue number | 5 |
Publication status | Published - Oct 2010 |
Keywords
- Clustering
- Correlation
- Evolving data streams
- Grid density