TY - JOUR
T1 - Clustering over an evolving data stream based on grid density and correlation
AU - Ren, Jiadong
AU - Cai, Binlei
AU - Hu, Changzhen
PY - 2010/10
Y1 - 2010/10
N2 - Most existing grid-based clustering algorithms cannot adapt to the evolution of data streams and cannot handle noise points effectively. Furthermore, data points at cluster edges cannot be clustered accurately. In this paper, we present GDC-Stream, a new approach for clustering evolving data streams that is based on grid density and correlation. A new time-based density threshold function is introduced to remove noise points in real time. Moreover, a novel correlation-based technique is adopted to improve clustering accuracy. In the initial stage of the algorithm, the data stream is clustered by grid density; as new data records arrive, a novel pruning strategy is applied to periodically inspect and remove noise points. Meanwhile, based on grid density and correlation, the generated clusters are dynamically adjusted to capture changes in the data stream. The experimental results show that GDC-Stream has better clustering quality and scalability than CluStream. ICIC International
AB - Most existing grid-based clustering algorithms cannot adapt to the evolution of data streams and cannot handle noise points effectively. Furthermore, data points at cluster edges cannot be clustered accurately. In this paper, we present GDC-Stream, a new approach for clustering evolving data streams that is based on grid density and correlation. A new time-based density threshold function is introduced to remove noise points in real time. Moreover, a novel correlation-based technique is adopted to improve clustering accuracy. In the initial stage of the algorithm, the data stream is clustered by grid density; as new data records arrive, a novel pruning strategy is applied to periodically inspect and remove noise points. Meanwhile, based on grid density and correlation, the generated clusters are dynamically adjusted to capture changes in the data stream. The experimental results show that GDC-Stream has better clustering quality and scalability than CluStream. ICIC International
KW - Clustering
KW - Correlation
KW - Evolving data streams
KW - Grid density
UR - http://www.scopus.com/inward/record.url?scp=77956706405&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:77956706405
SN - 1881-803X
VL - 4
SP - 1603
EP - 1609
JO - ICIC Express Letters
JF - ICIC Express Letters
IS - 5
ER -