Abstract
Uncertainty in the data makes it impossible to cluster uncertain data streams with traditional clustering algorithms. This paper presents a density-based clustering algorithm for uncertain data stream environments. An uncertainty metric is used to measure the distribution information in the uncertain data. The uncertain data stream DENCLUE algorithm (USDENCLUE) is then modified to handle uncertain data so as to minimize the impact of data uncertainty on the clustering results. A density-based clustering algorithm is then given for uncertain data streams with a sliding window, which rapidly prunes clusters using an exponential histogram of the cluster features. This algorithm efficiently handles noisy data in evolving data streams and generates arbitrarily shaped clusters, improving the clustering quality. Comparisons with the CluStream clustering algorithm on real and synthetic data sets show the efficiency and effectiveness of this algorithm.
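The idea of a density estimate over a sliding window of uncertain points can be illustrated with a minimal sketch. It is not the paper's USDENCLUE algorithm: the abstract does not specify the uncertainty metric or the exponential histogram structure, so the sketch assumes each point carries a hypothetical `confidence` weight in [0, 1], uses a Gaussian kernel in the style of DENCLUE, and substitutes a fixed-size `deque` for the exponential histogram. The names `SlidingWindowDensity`, `sigma`, and `window_size` are illustrative only.

```python
import math
from collections import deque


def gaussian_kernel(x, y, sigma):
    """Gaussian influence of point y on location x (DENCLUE-style kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


class SlidingWindowDensity:
    """Confidence-weighted kernel density over the most recent points.

    Assumption: a plain fixed-size deque stands in for the exponential
    histogram of cluster features mentioned in the abstract.
    """

    def __init__(self, window_size, sigma):
        # deque(maxlen=...) silently evicts the oldest point when full.
        self.window = deque(maxlen=window_size)
        self.sigma = sigma

    def add(self, point, confidence):
        # confidence is a hypothetical per-point weight in [0, 1];
        # less certain points contribute less to the density.
        self.window.append((tuple(point), confidence))

    def density(self, x):
        # Sum of kernel influences, each scaled by the point's confidence.
        return sum(c * gaussian_kernel(x, p, self.sigma)
                   for p, c in self.window)
```

With `window_size=2`, adding a third point evicts the first, so the density at any location reflects only the two most recent points. A point with `confidence=0.5` contributes half the influence of a fully certain point at the same location.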
Original language | English |
---|---|
Pages (from-to) | 884-891 |
Number of pages | 8 |
Journal | Qinghua Daxue Xuebao/Journal of Tsinghua University |
Volume | 57 |
Issue number | 8 |
DOIs | |
Publication status | Published - 1 Aug 2017 |
Externally published | Yes |
Keywords
- Clustering
- Density
- Sliding window
- Uncertain data streams