TY - GEN
T1 - An algorithm for clustering heterogeneous data streams with uncertainty
AU - Huang, Guo Yan
AU - Liang, Da Peng
AU - Hu, Chang Zhen
AU - Ren, Jia Dong
PY - 2010
Y1 - 2010
N2 - Heterogeneous data streams with uncertainty are ubiquitous in many applications. However, existing methods for clustering such streams yield low clustering quality. In this paper, an algorithm for clustering heterogeneous data streams with uncertainty, called HU-Clustering, is proposed. A Heterogeneous Uncertainty Clustering Feature (H-UCF) is presented to describe the features of heterogeneous data streams with uncertainty. Based on H-UCF, a probability frequency histogram is proposed to track the statistics of categorical attributes. The algorithm initially creates n clusters using the k-prototypes algorithm. To improve clustering quality, a two-phase cluster selection process is applied in HU-Clustering. First, a set of candidate clusters is selected using a new similarity measure; second, the most similar cluster for each newly arriving tuple is selected from the candidate set based on clustering uncertainty. Experimental results show that the clustering quality of HU-Clustering is higher than that of UMicro.
AB - Heterogeneous data streams with uncertainty are ubiquitous in many applications. However, existing methods for clustering such streams yield low clustering quality. In this paper, an algorithm for clustering heterogeneous data streams with uncertainty, called HU-Clustering, is proposed. A Heterogeneous Uncertainty Clustering Feature (H-UCF) is presented to describe the features of heterogeneous data streams with uncertainty. Based on H-UCF, a probability frequency histogram is proposed to track the statistics of categorical attributes. The algorithm initially creates n clusters using the k-prototypes algorithm. To improve clustering quality, a two-phase cluster selection process is applied in HU-Clustering. First, a set of candidate clusters is selected using a new similarity measure; second, the most similar cluster for each newly arriving tuple is selected from the candidate set based on clustering uncertainty. Experimental results show that the clustering quality of HU-Clustering is higher than that of UMicro.
KW - Clustering
KW - Heterogeneous attributes
KW - Probability frequency histogram
KW - Uncertain data stream
UR - http://www.scopus.com/inward/record.url?scp=78149310432&partnerID=8YFLogxK
U2 - 10.1109/ICMLC.2010.5580502
DO - 10.1109/ICMLC.2010.5580502
M3 - Conference contribution
AN - SCOPUS:78149310432
SN - 9781424465262
T3 - 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
SP - 2059
EP - 2064
BT - 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
T2 - 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
Y2 - 11 July 2010 through 14 July 2010
ER -