An algorithm for clustering heterogeneous data streams with uncertainty

Guo Yan Huang*, Da Peng Liang, Chang Zhen Hu, Jia Dong Ren

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

9 Citations (Scopus)

Abstract

In many applications, heterogeneous data streams with uncertainty are ubiquitous. However, existing methods for clustering such streams achieve relatively low clustering quality. In this paper, an algorithm for clustering heterogeneous data streams with uncertainty, called HU-Clustering, is proposed. A Heterogeneous Uncertainty Clustering Feature (H-UCF) is presented to describe the features of heterogeneous data streams with uncertainty. Based on H-UCF, a probability frequency histogram is proposed to track the statistics of categorical attributes, and the algorithm initially creates n clusters using the k-prototypes algorithm. To improve clustering quality, a two-phase stream clustering selection process is applied in the HU-Clustering algorithm. First, candidate clusters are selected using a new similarity measure; second, for each newly arriving tuple, the most similar cluster is selected from the candidate cluster set according to clustering uncertainty. The experimental results show that HU-Clustering achieves higher clustering quality than UMicro.
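The abstract does not give the exact definitions of H-UCF, the new similarity measure, or the uncertainty criterion, so the following Python is only a minimal sketch of the two-phase assignment step under stated assumptions. The names `UncertainTuple`, `ClusterFeature`, `dissimilarity`, and `assign`, the weight `gamma`, the candidate count `m`, and both scoring formulas are hypothetical. The numeric part assumes UMicro-style additive sums (count, linear sum, squared sum, squared-error sum), the categorical part keeps a probability frequency histogram per attribute, and the k-prototypes initialization is replaced by two hand-seeded clusters.

```python
from collections import defaultdict
from typing import NamedTuple


class UncertainTuple(NamedTuple):
    values: list      # numeric attribute values
    errors: list      # standard deviation of the error on each numeric value
    categories: list  # one {value: probability} dict per categorical attribute


class ClusterFeature:
    """Hypothetical H-UCF-style summary of one cluster (not the paper's exact definition)."""

    def __init__(self, dims, n_cat):
        self.n = 0
        self.ls = [0.0] * dims   # linear sum of numeric values
        self.ss = [0.0] * dims   # squared sum of numeric values
        self.ef2 = [0.0] * dims  # sum of squared numeric errors
        # probability frequency histogram: accumulated probability mass per categorical value
        self.hist = [defaultdict(float) for _ in range(n_cat)]

    def add(self, t):
        # all sums are additive, so absorbing a tuple is a cheap incremental update
        self.n += 1
        for j, (x, e) in enumerate(zip(t.values, t.errors)):
            self.ls[j] += x
            self.ss[j] += x * x
            self.ef2[j] += e * e
        for j, dist in enumerate(t.categories):
            for v, p in dist.items():
                self.hist[j][v] += p

    def centroid(self):
        return [s / self.n for s in self.ls]

    def uncertainty_with(self, t):
        # assumed uncertainty: expected squared radius (variance plus mean error variance)
        # that the cluster would have if t joined it
        n = self.n + 1
        total = 0.0
        for j, (x, e) in enumerate(zip(t.values, t.errors)):
            ls, ss, ef2 = self.ls[j] + x, self.ss[j] + x * x, self.ef2[j] + e * e
            total += ss / n - (ls / n) ** 2 + ef2 / n
        return total


def dissimilarity(cf, t, gamma=1.0):
    # numeric part: squared distance to the centroid plus the tuple's own error variance
    num = sum((x - c) ** 2 + e * e for x, c, e in zip(t.values, cf.centroid(), t.errors))
    # categorical part: 1 minus the probability that the tuple's value matches a value
    # drawn from the cluster's histogram, summed over categorical attributes
    cat = sum(
        1.0 - sum(p * cf.hist[j].get(v, 0.0) / cf.n for v, p in dist.items())
        for j, dist in enumerate(t.categories)
    )
    return num + gamma * cat


def assign(clusters, t, m=2, gamma=1.0):
    """Hypothetical two-phase selection; assumes every cluster is non-empty."""
    # phase 1: the m least dissimilar clusters form the candidate set
    candidates = sorted(clusters, key=lambda cf: dissimilarity(cf, t, gamma))[:m]
    # phase 2: pick the candidate whose uncertainty stays lowest after absorbing t
    best = min(candidates, key=lambda cf: cf.uncertainty_with(t))
    best.add(t)
    return best


# two hand-seeded clusters stand in for the k-prototypes initialization
a, b = ClusterFeature(1, 1), ClusterFeature(1, 1)
for x in (0.0, 1.0):
    a.add(UncertainTuple([x], [0.1], [{"red": 1.0}]))
for x in (10.0, 11.0):
    b.add(UncertainTuple([x], [0.1], [{"blue": 1.0}]))
chosen = assign([a, b], UncertainTuple([0.5], [0.1], [{"red": 0.9, "blue": 0.1}]))
print(chosen is a)  # True
```

In this sketch, phase 1 uses a k-prototypes-style mixed dissimilarity (numeric squared distance plus `gamma` times a categorical mismatch term), and phase 2 ranks the surviving candidates by the error-aware squared radius each would have after absorbing the tuple. Both scoring choices are illustrative assumptions rather than the paper's definitions.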

Original language: English
Host publication title: 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
Pages: 2059-2064
Number of pages: 6
Publication status: Published - 2010
Event: 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010 - Qingdao, China
Duration: 11 Jul 2010 to 14 Jul 2010

Publication series

Name: 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
Volume: 4

Conference

Conference: 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
Country/Territory: China
City: Qingdao
Period: 11/07/10 to 14/07/10

