An algorithm for clustering heterogeneous data streams with uncertainty

Guo Yan Huang*, Da Peng Liang, Chang Zhen Hu, Jia Dong Ren

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Citations (Scopus)

Abstract

Heterogeneous data streams with uncertainty are ubiquitous in many applications. However, existing methods for clustering such streams achieve low clustering quality. This paper proposes HU-Clustering, an algorithm for clustering heterogeneous data streams with uncertainty. A Heterogeneous Uncertainty Clustering Feature (H-UCF) is presented to describe the features of such streams. Based on H-UCF, a probability frequency histogram is proposed to track the statistics of categorical attributes. The algorithm initially creates n clusters using the k-prototypes algorithm. To improve clustering quality, HU-Clustering applies a two-phase cluster selection process: first, candidate clusters are selected using a new similarity measure; second, the most similar cluster for each newly arriving tuple is selected from the candidate set according to clustering uncertainty. Experimental results show that the clustering quality of HU-Clustering is higher than that of UMicro.
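The abstract does not give the definitions of H-UCF, the similarity measure, or clustering uncertainty, so the following is only a minimal sketch of the general idea: a per-attribute histogram that accumulates tuple existence probabilities for categorical values, and a two-phase choice of cluster for each arriving tuple. The `Cluster` class, the `similarity` and `uncertainty` formulas, the `assign` function, and the parameter `m` are all hypothetical stand-ins chosen for illustration and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical cluster summary (an assumption, not the paper's H-UCF)."""
    count: int = 0                                  # number of tuples absorbed
    weight: float = 0.0                             # sum of tuple existence probabilities
    num_sum: list = field(default_factory=list)     # probability-weighted numeric sums
    hist: list = field(default_factory=list)        # per categorical attribute: value -> probability mass

    def add(self, numeric, categorical, prob):
        if self.count == 0:
            self.num_sum = [0.0] * len(numeric)
            self.hist = [defaultdict(float) for _ in categorical]
        self.count += 1
        self.weight += prob
        for i, x in enumerate(numeric):
            self.num_sum[i] += prob * x
        for h, v in zip(self.hist, categorical):
            h[v] += prob                            # probability frequency histogram update

    def similarity(self, numeric, categorical):
        # Assumed measure: closeness to the weighted centroid plus the mean
        # relative histogram mass of the tuple's categorical values.
        dist = sum((x - s / self.weight) ** 2 for x, s in zip(numeric, self.num_sum))
        match = sum(h.get(v, 0.0) / self.weight for h, v in zip(self.hist, categorical))
        return 1.0 / (1.0 + dist) + match / max(len(categorical), 1)

    def uncertainty(self):
        # Assumed measure: average probability that a member tuple does not exist.
        return 1.0 - self.weight / self.count


def assign(clusters, numeric, categorical, prob, m=2):
    """Two-phase selection: top-m by similarity, then lowest uncertainty."""
    ranked = sorted(range(len(clusters)),
                    key=lambda i: clusters[i].similarity(numeric, categorical),
                    reverse=True)
    candidates = ranked[:m]                         # phase 1: candidate set
    best = min(candidates, key=lambda i: clusters[i].uncertainty())  # phase 2
    clusters[best].add(numeric, categorical, prob)
    return best


def demo_clusters():
    """Three single-tuple clusters used only to exercise the sketch."""
    seeds = [([0.0, 0.0], ["a"], 0.9), ([10.0, 10.0], ["b"], 0.5), ([0.5, 0.5], ["a"], 0.4)]
    clusters = []
    for numeric, categorical, prob in seeds:
        c = Cluster()
        c.add(numeric, categorical, prob)
        clusters.append(c)
    return clusters
```

In this sketch the initial clusters are seeded by hand; the abstract states that the actual algorithm creates its initial clusters with k-prototypes. Calling `assign(demo_clusters(), [0.5, 0.5], ["a"], 0.8)` shows the effect of the second phase: the third seed cluster is the most similar, but the first is chosen because its assumed uncertainty is lower.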

Original language: English
Title of host publication: 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
Pages: 2059-2064
Number of pages: 6
Publication status: Published - 2010
Event: 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010 - Qingdao, China
Duration: 11 Jul 2010 → 14 Jul 2010

Publication series

Name: 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
Volume: 4

Conference

Conference: 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
Country/Territory: China
City: Qingdao
Period: 11/07/10 → 14/07/10

Keywords

  • Clustering
  • Heterogeneous attributes
  • Probability frequency histogram
  • Uncertain data stream
