An algorithm for clustering heterogeneous data streams with uncertainty

Guo Yan Huang; Da Peng Liang; Chang Zhen Hu; Jia Dong Ren

doi:10.1109/ICMLC.2010.5580502

* Corresponding author for this work: Guo Yan Huang

School of Cyberspace Security

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

9 Citations (Scopus)

Abstract

In many applications, heterogeneous data streams with uncertainty are ubiquitous. However, existing methods for clustering such streams achieve relatively low clustering quality. In this paper, an algorithm for clustering heterogeneous data streams with uncertainty, called HU-Clustering, is proposed. A Heterogeneous Uncertainty Clustering Feature (H-UCF) is presented to describe the features of heterogeneous data streams with uncertainty. Based on H-UCF, a probability frequency histogram is proposed to track the statistics of categorical attributes, and the algorithm initially creates n clusters using the k-prototypes algorithm. To improve clustering quality, HU-Clustering applies a two-phase cluster selection process. First, candidate clusters are selected using a new similarity measure; second, the most similar cluster for each newly arriving tuple is selected from the candidate cluster set based on cluster uncertainty. Experimental results show that HU-Clustering achieves higher clustering quality than UMicro.
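The abstract describes HU-Clustering only at a high level. The Python sketch below illustrates one plausible reading of the per-tuple assignment step, under loudly stated assumptions: `ClusterSummary` is a hypothetical stand-in for H-UCF (not the paper's definition); each categorical attribute of an uncertain tuple is modeled as a probability distribution over values, accumulated into a probability frequency histogram; numeric uncertainty is a per-attribute error standard deviation, in the style of UMicro; phase 1 keeps the `m` clusters with the smallest k-prototypes-style mixed dissimilarity (weight `gamma`); and phase 2 picks the candidate whose uncertainty grows least when the tuple is absorbed. The names `UncertainTuple`, `dissimilarity`, `select_cluster`, `m` and `gamma`, and both measures, are illustrative assumptions; the paper's actual similarity and uncertainty definitions may differ.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UncertainTuple:
    values: List[float]           # numeric attribute values
    errors: List[float]           # assumed per-attribute error std devs
    cats: List[Dict[str, float]]  # each categorical attribute as a value -> probability map


@dataclass
class ClusterSummary:
    """Hypothetical stand-in for H-UCF; not the paper's exact definition."""
    n: int
    ls: List[float]               # linear sum of numeric values
    ss: List[float]               # sum of squared numeric values
    es: List[float]               # sum of squared error std devs
    hist: List[Dict[str, float]]  # probability frequency histogram per categorical attribute

    @classmethod
    def empty(cls, num_dims: int, cat_dims: int) -> "ClusterSummary":
        return cls(0, [0.0] * num_dims, [0.0] * num_dims, [0.0] * num_dims,
                   [{} for _ in range(cat_dims)])

    def add(self, t: UncertainTuple) -> None:
        """Absorb a tuple by updating the additive statistics."""
        self.n += 1
        for j, (x, e) in enumerate(zip(t.values, t.errors)):
            self.ls[j] += x
            self.ss[j] += x * x
            self.es[j] += e * e
        for j, dist in enumerate(t.cats):
            for v, p in dist.items():
                # Accumulate probability mass rather than hard counts.
                self.hist[j][v] = self.hist[j].get(v, 0.0) + p

    def uncertainty(self) -> float:
        """Assumed measure: numeric variance plus mean squared error, summed over dims."""
        total = 0.0
        for j in range(len(self.ls)):
            mean = self.ls[j] / self.n
            total += self.ss[j] / self.n - mean * mean + self.es[j] / self.n
        return total


def dissimilarity(c: ClusterSummary, t: UncertainTuple, gamma: float = 1.0) -> float:
    """k-prototypes-style mixed measure: squared distance to the centroid
    plus gamma times the expected categorical mismatch (assumed form)."""
    num = sum((x - s / c.n) ** 2 for x, s in zip(t.values, c.ls))
    cat = 0.0
    for dist, h in zip(t.cats, c.hist):
        match = sum(p * h.get(v, 0.0) / c.n for v, p in dist.items())
        cat += 1.0 - match
    return num + gamma * cat


def select_cluster(clusters: List[ClusterSummary], t: UncertainTuple,
                   m: int = 2, gamma: float = 1.0) -> int:
    # Phase 1: keep the m least dissimilar clusters as candidates.
    ranked = sorted(range(len(clusters)),
                    key=lambda i: dissimilarity(clusters[i], t, gamma))
    candidates = ranked[:m]

    # Phase 2: pick the candidate whose uncertainty grows least if t joins it.
    def growth(i: int) -> float:
        trial = copy.deepcopy(clusters[i])
        trial.add(t)
        return trial.uncertainty() - clusters[i].uncertainty()

    return min(candidates, key=growth)


# Usage: two initial clusters (in the paper these come from k-prototypes).
a = ClusterSummary.empty(1, 1)
a.add(UncertainTuple([1.0], [0.1], [{"x": 1.0}]))
a.add(UncertainTuple([1.2], [0.1], [{"x": 0.8, "y": 0.2}]))
b = ClusterSummary.empty(1, 1)
b.add(UncertainTuple([5.0], [0.5], [{"y": 1.0}]))
b.add(UncertainTuple([5.4], [0.5], [{"y": 1.0}]))
new = UncertainTuple([1.1], [0.1], [{"x": 1.0}])
print(select_cluster([a, b], new))  # prints 0
```

The summaries are kept additive so that a cluster can absorb a tuple in time linear in the number of attributes without storing the stream; whether H-UCF is additive in exactly this way is an assumption of this sketch.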

Original language	English
Host publication title	2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
Pages	2059-2064
Number of pages	6
DOI	https://doi.org/10.1109/ICMLC.2010.5580502
Publication status	Published - 2010
Event	2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010 - Qingdao, China. Duration: 11 Jul 2010 → 14 Jul 2010

Publication series

Name	2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
Volume	4

Conference

Conference	2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010
Country/Territory	China
City	Qingdao
Period	11/07/10 → 14/07/10


Cite this

Huang, G. Y., Liang, D. P., Hu, C. Z., & Ren, J. D. (2010). An algorithm for clustering heterogeneous data streams with uncertainty. In 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010 (pp. 2059-2064). Article 5580502 (2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010; Vol. 4). https://doi.org/10.1109/ICMLC.2010.5580502

@inproceedings{e897a97739574604b3b4d665099177f1,

title = "An algorithm for clustering heterogeneous data streams with uncertainty",

abstract = "In many applications, the heterogeneous data streams with uncertainty are ubiquitous. However, the clustering quality of the existing methods for clustering heterogeneous data streams with uncertainty is lower. In this paper, an algorithm for clustering heterogeneous data streams with uncertainty, called HU-Clustering, is proposed. A Heterogeneous Uncertainty Clustering Feature (H-UCF) is presented to describe the feature of heterogeneous data streams with uncertainty. Based on H-UCF, a probability frequency histogram is proposed to track the statistics of categorical attributes; the algorithm initially creates n clusters by k-prototypes algorithm. In order to improve clustering quality, a two phase streams clustering selection process is applied to HU-Clustering algorithm. Firstly, the candidate clustering is selected through the new similarity measure; secondly, the most similar cluster for each new arriving tuple is selected through clustering uncertainty in candidate clustering set. The experimental results show that the clustering quality of HU-Clustering is higher than that of UMicro.",

keywords = "Clustering, Heterogeneous attributes, Probability frequency histogram, Uncertain data stream",

author = "Huang, {Guo Yan} and Liang, {Da Peng} and Hu, {Chang Zhen} and Ren, {Jia Dong}",

year = "2010",

doi = "10.1109/ICMLC.2010.5580502",

language = "English",

isbn = "9781424465262",

series = "2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010",

pages = "2059--2064",

booktitle = "2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010",

note = "2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010 ; Conference date: 11-07-2010 Through 14-07-2010",

}

Huang, GY, Liang, DP, Hu, CZ & Ren, JD 2010, An algorithm for clustering heterogeneous data streams with uncertainty. in 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010, 5580502, 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010, vol. 4, pp. 2059-2064, 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010, Qingdao, China, 11/07/10. https://doi.org/10.1109/ICMLC.2010.5580502

An algorithm for clustering heterogeneous data streams with uncertainty. / Huang, Guo Yan; Liang, Da Peng; Hu, Chang Zhen et al.
2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010. 2010. pp. 2059-2064, 5580502 (2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010; Vol. 4).


TY - GEN

T1 - An algorithm for clustering heterogeneous data streams with uncertainty

AU - Huang, Guo Yan

AU - Liang, Da Peng

AU - Hu, Chang Zhen

AU - Ren, Jia Dong

PY - 2010

Y1 - 2010

N2 - In many applications, the heterogeneous data streams with uncertainty are ubiquitous. However, the clustering quality of the existing methods for clustering heterogeneous data streams with uncertainty is lower. In this paper, an algorithm for clustering heterogeneous data streams with uncertainty, called HU-Clustering, is proposed. A Heterogeneous Uncertainty Clustering Feature (H-UCF) is presented to describe the feature of heterogeneous data streams with uncertainty. Based on H-UCF, a probability frequency histogram is proposed to track the statistics of categorical attributes; the algorithm initially creates n clusters by k-prototypes algorithm. In order to improve clustering quality, a two phase streams clustering selection process is applied to HU-Clustering algorithm. Firstly, the candidate clustering is selected through the new similarity measure; secondly, the most similar cluster for each new arriving tuple is selected through clustering uncertainty in candidate clustering set. The experimental results show that the clustering quality of HU-Clustering is higher than that of UMicro.

AB - In many applications, the heterogeneous data streams with uncertainty are ubiquitous. However, the clustering quality of the existing methods for clustering heterogeneous data streams with uncertainty is lower. In this paper, an algorithm for clustering heterogeneous data streams with uncertainty, called HU-Clustering, is proposed. A Heterogeneous Uncertainty Clustering Feature (H-UCF) is presented to describe the feature of heterogeneous data streams with uncertainty. Based on H-UCF, a probability frequency histogram is proposed to track the statistics of categorical attributes; the algorithm initially creates n clusters by k-prototypes algorithm. In order to improve clustering quality, a two phase streams clustering selection process is applied to HU-Clustering algorithm. Firstly, the candidate clustering is selected through the new similarity measure; secondly, the most similar cluster for each new arriving tuple is selected through clustering uncertainty in candidate clustering set. The experimental results show that the clustering quality of HU-Clustering is higher than that of UMicro.

KW - Clustering

KW - Heterogeneous attributes

KW - Probability frequency histogram

KW - Uncertain data stream

UR - http://www.scopus.com/inward/record.url?scp=78149310432&partnerID=8YFLogxK

U2 - 10.1109/ICMLC.2010.5580502

DO - 10.1109/ICMLC.2010.5580502

M3 - Conference contribution

AN - SCOPUS:78149310432

SN - 9781424465262

T3 - 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010

SP - 2059

EP - 2064

BT - 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010

T2 - 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010

Y2 - 11 July 2010 through 14 July 2010

ER -
