Constraint-based clustering algorithm for multi-density data and arbitrary shapes

Walid Atwa, Kan Li

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

1 Citation (Scopus)

Abstract

The purpose of data clustering is to identify useful patterns in the underlying dataset. However, finding clusters in data is a challenging problem, especially when the clusters have widely varying shapes, sizes, and densities. Density-based clustering methods are especially important because of their ability to detect arbitrarily shaped clusters. Moreover, these methods often show good noise-handling capabilities. Existing methods are based on DBSCAN, which depends on two user-specified parameters (Eps and MinPts) that define a single density. In addition, most of these methods are unsupervised and therefore cannot improve clustering quality by exploiting a small amount of prior knowledge. In this paper we show how background knowledge can be used to bias a density-based clustering algorithm for multi-density data. First, we divide the dataset into different density levels and detect suitable density parameters for each level. Then we describe how pairwise constraints can be used to help the algorithm expand clusters based on the computed density parameters. Experimental results on both synthetic and real datasets confirm that the proposed algorithm gives better results than other semi-supervised and unsupervised clustering algorithms.
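The two steps named in the abstract (splitting the data into density levels with their own Eps, then checking pairwise constraints while clusters grow) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how levels are detected, so the sketch assumes a simple heuristic (a new level starts where the sorted k-nearest-neighbour distance jumps by more than a factor `jump_ratio`). The function names, `jump_ratio`, and the toy points are all hypothetical.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d[k - 1])
    return out

def density_levels(kdist, jump_ratio=2.0):
    """Group point indices into density levels, each with its own Eps.

    Assumption (not from the paper): a new level starts wherever the sorted
    k-distance grows by more than `jump_ratio` relative to the previous value.
    Eps for a level is taken as the largest k-distance inside that level.
    """
    order = sorted(range(len(kdist)), key=lambda i: kdist[i])
    levels = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if kdist[prev] > 0 and kdist[cur] / kdist[prev] > jump_ratio:
            levels.append([])
        levels[-1].append(cur)
    return [(max(kdist[i] for i in lvl), lvl) for lvl in levels]

def violates_cannot_link(i, cluster, cannot_link):
    """True if adding point i to `cluster` would break a cannot-link pair."""
    return any((i == a and b in cluster) or (i == b and a in cluster)
               for a, b in cannot_link)

# Toy data: one dense group (spacing 0.1) and one sparse group (spacing 1.0).
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
          (10, 10), (11, 10), (10, 11), (11, 11)]
for eps, members in density_levels(k_distances(points, k=2)):
    print(f"Eps={eps:.2f} for points {sorted(members)}")
```

In a full method, a DBSCAN-style expansion would run once per level with that level's Eps, and a check such as `violates_cannot_link` would stop a cluster from absorbing a point that a constraint forbids.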

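Original language: English
Title of host publication: Advances in Data Mining
Subtitle of host publication: Applications and Theoretical Aspects - 17th Industrial Conference, ICDM 2017, Proceedings
Editors: Petra Perner
Publisher: Springer Verlag
Pages: 78-92
Number of pages: 15
ISBN (Print): 9783319627007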
Publication status: Published - 2017
Event: 17th Industrial Conference on Advances in Data Mining, ICDM 2017 - New York, United States
Duration: 12 Jul 2017 to 13 Jul 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10357 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th Industrial Conference on Advances in Data Mining, ICDM 2017
Country/Territory: United States
City: New York
Period: 12/07/17 to 13/07/17
