TY - JOUR
T1 - Extreme clustering – A clustering method via density extreme points
AU - Wang, Shuliang
AU - Li, Qi
AU - Zhao, Chuanfeng
AU - Zhu, Xingquan
AU - Yuan, Hanning
AU - Dai, Tianru
N1 - Publisher Copyright: © 2020 Elsevier Inc.
PY - 2021/1/4
Y1 - 2021/1/4
N2 - Peak clustering, a density-based clustering method, has shown remarkable performance in cluster analysis. In practice, however, it suffers from two major drawbacks: (i) when sample density differs significantly between clusters, it struggles to find cluster centres in low-density clusters; and (ii) in some cases, it incorrectly labels many normal points as noise. In this paper, we propose a new method, extreme clustering, to overcome these drawbacks. The core idea of extreme clustering is to identify density extreme points and use them to find cluster centres. In addition, a noise detection module is introduced to identify noisy data points in the clustering results. As a result, extreme clustering is robust to datasets with different density distributions. Experiments and validation on over 40 datasets show that extreme clustering not only inherits the cluster validity of peak clustering but also overcomes its shortcomings with a significant performance gain. Case studies on real-world haze analysis further demonstrate the ability of extreme clustering to find some of the main haze origins in a Chinese city.
AB - Peak clustering, a density-based clustering method, has shown remarkable performance in cluster analysis. In practice, however, it suffers from two major drawbacks: (i) when sample density differs significantly between clusters, it struggles to find cluster centres in low-density clusters; and (ii) in some cases, it incorrectly labels many normal points as noise. In this paper, we propose a new method, extreme clustering, to overcome these drawbacks. The core idea of extreme clustering is to identify density extreme points and use them to find cluster centres. In addition, a noise detection module is introduced to identify noisy data points in the clustering results. As a result, extreme clustering is robust to datasets with different density distributions. Experiments and validation on over 40 datasets show that extreme clustering not only inherits the cluster validity of peak clustering but also overcomes its shortcomings with a significant performance gain. Case studies on real-world haze analysis further demonstrate the ability of extreme clustering to find some of the main haze origins in a Chinese city.
KW - Clustering
KW - Density
KW - Density peak clustering
KW - Extreme point
KW - Haze analysis
UR - http://www.scopus.com/inward/record.url?scp=85088044346&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2020.06.069
DO - 10.1016/j.ins.2020.06.069
M3 - Article
AN - SCOPUS:85088044346
SN - 0020-0255
VL - 542
SP - 24
EP - 39
JO - Information Sciences
JF - Information Sciences
ER -