TY - JOUR
T1 - HIBOG
T2 - Improving the clustering accuracy by ameliorating dataset with gravitation
AU - Li, Qi
AU - Wang, Shuliang
AU - Zhao, Chuanfeng
AU - Zhao, Boxiang
AU - Yue, Xin
AU - Geng, Jing
N1 - Publisher Copyright: © 2020 Elsevier Inc.
PY - 2021/3
Y1 - 2021/3
N2 - Clustering is an important technique applied in many fields. Most researchers focus only on clustering algorithms when they want more accurate results. However, this is not an optimal strategy, because each algorithm has its own advantages and disadvantages, and no single algorithm achieves satisfactory results on all datasets. This paper focuses on the datasets instead and proposes a method called HIBOG that improves clustering accuracy by ameliorating datasets with gravitation. HIBOG helps many clustering algorithms obtain better results on more datasets by ameliorating the datasets so that similar objects move closer together and dissimilar objects move further apart. As a result, ameliorated datasets are friendlier to many clustering algorithms than the original datasets. Although datasets are diverse, HIBOG can cope with this diversity to some extent because it is robust to high-dimensional datasets, Gaussian-distributed datasets, shaped datasets, and datasets with highly overlapping clusters. We conducted numerous experiments on real-world datasets to verify the effectiveness of HIBOG. The experiments demonstrate that HIBOG improves the accuracy of different clustering algorithms, with accuracy increasing by an average of 113.4% (excluding the maximum and minimum). Moreover, compared with other similar methods, HIBOG yields much larger improvements in clustering accuracy and dramatically shortens the running time. We also conducted 360 experiments, each with different parameter values. These experiments show that most values enable HIBOG to ameliorate datasets, and that HIBOG is strongly robust to parameter adjustment.
AB - Clustering is an important technique applied in many fields. Most researchers focus only on clustering algorithms when they want more accurate results. However, this is not an optimal strategy, because each algorithm has its own advantages and disadvantages, and no single algorithm achieves satisfactory results on all datasets. This paper focuses on the datasets instead and proposes a method called HIBOG that improves clustering accuracy by ameliorating datasets with gravitation. HIBOG helps many clustering algorithms obtain better results on more datasets by ameliorating the datasets so that similar objects move closer together and dissimilar objects move further apart. As a result, ameliorated datasets are friendlier to many clustering algorithms than the original datasets. Although datasets are diverse, HIBOG can cope with this diversity to some extent because it is robust to high-dimensional datasets, Gaussian-distributed datasets, shaped datasets, and datasets with highly overlapping clusters. We conducted numerous experiments on real-world datasets to verify the effectiveness of HIBOG. The experiments demonstrate that HIBOG improves the accuracy of different clustering algorithms, with accuracy increasing by an average of 113.4% (excluding the maximum and minimum). Moreover, compared with other similar methods, HIBOG yields much larger improvements in clustering accuracy and dramatically shortens the running time. We also conducted 360 experiments, each with different parameter values. These experiments show that most values enable HIBOG to ameliorate datasets, and that HIBOG is strongly robust to parameter adjustment.
KW - Clustering
KW - Good datasets
KW - Gravitation
KW - Improving accuracy
UR - http://www.scopus.com/inward/record.url?scp=85095582467&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2020.10.046
DO - 10.1016/j.ins.2020.10.046
M3 - Article
AN - SCOPUS:85095582467
SN - 0020-0255
VL - 550
SP - 41
EP - 56
JO - Information Sciences
JF - Information Sciences
ER -