TY - JOUR
T1 - How to improve the accuracy of clustering algorithms
AU - Li, Qi
AU - Wang, Shuliang
AU - Zeng, Xianjun
AU - Zhao, Boxiang
AU - Dang, Yingxu
N1 - Publisher Copyright: © 2023 Elsevier Inc.
PY - 2023/5
Y1 - 2023/5
N2 - Clustering is an important data analysis technique. However, because datasets are so diverse, every clustering algorithm fails to produce satisfactory results on some datasets. In this paper, we propose a clustering optimization method called HIAC (Highly Improving the Accuracy of Clustering algorithms). By introducing gravitation, HIAC forces objects in the dataset to move towards similar objects, making the ameliorated dataset friendlier to clustering algorithms (i.e., clustering algorithms produce more accurate results on the ameliorated dataset). HIAC is independent of the clustering principle, so it can optimize different clustering algorithms. Unlike similar methods, HIAC is the first to adopt a selective-addition mechanism, i.e., adding gravitation only between valid-neighbors, to prevent dissimilar objects from approaching each other. To identify valid-neighbors among neighbors, HIAC introduces a decision graph in which a clear division threshold is visible to the naked eye. Additionally, the decision graph helps HIAC reduce the negative effects of improper parameter values on optimization. We conducted numerous experiments to test HIAC. Experimental results show that HIAC can effectively ameliorate high-dimensional datasets, Gaussian datasets, shape datasets, datasets with outliers, and overlapping datasets. HIAC greatly improves the accuracy of clustering algorithms, and its improvement rates are far higher than those of similar methods. The average improvement rate is as high as 253.6% (excluding the maximum and minimum). Moreover, its runtime is significantly shorter than that of most similar methods. More importantly, the advantages of HIAC over similar methods are consistently maintained across different parameter values. The code of HIAC is available at https://github.com/qiqi12/HIAC.
KW - Clustering optimization
KW - Gravitation
KW - Improving accuracy
UR - http://www.scopus.com/inward/record.url?scp=85146897063&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2023.01.094
DO - 10.1016/j.ins.2023.01.094
M3 - Article
AN - SCOPUS:85146897063
SN - 0020-0255
VL - 627
SP - 52
EP - 70
JO - Information Sciences
JF - Information Sciences
ER -