How to improve the accuracy of clustering algorithms

Qi Li, Shuliang Wang*, Xianjun Zeng, Boxiang Zhao, Yingxu Dang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Clustering is an important data analysis technique. However, because datasets are so diverse, every clustering algorithm fails to produce satisfactory results on some particular datasets. In this paper, we propose a clustering optimization method called HIAC (Highly Improving the Accuracy of Clustering algorithms). By introducing gravitation, HIAC forces objects in the dataset to move towards similar objects, making the ameliorated dataset more friendly to clustering algorithms (i.e., clustering algorithms can produce more accurate results on the ameliorated dataset). HIAC is independent of the clustering principle, so it can optimize different clustering algorithms. In contrast to other similar methods, HIAC is the first to adopt the selective-addition mechanism, i.e., adding gravitation only between valid-neighbors, to prevent dissimilar objects from approaching each other. To identify valid-neighbors among neighbors, HIAC introduces a decision graph in which a clear division threshold is visible to the naked eye. Additionally, the decision graph helps HIAC reduce the negative effects that improper parameter values have on optimization. We conducted numerous experiments to test HIAC. Experimental results show that HIAC can effectively ameliorate high-dimensional datasets, Gaussian datasets, shape datasets, datasets with outliers, and overlapping datasets. HIAC greatly improves the accuracy of clustering algorithms, and its improvement rates are far higher than those of similar methods. The average improvement rate is as high as 253.6% (excluding the maximum and minimum). Moreover, its runtime is significantly shorter than that of most similar methods. More importantly, with different parameter values, the advantages of HIAC over similar methods are always maintained. The code of HIAC is available at https://github.com/qiqi12/HIAC.
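
The selective-addition idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it only assumes that a neighbor counts as "valid" when it lies within a distance threshold, which here stands in for the value a user would read off the decision graph. The function name `ameliorate` and the parameters `k`, `threshold`, `step`, and `iterations` are hypothetical and chosen for illustration.

```python
import numpy as np


def ameliorate(X, k=5, threshold=1.0, step=0.5, iterations=3):
    """Pull each object toward its valid neighbors (illustrative simplification).

    Assumption: a neighbor is "valid" if it is among the k nearest neighbors
    AND closer than `threshold`. The paper derives the threshold from a
    decision graph; here it is simply passed in by the caller.
    """
    X = np.asarray(X, dtype=float).copy()
    n = len(X)
    k = min(k, n - 1)  # an object cannot have more than n - 1 neighbors
    for _ in range(iterations):
        # Pairwise Euclidean distances; the diagonal is masked so that an
        # object is never selected as its own neighbor.
        diff = X[:, None, :] - X[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        np.fill_diagonal(dist, np.inf)

        # Indices and distances of the k nearest neighbors of every object.
        nn = np.argsort(dist, axis=1)[:, :k]
        nn_dist = np.take_along_axis(dist, nn, axis=1)

        # Selective addition: only neighbors within the threshold attract.
        valid = nn_dist <= threshold

        # Mean displacement toward valid neighbors (zero if there are none).
        disp = (X[nn] - X[:, None, :]) * valid[:, :, None]
        counts = valid.sum(axis=1, keepdims=True)
        move = np.divide(disp.sum(axis=1), counts,
                         out=np.zeros_like(X), where=counts > 0)

        # All objects move simultaneously, based on the previous positions.
        X = X + step * move
    return X
```

For example, calling `ameliorate([[0, 0], [1, 0], [10, 0], [11, 0]], k=2, threshold=2.0, step=0.5, iterations=1)` contracts each of the two pairs onto its midpoint while leaving the gap between the pairs untouched, because the far-away objects are among the nearest neighbors but are rejected by the threshold. The ameliorated array can then be passed to any clustering algorithm in place of the original data.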

Original language: English
Pages (from-to): 52-70
Number of pages: 19
Journal: Information Sciences
Volume: 627
Publication status: Published - May 2023
