How to improve the accuracy of clustering algorithms

Qi Li, Shuliang Wang*, Xianjun Zeng, Boxiang Zhao, Yingxu Dang

*Corresponding author for this work

Research output: Contribution to journal (Article, peer-reviewed)

5 Citations (Scopus)

Abstract

Clustering is an important data analysis technique. However, because datasets are so diverse, every clustering algorithm fails to produce satisfactory results on some datasets. In this paper, we propose a clustering optimization method called HIAC (Highly Improving the Accuracy of Clustering algorithms). By introducing gravitation, HIAC forces objects in the dataset to move towards similar objects, making the ameliorated dataset more friendly to clustering algorithms (i.e., clustering algorithms can produce more accurate results on the ameliorated dataset). HIAC is independent of any particular clustering principle, so it can optimize different clustering algorithms. Unlike other similar methods, HIAC is the first to adopt a selective-addition mechanism, i.e., it adds gravitation only between valid-neighbors, which prevents dissimilar objects from approaching each other. To distinguish valid-neighbors from the other neighbors, HIAC introduces a decision graph on which a clear division threshold can be observed by eye. The decision graph also helps HIAC reduce the negative effects of improper parameter values on optimization. We conducted extensive experiments to evaluate HIAC. The results show that HIAC can effectively ameliorate high-dimensional datasets, Gaussian datasets, shape datasets, datasets with outliers, and overlapping datasets. HIAC greatly improves the accuracy of clustering algorithms, and its improvement rates are far higher than those of similar methods: the average improvement rate is as high as 253.6% (excluding the maximum and minimum). Moreover, its runtime is significantly shorter than that of most similar methods. More importantly, HIAC maintains its advantages over similar methods across different parameter values. The code of HIAC is available at https://github.com/qiqi12/HIAC.
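The abstract does not give HIAC's formulas, so the following is only a minimal, hypothetical sketch of the general idea it describes: gravitation applied selectively, between valid neighbours only. It rests on assumptions that are not taken from the paper: valid neighbours are the k nearest neighbours that lie within a distance threshold, attraction follows an inverse-square law, and the threshold is picked automatically at the largest gap in the sorted neighbour distances as a stand-in for reading the decision graph by eye. The names `knn`, `suggest_threshold` and `gravitate` and all their parameters are illustrative; the authors' actual implementation is in the linked repository.

```python
import math

# Hypothetical sketch of selective gravitation; NOT the HIAC formulation.


def knn(points, k):
    """For each point, return its k nearest neighbours as (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def suggest_threshold(points, k):
    """Assumed stand-in for reading a decision graph by eye: sort all
    k-nearest-neighbour distances and cut at the midpoint of the largest gap."""
    ds = sorted(d for row in knn(points, k) for d, _ in row)
    gaps = [(ds[i + 1] - ds[i], i) for i in range(len(ds) - 1)]
    _, i = max(gaps)
    return (ds[i] + ds[i + 1]) / 2


def gravitate(points, k, threshold, step=0.1, iterations=10, eps=1e-9):
    """Pull each point towards its valid neighbours only (selective addition)."""
    pts = [list(p) for p in points]
    for _ in range(iterations):
        neighbours = knn(pts, k)
        moved = []
        for i, p in enumerate(pts):
            force = [0.0] * len(p)
            nearest = math.inf
            for d, j in neighbours[i]:
                if d > threshold or d < eps:
                    continue  # too far (not a valid neighbour) or coincident
                nearest = min(nearest, d)
                magnitude = 1.0 / (d * d)  # assumed inverse-square attraction
                for a in range(len(p)):
                    force[a] += magnitude * (pts[j][a] - p[a]) / d
            # Cap the move at `step` and at half the distance to the closest
            # valid neighbour, so that a pair of points cannot pass each other.
            limit = min(step, nearest / 2)
            norm = math.sqrt(sum(f * f for f in force))
            if norm > limit:
                force = [f * limit / norm for f in force]
            moved.append([x + f for x, f in zip(p, force)])
        pts = moved  # synchronous update: all points move at once
    return pts


# Usage: two small, well-separated groups of 2-D points.
data = [(0, 0), (0.5, 0), (0, 0.5), (10, 0), (10.5, 0), (10, 0.5)]
t = suggest_threshold(data, k=3)
improved = gravitate(data, k=3, threshold=t)
```

With k=3, each point's neighbour list includes one point from the other group, but the threshold excludes it, so each group contracts without being drawn towards the other. Updating all points synchronously, rather than one at a time, keeps the result independent of the order in which points are stored.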

Original language: English
Pages (from-to): 52-70
Number of pages: 19
Journal: Information Sciences
Volume: 627
Publication status: Published, May 2023

Keywords

  • Clustering optimization
  • Gravitation
  • Improving accuracy
