TY - JOUR
T1 - A novel open-set clustering algorithm
AU - Li, Qi
AU - Yan, Guochen
AU - Wang, Shuliang
AU - Zhao, Boxiang
N1 - Publisher Copyright: © 2023 Elsevier Inc.
PY - 2023/11
Y1 - 2023/11
N2 - DOS (Delta Open Set) is a clustering algorithm that transforms cluster identification into set identification. It identifies objects whose neighborhoods coincide as an open-set, and each open-set corresponds to a cluster. However, when the dataset is complex, DOS tends to identify overlapping clusters as a single cluster. We believe the main reason is that DOS fixes a uniform neighborhood radius through a specific function, which leaves it unable to cope with varied object distributions. To improve DOS, we propose DOS-IN (Irregular Neighborhoods). Specifically, DOS-IN generates irregular neighborhoods based on the similarity between objects, so that it adapts to diverse object distributions. As a result, DOS-IN can not only distinguish overlapping clusters accurately but also requires fewer input parameters. In addition, DOS-IN introduces a small-cluster merging mechanism to address the shortcoming of DOS in recognizing Gaussian clusters. The experimental results show that DOS-IN consistently outperforms DOS. Compared with baseline methods, DOS-IN outperforms them on 7 out of 10 datasets, improving accuracy by at least 13.8% (NMI) and 2.4% (RI). The code of DOS-IN is available at https://github.com/Youth-49/2023-DOS-IN.
AB - DOS (Delta Open Set) is a clustering algorithm that transforms cluster identification into set identification. It identifies objects whose neighborhoods coincide as an open-set, and each open-set corresponds to a cluster. However, when the dataset is complex, DOS tends to identify overlapping clusters as a single cluster. We believe the main reason is that DOS fixes a uniform neighborhood radius through a specific function, which leaves it unable to cope with varied object distributions. To improve DOS, we propose DOS-IN (Irregular Neighborhoods). Specifically, DOS-IN generates irregular neighborhoods based on the similarity between objects, so that it adapts to diverse object distributions. As a result, DOS-IN can not only distinguish overlapping clusters accurately but also requires fewer input parameters. In addition, DOS-IN introduces a small-cluster merging mechanism to address the shortcoming of DOS in recognizing Gaussian clusters. The experimental results show that DOS-IN consistently outperforms DOS. Compared with baseline methods, DOS-IN outperforms them on 7 out of 10 datasets, improving accuracy by at least 13.8% (NMI) and 2.4% (RI). The code of DOS-IN is available at https://github.com/Youth-49/2023-DOS-IN.
KW - Clustering
KW - Irregular neighborhood
KW - Open-set
UR - http://www.scopus.com/inward/record.url?scp=85168799854&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2023.119561
DO - 10.1016/j.ins.2023.119561
M3 - Article
AN - SCOPUS:85168799854
SN - 0020-0255
VL - 648
JO - Information Sciences
JF - Information Sciences
M1 - 119561
ER -