A novel open-set clustering algorithm

Qi Li; Guochen Yan; Shuliang Wang; Boxiang Zhao

doi:10.1016/j.ins.2023.119561

Qi Li, Guochen Yan, Shuliang Wang^*, Boxiang Zhao

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to journal › Article › peer-review

Abstract

DOS (Delta Open Set) is an interesting clustering algorithm that transforms cluster identification into set identification. It identifies the objects whose neighborhoods coincide as an open-set, and an open-set corresponds to a cluster. However, once the dataset is complex, DOS tends to identify overlapping clusters as one category. We believe the main reason is that DOS unifies the neighborhood radius by a specific function, resulting in the inability to cope with various object distributions. To improve DOS, we propose DOS-IN (Irregular Neighborhoods). Specifically, DOS-IN generates irregular neighborhoods based on the similarity between objects to self-adapt to diverse object distributions. As a result, DOS-IN not only can accurately distinguish overlapping clusters but also has fewer input parameters. In addition, DOS-IN introduces the small-cluster merging mechanism to address the shortcoming of DOS in recognizing Gaussian clusters. The experimental results show that DOS-IN is completely superior to DOS. Compared with baseline methods, DOS-IN outperforms them on 7 out of 10 datasets, with at least 13.8% (NMI) and 2.4% (RI) improvement in accuracy. The code of DOS-IN is available at https://github.com/Youth-49/2023-DOS-IN.
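To make the neighborhood idea in the abstract concrete, the following is a toy sketch in Python. It is not the authors' DOS or DOS-IN implementation (that code is at the repository linked above). It assumes that neighborhoods "coincide" when they share at least one object, and it uses a single fixed radius for every object. The function name `fixed_radius_clusters` and the `radius` parameter are hypothetical, chosen only for illustration.

```python
from math import dist


def fixed_radius_clusters(points, radius):
    """Toy illustration: group points whose fixed-radius neighborhoods overlap.

    Assumption (not taken from the paper): two objects belong to the same
    set when their neighborhoods share at least one object.
    """
    n = len(points)
    # Neighborhood of i: indices of all points within `radius` of point i
    # (a point is always in its own neighborhood).
    neighborhoods = [
        {j for j in range(n) if dist(points[i], points[j]) <= radius}
        for i in range(n)
    ]

    # Union-find structure used to merge objects with overlapping neighborhoods.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if neighborhoods[i] & neighborhoods[j]:
                parent[find(i)] = find(j)

    # Relabel union-find roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


points = [(0, 0), (0.5, 0), (1, 0), (10, 10), (10.5, 10)]
print(fixed_radius_clusters(points, 0.6))   # [0, 0, 0, 1, 1]
print(fixed_radius_clusters(points, 14.0))  # [0, 0, 0, 0, 0]
```

In this toy setting, a radius that suits one group can be too large for the data as a whole, so the two groups collapse into one. This loosely mirrors the limitation of a unified neighborhood radius that the abstract attributes to DOS, and which DOS-IN addresses with similarity-based irregular neighborhoods.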

Original language	English
Article number	119561
Journal	Information Sciences
Volume	648
DOIs	https://doi.org/10.1016/j.ins.2023.119561
Publication status	Published - Nov 2023

Keywords

Clustering
Irregular neighborhood
Open-set

Cite this

Li, Q., Yan, G., Wang, S., & Zhao, B. (2023). A novel open-set clustering algorithm. Information Sciences, 648, Article 119561. https://doi.org/10.1016/j.ins.2023.119561

@article{29bbd40506894fe9836ee9bac785a1fe,

title = "A novel open-set clustering algorithm",

abstract = "DOS (Delta Open Set) is an interesting clustering algorithm that transforms cluster identification into set identification. It identifies the objects whose neighborhoods coincide as an open-set, and an open-set corresponds to a cluster. However, once the dataset is complex, DOS tends to identify overlapping clusters as one category. We believe the main reason is that DOS unifies the neighborhood radius by a specific function, resulting in the inability to cope with various object distributions. To improve DOS, we propose DOS-IN (Irregular Neighborhoods). Specifically, DOS-IN generates irregular neighborhoods based on the similarity between objects to self-adapt to diverse object distributions. As a result, DOS-IN not only can accurately distinguish overlapping clusters but also has fewer input parameters. In addition, DOS-IN introduces the small-cluster merging mechanism to address the shortcoming of DOS in recognizing Gaussian clusters. The experimental results show that DOS-IN is completely superior to DOS. Compared with baseline methods, DOS-IN outperforms them on 7 out of 10 datasets, with at least 13.8% (NMI) and 2.4% (RI) improvement in accuracy. The code of DOS-IN is available at https://github.com/Youth-49/2023-DOS-IN.",

keywords = "Clustering, Irregular neighborhood, Open-set",

author = "Qi Li and Guochen Yan and Shuliang Wang and Boxiang Zhao",

note = "Publisher Copyright: {\textcopyright} 2023 Elsevier Inc.",

year = "2023",

month = nov,

doi = "10.1016/j.ins.2023.119561",

language = "English",

volume = "648",

journal = "Information Sciences",

issn = "0020-0255",

publisher = "Elsevier Inc.",

}

TY - JOUR

T1 - A novel open-set clustering algorithm

AU - Li, Qi

AU - Yan, Guochen

AU - Wang, Shuliang

AU - Zhao, Boxiang

PY - 2023/11

Y1 - 2023/11

N2 - DOS (Delta Open Set) is an interesting clustering algorithm that transforms cluster identification into set identification. It identifies the objects whose neighborhoods coincide as an open-set, and an open-set corresponds to a cluster. However, once the dataset is complex, DOS tends to identify overlapping clusters as one category. We believe the main reason is that DOS unifies the neighborhood radius by a specific function, resulting in the inability to cope with various object distributions. To improve DOS, we propose DOS-IN (Irregular Neighborhoods). Specifically, DOS-IN generates irregular neighborhoods based on the similarity between objects to self-adapt to diverse object distributions. As a result, DOS-IN not only can accurately distinguish overlapping clusters but also has fewer input parameters. In addition, DOS-IN introduces the small-cluster merging mechanism to address the shortcoming of DOS in recognizing Gaussian clusters. The experimental results show that DOS-IN is completely superior to DOS. Compared with baseline methods, DOS-IN outperforms them on 7 out of 10 datasets, with at least 13.8% (NMI) and 2.4% (RI) improvement in accuracy. The code of DOS-IN is available at https://github.com/Youth-49/2023-DOS-IN.

AB - DOS (Delta Open Set) is an interesting clustering algorithm that transforms cluster identification into set identification. It identifies the objects whose neighborhoods coincide as an open-set, and an open-set corresponds to a cluster. However, once the dataset is complex, DOS tends to identify overlapping clusters as one category. We believe the main reason is that DOS unifies the neighborhood radius by a specific function, resulting in the inability to cope with various object distributions. To improve DOS, we propose DOS-IN (Irregular Neighborhoods). Specifically, DOS-IN generates irregular neighborhoods based on the similarity between objects to self-adapt to diverse object distributions. As a result, DOS-IN not only can accurately distinguish overlapping clusters but also has fewer input parameters. In addition, DOS-IN introduces the small-cluster merging mechanism to address the shortcoming of DOS in recognizing Gaussian clusters. The experimental results show that DOS-IN is completely superior to DOS. Compared with baseline methods, DOS-IN outperforms them on 7 out of 10 datasets, with at least 13.8% (NMI) and 2.4% (RI) improvement in accuracy. The code of DOS-IN is available at https://github.com/Youth-49/2023-DOS-IN.

KW - Clustering

KW - Irregular neighborhood

KW - Open-set

UR - http://www.scopus.com/inward/record.url?scp=85168799854&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2023.119561

DO - 10.1016/j.ins.2023.119561

M3 - Article

AN - SCOPUS:85168799854

SN - 0020-0255

VL - 648

JO - Information Sciences

JF - Information Sciences

M1 - 119561

ER -
