Constraint-based clustering algorithm for multi-density data and arbitrary shapes

Walid Atwa; Kan Li

doi:10.1007/978-3-319-62701-4_7

School of Computer Science and Technology

Menoufia University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

The purpose of data clustering is to identify useful patterns in the underlying dataset. However, finding clusters is a challenging problem, especially when the clusters vary widely in shape, size, and density. Density-based clustering methods are particularly important because they can detect clusters of arbitrary shape. In addition, these methods often handle noise well. Existing density-based methods are built on DBSCAN, which depends on two user-specified parameters (Eps and MinPts) that together define a single density. Furthermore, most of these methods are unsupervised, so they cannot use even a small amount of prior knowledge to improve clustering quality. In this paper we show how background knowledge can be used to bias a density-based clustering algorithm for multi-density data. First, we divide the dataset into different density levels and detect suitable density parameters for each level. Then we describe how pairwise constraints can guide the algorithm as it expands clusters using the computed density parameters. Experimental results on both synthetic and real datasets show that the proposed algorithm gives better results than the semi-supervised and unsupervised clustering algorithms it was compared with.
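The abstract describes the method only at a high level. As a rough illustration of its two steps (per-level density parameters, then constraint-guided expansion), here is a minimal Python sketch under stated assumptions; it is not the authors' algorithm. The level-splitting rule (a jump ratio over sorted k-nearest-neighbour distances), the greedy must-link/cannot-link handling, and all function names (`k_distance`, `density_levels`, `neighbours`, `constrained_multi_density`) are hypothetical choices made for this example.

```python
import math
from collections import deque


def k_distance(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force, O(n^2))."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def density_levels(kdist, jump=2.0):
    """Hypothetical heuristic: sort the k-distances and start a new density level
    wherever a value exceeds its predecessor by more than `jump` times.
    Returns one Eps per level: the largest k-distance inside that level."""
    values = sorted(kdist)
    eps_levels = []
    for prev, cur in zip(values, values[1:]):
        if cur > jump * prev:
            eps_levels.append(prev)
    eps_levels.append(values[-1])  # the sparsest level may still contain noise
    return eps_levels


def neighbours(points, i, eps):
    """Indices of all points within eps of point i (including i itself, as in DBSCAN)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def constrained_multi_density(points, eps_levels, min_pts, must_link=(), cannot_link=()):
    """DBSCAN-style expansion run once per density level, densest (smallest Eps) first.
    Points claimed at a denser level are never reassigned at a sparser one.
    Returns one label per point; -1 marks points left unassigned (noise)."""
    n = len(points)
    labels = [None] * n
    ml = {i: set() for i in range(n)}  # must-link partners of each point
    cl = {i: set() for i in range(n)}  # cannot-link partners of each point
    for a, b in must_link:
        ml[a].add(b)
        ml[b].add(a)
    for a, b in cannot_link:
        cl[a].add(b)
        cl[b].add(a)

    cluster = 0
    for eps in sorted(eps_levels):
        for seed in range(n):
            if labels[seed] is not None:
                continue
            if len(neighbours(points, seed, eps)) < min_pts:
                continue  # seed is not a core point at this density level
            members = set()
            queue = deque([seed])
            while queue:
                i = queue.popleft()
                # Skip points already clustered or cannot-linked to a current member.
                if labels[i] is not None or cl[i] & members:
                    continue
                labels[i] = cluster
                members.add(i)
                queue.extend(ml[i])  # try to pull must-linked partners into this cluster
                nbrs = neighbours(points, i, eps)
                if len(nbrs) >= min_pts:  # only core points expand the cluster
                    queue.extend(nbrs)
            cluster += 1
    return [-1 if label is None else label for label in labels]


if __name__ == "__main__":
    # One tight group, one looser group and an isolated point.
    pts = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1),
           (5, 5), (5, 6), (6, 5), (6, 6), (20, 20)]
    print(constrained_multi_density(pts, [0.15, 1.5], min_pts=3))
    # -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Handling the densest level first keeps a large Eps from merging a tight cluster into a looser neighbour. The constraint handling is deliberately simple and order-dependent: a cannot-link only blocks a point from the cluster currently being grown, so with `cannot_link=[(4, 7)]` point 7 is left out of cluster 1 and, being a core point itself, seeds a separate cluster.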

Original language	English
Title of host publication	Advances in Data Mining
Subtitle of host publication	Applications and Theoretical Aspects - 17th Industrial Conference, ICDM 2017, Proceedings
Editors	Petra Perner
Publisher	Springer Verlag
Pages	78-92
Number of pages	15
ISBN (Print)	9783319627007
DOI	https://doi.org/10.1007/978-3-319-62701-4_7
Publication status	Published - 2017
Event	17th Industrial Conference on Advances in Data Mining, ICDM 2017, New York, United States, 12 Jul 2017 → 13 Jul 2017

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	10357 LNAI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	17th Industrial Conference on Advances in Data Mining, ICDM 2017
Country/Territory	United States
City	New York
Period	12/07/17 → 13/07/17

Keywords

Multi-density data
Pairwise constraint
Semi-supervised clustering


Cite this

Atwa, W., & Li, K. (2017). Constraint-based clustering algorithm for multi-density data and arbitrary shapes. In P. Perner (Ed.), Advances in Data Mining: Applications and Theoretical Aspects - 17th Industrial Conference, ICDM 2017, Proceedings (pp. 78-92). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10357 LNAI). Springer Verlag. https://doi.org/10.1007/978-3-319-62701-4_7

Atwa, Walid; Li, Kan. / Constraint-based clustering algorithm for multi-density data and arbitrary shapes. Advances in Data Mining: Applications and Theoretical Aspects - 17th Industrial Conference, ICDM 2017, Proceedings. editor / Petra Perner. Springer Verlag, 2017. pp. 78-92 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{c0adf3f6d231448e9f4468e341fa6966,
  title = "Constraint-based clustering algorithm for multi-density data and arbitrary shapes",
  abstract = "The purpose of data clustering is to identify useful patterns in the underlying dataset. However, finding clusters is a challenging problem, especially when the clusters vary widely in shape, size, and density. Density-based clustering methods are particularly important because they can detect clusters of arbitrary shape. In addition, these methods often handle noise well. Existing density-based methods are built on DBSCAN, which depends on two user-specified parameters (Eps and MinPts) that together define a single density. Furthermore, most of these methods are unsupervised, so they cannot use even a small amount of prior knowledge to improve clustering quality. In this paper we show how background knowledge can be used to bias a density-based clustering algorithm for multi-density data. First, we divide the dataset into different density levels and detect suitable density parameters for each level. Then we describe how pairwise constraints can guide the algorithm as it expands clusters using the computed density parameters. Experimental results on both synthetic and real datasets show that the proposed algorithm gives better results than the semi-supervised and unsupervised clustering algorithms it was compared with.",
  keywords = "Multi-density data, Pairwise constraint, Semi-supervised clustering",
  author = "Walid Atwa and Kan Li",
  note = "Publisher Copyright: {\textcopyright} Springer International Publishing AG 2017.; 17th Industrial Conference on Advances in Data Mining, ICDM 2017 ; Conference date: 12-07-2017 Through 13-07-2017",
  year = "2017",
  doi = "10.1007/978-3-319-62701-4_7",
  language = "English",
  isbn = "9783319627007",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  volume = "10357",
  publisher = "Springer Verlag",
  pages = "78--92",
  editor = "Petra Perner",
  booktitle = "Advances in Data Mining: Applications and Theoretical Aspects - 17th Industrial Conference, ICDM 2017, Proceedings",
  address = "Germany",
}

Atwa, W & Li, K 2017, Constraint-based clustering algorithm for multi-density data and arbitrary shapes. in P Perner (ed.), Advances in Data Mining: Applications and Theoretical Aspects - 17th Industrial Conference, ICDM 2017, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10357 LNAI, Springer Verlag, pp. 78-92, 17th Industrial Conference on Advances in Data Mining, ICDM 2017, New York, United States, 12/07/17. https://doi.org/10.1007/978-3-319-62701-4_7

Constraint-based clustering algorithm for multi-density data and arbitrary shapes. / Atwa, Walid; Li, Kan.
Advances in Data Mining: Applications and Theoretical Aspects - 17th Industrial Conference, ICDM 2017, Proceedings. ed. / Petra Perner. Springer Verlag, 2017. p. 78-92 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10357 LNAI).


TY  - GEN
T1  - Constraint-based clustering algorithm for multi-density data and arbitrary shapes
AU  - Atwa, Walid
AU  - Li, Kan
N1  - Publisher Copyright: © Springer International Publishing AG 2017.
PY  - 2017
Y1  - 2017
N2  - The purpose of data clustering is to identify useful patterns in the underlying dataset. However, finding clusters is a challenging problem, especially when the clusters vary widely in shape, size, and density. Density-based clustering methods are particularly important because they can detect clusters of arbitrary shape. In addition, these methods often handle noise well. Existing density-based methods are built on DBSCAN, which depends on two user-specified parameters (Eps and MinPts) that together define a single density. Furthermore, most of these methods are unsupervised, so they cannot use even a small amount of prior knowledge to improve clustering quality. In this paper we show how background knowledge can be used to bias a density-based clustering algorithm for multi-density data. First, we divide the dataset into different density levels and detect suitable density parameters for each level. Then we describe how pairwise constraints can guide the algorithm as it expands clusters using the computed density parameters. Experimental results on both synthetic and real datasets show that the proposed algorithm gives better results than the semi-supervised and unsupervised clustering algorithms it was compared with.
AB  - The purpose of data clustering is to identify useful patterns in the underlying dataset. However, finding clusters is a challenging problem, especially when the clusters vary widely in shape, size, and density. Density-based clustering methods are particularly important because they can detect clusters of arbitrary shape. In addition, these methods often handle noise well. Existing density-based methods are built on DBSCAN, which depends on two user-specified parameters (Eps and MinPts) that together define a single density. Furthermore, most of these methods are unsupervised, so they cannot use even a small amount of prior knowledge to improve clustering quality. In this paper we show how background knowledge can be used to bias a density-based clustering algorithm for multi-density data. First, we divide the dataset into different density levels and detect suitable density parameters for each level. Then we describe how pairwise constraints can guide the algorithm as it expands clusters using the computed density parameters. Experimental results on both synthetic and real datasets show that the proposed algorithm gives better results than the semi-supervised and unsupervised clustering algorithms it was compared with.
KW  - Multi-density data
KW  - Pairwise constraint
KW  - Semi-supervised clustering
UR  - http://www.scopus.com/inward/record.url?scp=85025175305&partnerID=8YFLogxK
U2  - 10.1007/978-3-319-62701-4_7
DO  - 10.1007/978-3-319-62701-4_7
M3  - Conference contribution
AN  - SCOPUS:85025175305
SN  - 9783319627007
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
VL  - 10357
SP  - 78
EP  - 92
BT  - Advances in Data Mining: Applications and Theoretical Aspects - 17th Industrial Conference, ICDM 2017, Proceedings
A2  - Perner, Petra
PB  - Springer Verlag
T2  - 17th Industrial Conference on Advances in Data Mining, ICDM 2017
Y2  - 12 July 2017 through 13 July 2017
ER  - 

Atwa W, Li K. Constraint-based clustering algorithm for multi-density data and arbitrary shapes. In Perner P, editor, Advances in Data Mining: Applications and Theoretical Aspects - 17th Industrial Conference, ICDM 2017, Proceedings. Springer Verlag. 2017. p. 78-92. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-319-62701-4_7
