Constraint-based clustering algorithm for multi-density data and arbitrary shapes

Walid Atwa, Kan Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

The purpose of data clustering is to identify useful patterns in the underlying dataset. However, finding clusters in data is a challenging problem, especially when the clusters vary widely in shape, size, and density. Density-based clustering methods are among the most important because of their ability to detect arbitrarily shaped clusters. Moreover, these methods often show good noise-handling capabilities. Existing methods are based on DBSCAN, which depends on two user-specified parameters (Eps and MinPts) that define a single density. In addition, most of these methods are unsupervised, so they cannot improve clustering quality by using a small amount of prior knowledge. In this paper we show how background knowledge can be used to bias a density-based clustering algorithm for multi-density data. First, we divide the dataset into different density levels and detect suitable density parameters for each density level. Then we describe how pairwise constraints can be used to guide the algorithm in expanding clusters based on the computed density parameters. Experimental results on both synthetic and real datasets confirm that the proposed algorithm gives better results than other semi-supervised and unsupervised clustering algorithms.
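The two steps named in the abstract (splitting the data into density levels with a per-level density parameter, then consulting pairwise constraints) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not say how levels or parameters are computed, so the sketch assumes a common heuristic (cutting the sorted k-th-nearest-neighbor distances at their largest gaps and taking the largest distance in each level as its Eps). All function names here are hypothetical.

```python
import numpy as np

def kth_neighbor_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (self excluded)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # After sorting each row, column 0 is the point itself (distance 0).
    return np.sort(dists, axis=1)[:, k]

def split_density_levels(kdist, n_levels=2):
    """Assumed heuristic: cut the sorted k-distances at the largest gaps."""
    if n_levels < 2:
        return np.zeros(len(kdist), dtype=int)
    order = np.argsort(kdist)
    gaps = np.diff(kdist[order])
    # Positions of the (n_levels - 1) largest gaps, in ascending order.
    cuts = np.sort(np.argsort(gaps)[-(n_levels - 1):]) + 1
    levels = np.empty(len(kdist), dtype=int)
    for level, members in enumerate(np.split(order, cuts)):
        levels[members] = level  # level 0 is the densest
    return levels

def eps_per_level(kdist, levels):
    """Assumed choice: Eps for a level is its largest k-distance."""
    return {int(l): float(kdist[levels == l].max()) for l in np.unique(levels)}

def violates_cannot_link(labels, cannot_link):
    """True if any cannot-link pair shares a cluster label (-1 means noise)."""
    return any(labels[i] == labels[j] and labels[i] != -1
               for i, j in cannot_link)
```

In a full procedure, a DBSCAN-style expansion would presumably run once per level with that level's Eps, and a check such as `violates_cannot_link` would stop a cluster from absorbing a point that a constraint forbids. How the paper actually combines these steps is not stated in the abstract.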

Original language: English
Title of host publication: Advances in Data Mining
Subtitle of host publication: Applications and Theoretical Aspects - 17th Industrial Conference, ICDM 2017, Proceedings
Editors: Petra Perner
Publisher: Springer Verlag
Pages: 78-92
Number of pages: 15
ISBN (Print): 9783319627007
Publication status: Published - 2017
Event: 17th Industrial Conference on Advances in Data Mining, ICDM 2017 - New York, United States
Duration: 12 Jul 2017 to 13 Jul 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10357 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th Industrial Conference on Advances in Data Mining, ICDM 2017
Country/Territory: United States
City: New York
Period: 12/07/17 to 13/07/17

Keywords

  • Multi-density data
  • Pairwise constraint
  • Semi-supervised clustering
