TY - GEN
T1 - Active query selection for constraint-based clustering algorithms
AU - Atwa, Walid
AU - Li, Kan
PY - 2014
Y1 - 2014
N2 - Semi-supervised clustering uses a small amount of supervised data, in the form of pairwise constraints, to improve clustering performance. However, most current methods are passive in the sense that the pairwise constraints are provided beforehand and selected randomly. This may lead to the use of constraints that are redundant, unnecessary, or even harmful to the clustering results. In this paper, we address the problem of constraint selection to improve the performance of constraint-based clustering algorithms. Based on the concept of Maximum Mean Discrepancy (MMD), we select the set of most informative instances that minimizes the difference in distribution between the labeled and unlabeled data. We then query these instances against the existing neighborhoods to determine which neighborhood each instance belongs to. Experimental results comparing the proposed method with state-of-the-art methods on different real-world datasets demonstrate its effectiveness and efficiency.
AB - Semi-supervised clustering uses a small amount of supervised data, in the form of pairwise constraints, to improve clustering performance. However, most current methods are passive in the sense that the pairwise constraints are provided beforehand and selected randomly. This may lead to the use of constraints that are redundant, unnecessary, or even harmful to the clustering results. In this paper, we address the problem of constraint selection to improve the performance of constraint-based clustering algorithms. Based on the concept of Maximum Mean Discrepancy (MMD), we select the set of most informative instances that minimizes the difference in distribution between the labeled and unlabeled data. We then query these instances against the existing neighborhoods to determine which neighborhood each instance belongs to. Experimental results comparing the proposed method with state-of-the-art methods on different real-world datasets demonstrate its effectiveness and efficiency.
KW - Semi-supervised clustering
KW - active learning
KW - pairwise constraints
UR - http://www.scopus.com/inward/record.url?scp=84958531615&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-10073-9_37
DO - 10.1007/978-3-319-10073-9_37
M3 - Conference contribution
AN - SCOPUS:84958531615
SN - 9783319100722
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 438
EP - 445
BT - Database and Expert Systems Applications - 25th International Conference, DEXA 2014, Proceedings
PB - Springer Verlag
T2 - 25th International Conference on Database and Expert Systems Applications, DEXA 2014
Y2 - 1 September 2014 through 4 September 2014
ER -