Robust K-median and K-means clustering algorithms for incomplete data

Jinhua Li; Shiji Song; Yuli Zhang; Zhen Zhou

doi:10.1155/2016/4321928

Robust K-median and K-means clustering algorithms for incomplete data

Jinhua Li, Shiji Song^*, Yuli Zhang, Zhen Zhou

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

26 Citations (Scopus)

Abstract

Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.

Original language	English
Article number	4321928
Journal	Mathematical Problems in Engineering
Volume	2016
DOIs	https://doi.org/10.1155/2016/4321928
Publication status	Published - 2016
Externally published	Yes

Access to Document

10.1155/2016/4321928

Cite this

@article{77795721a0aa4940998f88cdc098fd63,

title = "Robust K-median and K-means clustering algorithms for incomplete data",

abstract = "Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.",

author = "Jinhua Li and Shiji Song and Yuli Zhang and Zhen Zhou",

note = "Publisher Copyright: {\textcopyright} 2016 Jinhua Li et al.",

year = "2016",

doi = "10.1155/2016/4321928",

language = "English",

volume = "2016",

journal = "Mathematical Problems in Engineering",

issn = "1024-123X",

publisher = "Hindawi Publishing Corporation",

}

TY - JOUR

T1 - Robust K-median and K-means clustering algorithms for incomplete data

AU - Li, Jinhua

AU - Song, Shiji

AU - Zhang, Yuli

AU - Zhou, Zhen

PY - 2016

Y1 - 2016

N2 - Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.

AB - Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.

UR - http://www.scopus.com/inward/record.url?scp=85008608296&partnerID=8YFLogxK

U2 - 10.1155/2016/4321928

DO - 10.1155/2016/4321928

M3 - Article

AN - SCOPUS:85008608296

SN - 1024-123X

VL - 2016

JO - Mathematical Problems in Engineering

JF - Mathematical Problems in Engineering

M1 - 4321928

ER -

Robust K-median and K-means clustering algorithms for incomplete data

Abstract

Access to Document

Other files and links

Fingerprint

Cite this