Robust K-median and K-means clustering algorithms for incomplete data

Jinhua Li; Shiji Song; Yuli Zhang; Zhen Zhou

doi:10.1155/2016/4321928


Jinhua Li, Shiji Song^*, Yuli Zhang, Zhen Zhou

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

26 Citations (Scopus)

Abstract

Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.
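The interval idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function names `worst_case_sq_dist` and `robust_assign` are hypothetical, NumPy is assumed, and only a worst-case assignment step is shown, not the authors' center update or RO formulation. Each missing feature is represented by an interval [lower, upper], and an observed feature by a degenerate interval with lower equal to upper.

```python
import numpy as np

def worst_case_sq_dist(lower, upper, centers):
    """Worst-case squared Euclidean distance from interval points to centers.

    lower, upper: (n, d) arrays of interval endpoints per feature.
    centers: (k, d) array of cluster centers.
    Returns an (n, k) array.
    """
    # Per feature, the point of [lower, upper] farthest from a center
    # is always one of the two endpoints, so take the larger of the two.
    d_lo = (centers[None, :, :] - lower[:, None, :]) ** 2
    d_hi = (centers[None, :, :] - upper[:, None, :]) ** 2
    return np.maximum(d_lo, d_hi).sum(axis=2)

def robust_assign(lower, upper, centers):
    """Assign each point to the center with the smallest worst-case distance."""
    return worst_case_sq_dist(lower, upper, centers).argmin(axis=1)

# Illustrative data (made up): the second feature of each point is missing
# and replaced by an interval; the first feature is observed.
lower = np.array([[0.0, 0.0], [4.0, 3.0]])
upper = np.array([[0.0, 2.0], [4.0, 5.0]])
centers = np.array([[0.0, 1.0], [4.0, 4.0]])

labels = robust_assign(lower, upper, centers)
print(labels)
```

Because the assignment uses the largest distance any value in the interval could produce, it does not depend on a single imputed estimate of the missing feature.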

Original language	English
Article number	4321928
Journal	Mathematical Problems in Engineering
Volume	2016
DOI	https://doi.org/10.1155/2016/4321928
Publication status	Published - 2016
Externally published	Yes


Cite this

Li, J., Song, S., Zhang, Y., & Zhou, Z. (2016). Robust K-median and K-means clustering algorithms for incomplete data. Mathematical Problems in Engineering, 2016, Article 4321928. https://doi.org/10.1155/2016/4321928

@article{77795721a0aa4940998f88cdc098fd63,
  title = "Robust K-median and K-means clustering algorithms for incomplete data",
  abstract = "Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.",
  author = "Jinhua Li and Shiji Song and Yuli Zhang and Zhen Zhou",
  note = "Publisher Copyright: {\textcopyright} 2016 Jinhua Li et al.",
  year = "2016",
  doi = "10.1155/2016/4321928",
  language = "English",
  volume = "2016",
  journal = "Mathematical Problems in Engineering",
  issn = "1024-123X",
  publisher = "John Wiley and Sons Ltd",
}

TY  - JOUR
T1  - Robust K-median and K-means clustering algorithms for incomplete data
AU  - Li, Jinhua
AU  - Song, Shiji
AU  - Zhang, Yuli
AU  - Zhou, Zhen
PY  - 2016
Y1  - 2016
N2  - Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.
AB  - Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.
UR  - http://www.scopus.com/inward/record.url?scp=85008608296&partnerID=8YFLogxK
U2  - 10.1155/2016/4321928
DO  - 10.1155/2016/4321928
M3  - Article
AN  - SCOPUS:85008608296
SN  - 1024-123X
VL  - 2016
JO  - Mathematical Problems in Engineering
JF  - Mathematical Problems in Engineering
M1  - 4321928
ER  -
