Enriching data imputation with extensive similarity neighbors

Shaoxu Song, Aoqian Zhang, Lei Chen, Jianmin Wang

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

42 Citations (Scopus)

Abstract

Incomplete information often occurs in many database applications, e.g., in data integration, data cleaning, or data exchange. The idea of data imputation is to fill the missing data with the values of its neighbors who share the same information. Such neighbors could either be identified certainly by editing rules or statistically by relational dependency networks. Unfortunately, owing to data sparsity, the number of neighbors (identified w.r.t. value equality) is rather limited, especially in the presence of data values with variances. In this paper, we argue to extensively enrich similarity neighbors by similarity rules with tolerance to small variations. More fillings can thus be acquired that the aforesaid equality neighbors fail to reveal. To fill more missing values, we study the problem of maximizing the missing data imputation. Our major contributions include (1) the NP-hardness analysis on solving and approximating the problem, (2) exact algorithms for tackling the problem, and (3) efficient approximation with performance guarantees. Experiments on real and synthetic data sets demonstrate that the filling accuracy can be improved.
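The contrast the abstract draws between equality neighbors and similarity neighbors can be illustrated with a toy sketch. This is not the paper's algorithm: the absolute-difference tolerance `tol`, the mean aggregation, the helper names `neighbors` and `impute`, and the sample rows are all assumptions made purely for illustration.

```python
def neighbors(rows, target, attrs, tol):
    """Return rows whose values on `attrs` lie within `tol` of the target's values.

    tol = 0.0 corresponds to equality neighbors; tol > 0 admits similarity neighbors.
    (Hypothetical helper; the tolerance test is an illustrative assumption.)
    """
    return [
        r for r in rows
        if r is not target
        and all(
            r[a] is not None
            and target[a] is not None
            and abs(r[a] - target[a]) <= tol
            for a in attrs
        )
    ]


def impute(rows, target, attrs, missing, tol):
    """Suggest a value for target[missing] from its neighbors, or None if there are none.

    Averaging the neighbors' values is an assumption, not the paper's filling rule.
    """
    candidates = [
        r[missing]
        for r in neighbors(rows, target, attrs, tol)
        if r[missing] is not None
    ]
    return sum(candidates) / len(candidates) if candidates else None


# Made-up sample data: two nearby readings and one distant reading.
rows = [
    {"lat": 39.90, "lon": 116.40, "temp": 21.0},
    {"lat": 39.91, "lon": 116.41, "temp": 23.0},
    {"lat": 31.23, "lon": 121.47, "temp": 28.0},
]
target = {"lat": 39.905, "lon": 116.405, "temp": None}

# Exact matching finds no neighbor, so nothing can be filled.
equality_fill = impute(rows, target, ["lat", "lon"], "temp", tol=0.0)
# A small tolerance admits the two nearby rows as neighbors.
similarity_fill = impute(rows, target, ["lat", "lon"], "temp", tol=0.01)
```

With these sample rows, exact matching leaves the missing `temp` unfilled, while the tolerance of 0.01 picks up the two nearby readings. This mirrors the abstract's point that similarity neighbors can reveal fillings that equality neighbors miss.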

Original language: English
Title of host publication: Proceedings of the VLDB Endowment
Editors: Christophe Claramunt, Simonas Saltenis, Ki-Joune Li
Publisher: Association for Computing Machinery
Pages: 1286-1297
Number of pages: 12
Volume: 8
Edition: 11 11
DOI
Publication status: Published - 2015
Externally published: Yes
Event: 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006 - Seoul, South Korea
Duration: 11 Sep 2006 to 11 Sep 2006

Conference

Conference: 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006
Country/Territory: South Korea
City: Seoul
Period: 11/09/06 to 11/09/06
