Sequential data cleaning: A statistical approach

Aoqian Zhang, Shaoxu Song, Jianmin Wang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

55 citations (Scopus)

Abstract

Errors are prevalent in data sequences, such as GPS trajectories or sensor readings. Existing methods for cleaning sequential data employ a constraint on the speed at which values change and perform constraint-based repairing. While such speed constraints are effective in identifying large spike errors, small errors that do not significantly deviate from the truth, and that indeed satisfy the speed constraints, can hardly be identified and repaired. To handle such small errors, in this paper, we propose a statistics-based cleaning method. Rather than declaring a broad constraint of max/min speeds, we model the probability distribution of speed changes. The repairing problem is thus to maximize the likelihood of the sequence with respect to the probability of speed changes. We formalize the likelihood-based cleaning problem, show its NP-hardness, devise exact algorithms, and propose several approximate/heuristic methods to trade off effectiveness for efficiency. Experiments on real data sets (in various applications) demonstrate the superiority of our proposal.
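
The likelihood-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names (`speed_changes`, `fit_distribution`, `log_likelihood`, `repair_one_point`), the definition of a speed change as the difference between consecutive speeds, the rounding used to bin values, the crude probability floor for unseen changes, and the single-point candidate search are all assumptions made for illustration. The example shows a small error that a max-speed constraint of 2 would accept but that lowers the likelihood of the sequence.

```python
import math
from collections import Counter

def speed_changes(xs, ts):
    """Differences between consecutive speeds (assumed definition of a speed change)."""
    speeds = [(xs[i] - xs[i - 1]) / (ts[i] - ts[i - 1]) for i in range(1, len(xs))]
    return [speeds[i] - speeds[i - 1] for i in range(1, len(speeds))]

def fit_distribution(changes, floor=1e-3):
    """Empirical probability of each rounded speed change.

    Unseen values get a small floor probability; this is a crude,
    unnormalized smoothing chosen only to keep the sketch short.
    """
    counts = Counter(round(u) for u in changes)
    total = sum(counts.values())
    probs = {u: c / total for u, c in counts.items()}
    return lambda u: probs.get(round(u), floor)

def log_likelihood(xs, ts, prob):
    """Sum of log-probabilities of all speed changes in the sequence."""
    return sum(math.log(prob(u)) for u in speed_changes(xs, ts))

def repair_one_point(xs, ts, prob, index, candidates):
    """Replace xs[index] by the candidate that maximizes the log-likelihood."""
    best, best_ll = list(xs), log_likelihood(xs, ts, prob)
    for c in candidates:
        trial = list(xs)
        trial[index] = c
        ll = log_likelihood(trial, ts, prob)
        if ll > best_ll:
            best, best_ll = trial, ll
    return best, best_ll

# Hypothetical data: a clean history moving at constant speed 1.
history = [0, 1, 2, 3, 4, 5, 6, 7]
prob = fit_distribution(speed_changes(history, list(range(len(history)))))

# A small error at index 3 (true value 3, observed 4); every speed stays within [0, 2].
dirty = [0, 1, 2, 4, 4, 5, 6]
ts = list(range(len(dirty)))
repaired, ll = repair_one_point(dirty, ts, prob, index=3, candidates=[2, 3, 4, 5])
print(repaired, ll)
```

The sketch searches candidates for one known position only; the abstract describes the general problem as NP-hard, which is why the paper turns to exact algorithms and approximate/heuristic methods rather than exhaustive search.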

Original language: English
Title of host publication: SIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 909-924
Number of pages: 16
ISBN (electronic): 9781450335317
DOI
Publication status: Published - 26 Jun 2016
Externally published: Yes
Event: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 - San Francisco, United States
Duration: 26 Jun 2016 to 1 Jul 2016

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
26-June-2016
ISSN (print): 0730-8078

Conference

Conference: 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
Country/Territory: United States
City: San Francisco
Period: 26/06/16 to 1/07/16
