Streaming data cleaning based on speed change

Haoyu Wang, Aoqian Zhang, Shaoxu Song*, Jianmin Wang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

1 Citation (Scopus)

Abstract

Errors are prevalent in data sequences, such as GPS trajectories or sensor readings. Existing methods on cleaning sequential data employ a constraint on value changing speeds and perform constraint-based repairing. While such speed constraints are effective in identifying large spike errors, the small errors that do not deviate much from the truth and indeed satisfy the speed constraints can hardly be identified and repaired. To handle such small errors, in this paper, we propose a cleaning method based on probability of speed change. Rather than declaring a broad constraint of max/min speeds, we model the probability distribution of speed changes. The repairing problem is thus to maximize the probability of the sequence w.r.t. the probability of speed changes. We formalize the probability-based repairing problem and devise algorithms in streaming scenarios. Experiments on real data sets (in various applications) demonstrate the superiority of our proposal.
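The idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the histogram model, the smoothing constant, the bin width, the candidate grid, and all function names are assumptions made for this example. It fits an empirical distribution of speed changes on a clean sequence, then shows that a small error can satisfy a max/min speed constraint while still scoring a low likelihood, and that picking the candidate value with the highest likelihood repairs it.

```python
import math
from collections import Counter


def speed_changes(times, values):
    """Return u_i = v_i - v_(i-1), where v_i is the speed between consecutive points."""
    speeds = [
        (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        for i in range(1, len(values))
    ]
    return [speeds[i] - speeds[i - 1] for i in range(1, len(speeds))]


def fit_log_prob(changes, bin_width=1.0, smoothing=1e-3):
    """Build a smoothed histogram over binned speed changes (an assumed model)."""
    counts = Counter(round(c / bin_width) for c in changes)
    total = sum(counts.values()) + smoothing * (len(counts) + 1)

    def log_prob(change):
        # Unseen bins receive only the smoothing mass, so they score very low.
        return math.log((counts.get(round(change / bin_width), 0) + smoothing) / total)

    return log_prob


def log_likelihood(times, values, log_prob):
    """Sum of log-scores of all speed changes in the sequence."""
    return sum(log_prob(u) for u in speed_changes(times, values))


def satisfies_speed_constraint(times, values, s_min, s_max):
    """Check the classic constraint s_min <= speed <= s_max on every step."""
    return all(
        s_min <= (values[i] - values[i - 1]) / (times[i] - times[i - 1]) <= s_max
        for i in range(1, len(values))
    )


def repair_point(times, values, index, candidates, log_prob):
    """Pick the candidate value at `index` that maximizes the sequence likelihood."""
    def score(candidate):
        trial = values[:index] + [candidate] + values[index + 1:]
        return log_likelihood(times, trial, log_prob)

    return max(candidates, key=score)


# Hypothetical clean history: constant speed 1, so every speed change is 0.
clean_times = [float(t) for t in range(10)]
clean_values = [float(t) for t in range(10)]
log_prob = fit_log_prob(speed_changes(clean_times, clean_values))

# Hypothetical dirty sequence: the point at index 3 should be 3.0, not 4.0.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dirty = [0.0, 1.0, 2.0, 4.0, 4.0, 5.0, 6.0]

within_constraint = satisfies_speed_constraint(times, dirty, -3.0, 3.0)
candidates = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
repaired_value = repair_point(times, dirty, 3, candidates, log_prob)
repaired = dirty[:3] + [repaired_value] + dirty[4:]
```

A single-point grid search is used here only to keep the example short. The paper formalizes repairing as maximizing the probability of the whole sequence and devises algorithms for the streaming setting, which this sketch does not attempt to reproduce.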

Original language: English
Pages (from-to): 1-24
Number of pages: 24
Journal: VLDB Journal
Volume: 33
Issue number: 1
DOI
Publication status: Published - Jan 2024
