Abstract
Errors are prevalent in data sequences, such as GPS trajectories or sensor readings. Existing methods for cleaning sequential data impose a constraint on the speed at which values change and perform constraint-based repairing. While such speed constraints are effective in identifying large spike errors, small errors that deviate little from the truth, and therefore still satisfy the speed constraints, can hardly be identified or repaired. To handle such small errors, in this paper we propose a cleaning method based on the probability of speed changes. Rather than declaring a broad constraint on maximum/minimum speeds, we model the probability distribution of speed changes. The repairing problem is thus to maximize the likelihood of the sequence with respect to the probability distribution of speed changes. We formalize the probability-based repairing problem and devise algorithms for streaming scenarios. Experiments on real data sets from various applications demonstrate the superiority of our proposal.
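The idea of scoring a sequence by the probability of its speed changes can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes that a speed change is the difference between two consecutive speeds, estimates the distribution with a simple histogram, and repairs a single point by brute-force search over a given candidate list. All function names, the `bin_width` and `floor` parameters, and the toy data are hypothetical.

```python
import math
from collections import Counter


def speed_changes(values, times):
    """Differences between consecutive speeds (assumed definition)."""
    speeds = [
        (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        for i in range(1, len(values))
    ]
    return [speeds[i] - speeds[i - 1] for i in range(1, len(speeds))]


def fit_histogram(changes, bin_width=1.0):
    """Empirical distribution of speed changes over fixed-width bins."""
    counts = Counter(round(c / bin_width) for c in changes)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()}


def log_likelihood(values, times, dist, bin_width=1.0, floor=1e-6):
    """Sum of log-probabilities of the sequence's speed changes.

    Bins never seen during fitting get the small probability `floor`
    (an assumption of this sketch).
    """
    return sum(
        math.log(dist.get(round(c / bin_width), floor))
        for c in speed_changes(values, times)
    )


def repair_point(values, times, k, candidates, dist, bin_width=1.0):
    """Replace values[k] with the candidate maximizing the likelihood."""
    def score(v):
        trial = values[:k] + [v] + values[k + 1:]
        return log_likelihood(trial, times, dist, bin_width)

    best = max(candidates, key=score)
    return values[:k] + [best] + values[k + 1:]


# Toy training data: constant speed, so every speed change is 0.
clean = [0, 1, 2, 3, 4, 5, 6, 7]
t = list(range(len(clean)))
dist = fit_histogram(speed_changes(clean, t))

# A small error at index 3 (3 became 4): all speeds stay within [0, 2],
# so a speed constraint with those bounds would not flag it.
dirty = [0, 1, 2, 4, 4, 5, 6, 7]
repaired = repair_point(dirty, t, 3, candidates=[2, 3, 4], dist=dist)
```

In this toy setting the erroneous point satisfies a speed bound of [0, 2], yet the candidate that restores a run of zero speed changes scores highest under the fitted distribution. A streaming variant would have to score and repair points incrementally rather than re-evaluating the whole sequence.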
Original language | English |
---|---|
Pages (from-to) | 1-24 |
Number of pages | 24 |
Journal | VLDB Journal |
Volume | 33 |
Issue number | 1 |
Publication status | Published - Jan 2024 |
Keywords
- Data cleaning
- Speed change
- Stream processing
- Time series