Streaming data cleaning based on speed change

Haoyu Wang, Aoqian Zhang, Shaoxu Song*, Jianmin Wang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

1 Citation (Scopus)

Abstract

Errors are prevalent in data sequences such as GPS trajectories and sensor readings. Existing methods for cleaning sequential data impose a constraint on the speed at which values change and perform constraint-based repairing. While such speed constraints are effective at identifying large spike errors, small errors that deviate little from the truth and still satisfy the speed constraints can hardly be identified or repaired. To handle such small errors, we propose a cleaning method based on the probability of speed change. Rather than declaring a broad constraint of maximum and minimum speeds, we model the probability distribution of speed changes. The repairing problem is thus to maximize the probability of the sequence with respect to the probability of speed changes. We formalize the probability-based repairing problem and devise algorithms for streaming scenarios. Experiments on real data sets from various applications demonstrate the superiority of our proposal.
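The contrast between a speed constraint and a speed-change probability can be illustrated with a minimal sketch. It is not the authors' algorithm. The zero-mean Gaussian model of speed changes, the value of `sigma`, the speed bounds, the sample data, and every function name are assumptions chosen for illustration, and the single-point candidate search merely stands in for whatever repair procedure the paper actually devises.

```python
import math

def speeds(times, values):
    """Speed between each pair of consecutive points."""
    return [(values[i + 1] - values[i]) / (times[i + 1] - times[i])
            for i in range(len(values) - 1)]

def speed_changes(times, values):
    """Difference between consecutive speeds."""
    v = speeds(times, values)
    return [v[i + 1] - v[i] for i in range(len(v) - 1)]

def satisfies_speed_constraint(times, values, s_min, s_max):
    """Classic check: every speed lies within [s_min, s_max]."""
    return all(s_min <= s <= s_max for s in speeds(times, values))

def log_likelihood(times, values, sigma):
    """Log-probability of the sequence under an ASSUMED zero-mean
    Gaussian distribution of speed changes (illustrative only)."""
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const - (u * u) / (2 * sigma ** 2)
               for u in speed_changes(times, values))

def repair_point(times, values, i, candidates, sigma):
    """Hypothetical helper: pick the candidate value for position i
    that maximizes the sequence log-likelihood."""
    def score(c):
        trial = values[:i] + [c] + values[i + 1:]
        return log_likelihood(times, trial, sigma)
    return max(candidates, key=score)

times = [0, 1, 2, 3, 4]
clean = [10.0, 11.0, 12.0, 13.0, 14.0]
dirty = [10.0, 11.0, 12.8, 13.0, 14.0]  # small error at index 2

# Both sequences pass the speed constraint, so it cannot flag the error.
ok_clean = satisfies_speed_constraint(times, clean, -2.0, 2.0)
ok_dirty = satisfies_speed_constraint(times, dirty, -2.0, 2.0)

# The speed-change likelihood does separate them.
ll_clean = log_likelihood(times, clean, sigma=0.5)
ll_dirty = log_likelihood(times, dirty, sigma=0.5)

# Search a small grid of candidate values (11.5 to 13.0) for index 2.
candidates = [k / 10 for k in range(115, 131)]
repaired = repair_point(times, dirty, 2, candidates, sigma=0.5)
```

In this toy data the erroneous value 12.8 keeps every speed within the assumed bounds of -2 and 2, so a constraint-based check accepts it, while the likelihood of the dirty sequence is lower than that of the clean one. A streaming method would have to make such decisions incrementally as points arrive, which this sketch does not attempt.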

Original language: English
Pages (from-to): 1-24
Number of pages: 24
Journal: VLDB Journal
Volume: 33
Issue number: 1
Publication status: Published - Jan 2024

Keywords

  • Data cleaning
  • Speed change
  • Stream processing
  • Time series
