Sequential data cleaning: A statistical approach

Aoqian Zhang, Shaoxu Song, Jianmin Wang

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

54 Citations (Scopus)

Abstract

Errors are prevalent in data sequences, such as GPS trajectories or sensor readings. Existing methods on cleaning sequential data employ a constraint on value changing speeds and perform constraint-based repairing. While such speed constraints are effective in identifying large spike errors, the small errors that do not significantly deviate from the truth and indeed satisfy the speed constraints can hardly be identified and repaired. To handle such small errors, in this paper, we propose a statistical based cleaning method. Rather than declaring a broad constraint of max/min speeds, we model the probability distribution of speed changes. The repairing problem is thus to maximize the likelihood of the sequence w.r.t. The probability of speed changes. We formalize the likelihood-based cleaning problem, show its np- hardness, devise exact algorithms, and propose several approximate/ heuristic methods to trade off effectiveness for efficiency. Experiments on real data sets (in various applications) demonstrate the superiority of our proposal.

Original languageEnglish
Title of host publicationSIGMOD 2016 - Proceedings of the 2016 International Conference on Management of Data
PublisherAssociation for Computing Machinery
Pages909-924
Number of pages16
ISBN (Electronic)9781450335317
DOIs
Publication statusPublished - 26 Jun 2016
Externally publishedYes
Event2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016 - San Francisco, United States
Duration: 26 Jun 20161 Jul 2016

Publication series

NameProceedings of the ACM SIGMOD International Conference on Management of Data
Volume26-June-2016
ISSN (Print)0730-8078

Conference

Conference2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016
Country/TerritoryUnited States
CitySan Francisco
Period26/06/161/07/16

Fingerprint

Dive into the research topics of 'Sequential data cleaning: A statistical approach'. Together they form a unique fingerprint.

Cite this