SCREEN: Stream data cleaning under speed constraints

Shaoxu Song, Aoqian Zhang, Jianmin Wang, Philip S. Yu

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

93 Citations (Scopus)

Abstract

Stream data are often dirty, for example, owing to unreliable sensor reading, or erroneous extraction of stock prices. Most stream data cleaning approaches employ a smoothing filter, which may seriously alter the data without preserving the original information. We argue that the cleaning should avoid changing those originally correct/clean data, a.k.a. the minimum change principle in data cleaning. To capture the knowledge about what is clean, we consider the (widely existing) constraints on the speed of data changes, such as fuel consumption per hour, or daily limit of stock prices. Guided by these semantic constraints, in this paper, we propose SCREEN, the first constraint-based approach for cleaning stream data. It is notable that existing data repair techniques clean (a sequence of) data as a whole and fail to support stream computation. To this end, we have to relax the global optimum over the entire sequence to the local optimum in a window. Rather than the commonly observed NP-hardness of general data repairing problems, our major contributions include (1) polynomial time algorithm for global optimum, (2) linear time algorithm towards local optimum under an efficient Median Principle, (3) support on out-of-order arrivals of data points, and (4) adaptive window size for balancing repair accuracy and efficiency. Experiments on real datasets demonstrate that SCREEN can show significantly higher repair accuracy than the existing approaches such as smoothing.

Original languageEnglish
Title of host publicationSIGMOD 2015 - Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
PublisherAssociation for Computing Machinery
Pages827-841
Number of pages15
ISBN (Electronic)9781450327589
DOIs
Publication statusPublished - 27 May 2015
Externally publishedYes
EventACM SIGMOD International Conference on Management of Data, SIGMOD 2015 - Melbourne, Australia
Duration: 31 May 20154 Jun 2015

Publication series

NameProceedings of the ACM SIGMOD International Conference on Management of Data
Volume2015-May
ISSN (Print)0730-8078

Conference

ConferenceACM SIGMOD International Conference on Management of Data, SIGMOD 2015
Country/TerritoryAustralia
CityMelbourne
Period31/05/154/06/15

Keywords

  • Data repairing
  • Speed constraints

Fingerprint

Dive into the research topics of 'SCREEN: Stream data cleaning under speed constraints'. Together they form a unique fingerprint.

Cite this