TY - JOUR
T1 - Time series data cleaning
T2 - 43rd International Conference on Very Large Data Bases, VLDB 2017
AU - Zhang, Aoqian
AU - Song, Shaoxu
AU - Wang, Jianmin
AU - Yu, Philip S.
N1 - Publisher Copyright: © 2017 VLDB Endowment.
PY - 2017/6/1
Y1 - 2017/6/1
AB - Errors are prevalent in time series data, such as GPS trajectories or sensor readings. Existing methods focus more on anomaly detection but not on repairing the detected anomalies. By simply filtering out the dirty data via anomaly detection, applications could still be unreliable over the incomplete time series. Instead of simply discarding anomalies, we propose to (iteratively) repair them in time series data, by creatively bonding the beauty of temporal nature in anomaly detection with the widely considered minimum change principle in data repairing. Our major contributions include: (1) a novel framework of iterative minimum repairing (IMR) over time series data, (2) explicit analysis on convergence of the proposed iterative minimum repairing, and (3) efficient estimation of parameters in each iteration. Remarkably, with incremental computation, we reduce the complexity of parameter estimation from O(n) to O(1). Experiments on real datasets demonstrate the superiority of our proposal compared to the state-of-the-art approaches. In particular, we show that (the proposed) repairing indeed improves the time series classification application.
UR - http://www.scopus.com/inward/record.url?scp=85029554452&partnerID=8YFLogxK
U2 - 10.14778/3115404.3115410
DO - 10.14778/3115404.3115410
M3 - Conference article
AN - SCOPUS:85029554452
SN - 2150-8097
VL - 10
SP - 1046
EP - 1057
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 10
Y2 - 28 August 2017 through 1 September 2017
ER -