TY - GEN
T1 - Anomaly Detection for Time Series Data Stream
AU - Wang, Qifan
AU - Yan, Bo
AU - Su, Hongyi
AU - Zheng, Hong
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/3/5
Y1 - 2021/3/5
N2 - Time series are an important class of data, characterized by high dimensionality, large volume, and rapid updates. In anomaly detection, data skew and the scarcity of abnormal samples make it difficult to train traditional supervised learning models. At the same time, with the rise of the Internet of Things, more and more data arrives in the form of streams. To address these problems, this paper proposes an anomaly detection method for time series data streams. The method first applies multiple random convolution kernels to transform the time series into features, then feeds the resulting feature map into an RRCF (robust random cut forest), and finally scores each sample according to the characteristics of the RRCF; samples whose scores exceed a threshold are considered abnormal. For real-time detection on time series data streams, the method does not require a pre-trained model but instead maintains the model dynamically, so it needs no manual labels and has low cost. Experimental results show that the method performs well on different data sets. Finally, the algorithm is implemented on the Apache Flink platform, which greatly improves the throughput of the detection system and enables it to process massive data.
AB - Time series are an important class of data, characterized by high dimensionality, large volume, and rapid updates. In anomaly detection, data skew and the scarcity of abnormal samples make it difficult to train traditional supervised learning models. At the same time, with the rise of the Internet of Things, more and more data arrives in the form of streams. To address these problems, this paper proposes an anomaly detection method for time series data streams. The method first applies multiple random convolution kernels to transform the time series into features, then feeds the resulting feature map into an RRCF (robust random cut forest), and finally scores each sample according to the characteristics of the RRCF; samples whose scores exceed a threshold are considered abnormal. For real-time detection on time series data streams, the method does not require a pre-trained model but instead maintains the model dynamically, so it needs no manual labels and has low cost. Experimental results show that the method performs well on different data sets. Finally, the algorithm is implemented on the Apache Flink platform, which greatly improves the throughput of the detection system and enables it to process massive data.
KW - anomaly detection
KW - apache flink
KW - data stream
KW - time series
UR - http://www.scopus.com/inward/record.url?scp=85105285440&partnerID=8YFLogxK
U2 - 10.1109/ICBDA51983.2021.9402957
DO - 10.1109/ICBDA51983.2021.9402957
M3 - Conference contribution
AN - SCOPUS:85105285440
T3 - 2021 IEEE 6th International Conference on Big Data Analytics, ICBDA 2021
SP - 118
EP - 122
BT - 2021 IEEE 6th International Conference on Big Data Analytics, ICBDA 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th IEEE International Conference on Big Data Analytics, ICBDA 2021
Y2 - 5 March 2021 through 8 March 2021
ER -