TY - JOUR
T1 - A novel anomaly detection approach based on ensemble semi-supervised active learning (ADESSA)
AU - Niu, Zequn
AU - Guo, Wenjie
AU - Xue, Jingfeng
AU - Wang, Yong
AU - Kong, Zixiao
AU - Huang, Lu
N1 - Publisher Copyright: © 2023
PY - 2023/6
Y1 - 2023/6
N2 - Cyber-Physical Systems (CPS) serve as industrial infrastructure, and their safety and reliability require effective anomaly detection. However, existing detection methods face a bottleneck when training data are insufficient. This work proposes a novel anomaly detection approach based on ensemble semi-supervised active learning, which can effectively detect anomalous traffic when labeled samples are few and the dataset is imbalanced. Specifically, we propose a balanced sampling strategy that combines margin sampling and democratic co-learning to construct a balanced training set consisting of manually labeled, high-information samples and automatically labeled, high-confidence samples, so that the detection model can be trained effectively on a limited labeling budget. We also found that adding correctly labeled high-confidence samples to the training set improves the performance of the detection model when training samples are few and the labeling budget is limited. The approach achieves a good balance between the effectiveness of model training and the cost of sample querying when CPS traffic data are sparsely labeled and imbalanced. In addition, we designed five pairs of experiments on the NSL-KDD and SWaT datasets, and the results demonstrate the effectiveness and advantages of the proposed approach.
AB - Cyber-Physical Systems (CPS) serve as industrial infrastructure, and their safety and reliability require effective anomaly detection. However, existing detection methods face a bottleneck when training data are insufficient. This work proposes a novel anomaly detection approach based on ensemble semi-supervised active learning, which can effectively detect anomalous traffic when labeled samples are few and the dataset is imbalanced. Specifically, we propose a balanced sampling strategy that combines margin sampling and democratic co-learning to construct a balanced training set consisting of manually labeled, high-information samples and automatically labeled, high-confidence samples, so that the detection model can be trained effectively on a limited labeling budget. We also found that adding correctly labeled high-confidence samples to the training set improves the performance of the detection model when training samples are few and the labeling budget is limited. The approach achieves a good balance between the effectiveness of model training and the cost of sample querying when CPS traffic data are sparsely labeled and imbalanced. In addition, we designed five pairs of experiments on the NSL-KDD and SWaT datasets, and the results demonstrate the effectiveness and advantages of the proposed approach.
KW - Active learning
KW - Anomaly detection
KW - Cyber-physical systems
KW - Ensemble learning
KW - Semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85152603696&partnerID=8YFLogxK
U2 - 10.1016/j.cose.2023.103190
DO - 10.1016/j.cose.2023.103190
M3 - Article
AN - SCOPUS:85152603696
SN - 0167-4048
VL - 129
JO - Computers and Security
JF - Computers and Security
M1 - 103190
ER -