TY - JOUR
T1 - A multistage intrusion detection method for alleviating class overlapping problem
AU - Pang, He
AU - Jin, Fusheng
AU - Chen, Mengnan
AU - Jiang, Yutong
AU - Yuan, Ye
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
PY - 2024
Y1 - 2024
AB - An intrusion detection system (IDS) identifies abnormal network traffic and attacks and is an important means of network security defense. However, some intrusion data are disguised as normal data during transmission, which makes intrusion data harder to classify. In addition, existing packet-based and flow-based feature extraction methods produce low-dimensional features, so samples from different categories can share the same features. This causes class overlap, and as a result some highly concealed attacks are poorly detected. Here, overlapping samples are those lying in the overlap between erroneous samples and correct samples, and nonoverlapping samples are test-set samples that do not match the characteristics of the already identified overlapping samples. To address these problems, this paper proposes a multistage intrusion detection method. In the first stage, an existing high-performance intrusion detection model (OBLR) predicts the data. In the second stage, for the overlapping samples among the confusing data, the method learns the distribution of each feature group from a randomly partitioned “intermediary set” and uses this prior distribution knowledge to predict the overlapping samples, classifying them efficiently without increasing the model's computational burden. In the third stage, for the nonoverlapping samples among the confusing data, KPCA (kernel principal component analysis) raises the feature dimension to capture finer differences between samples, and a GMM (Gaussian mixture model) is combined with the “representative samples” proposed in this paper to assist the classifiers.
In addition, all base classifiers are integrated through LTR (learning to rank) to improve classification of the nonoverlapping samples among the confusing data. Experiments on the complex intrusion dataset UNSW-NB15 show 99.71% accuracy and a 0.158% false positive rate, outperforming existing methods. In particular, the method improves accuracy by 38.1% on the confusing samples that the existing model fails to detect correctly.
KW - Gaussian mixture model
KW - Intrusion detection
KW - Kernel principal component analysis
KW - Learning to rank
UR - http://www.scopus.com/inward/record.url?scp=85213350786&partnerID=8YFLogxK
U2 - 10.1007/s00521-024-10903-x
DO - 10.1007/s00521-024-10903-x
M3 - Article
AN - SCOPUS:85213350786
SN - 0941-0643
JO - Neural Computing and Applications
JF - Neural Computing and Applications
M1 - 106167
ER -