A multistage intrusion detection method for alleviating class overlapping problem

He Pang; Fusheng Jin; Mengnan Chen; Yutong Jiang; Ye Yuan

doi:10.1007/s00521-024-10903-x

He Pang, Fusheng Jin^*, Mengnan Chen, Yutong Jiang, Ye Yuan

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to journal › Article › peer-review

Abstract

An intrusion detection system (IDS) identifies abnormal network traffic and attacks, and is an important means of network security defense. However, intrusion data are often disguised as normal data during transmission, which makes them harder to classify. In addition, existing packet-based and flow-based feature extraction methods produce low-dimensional features, so samples from different classes can share the same feature values, which causes class overlap. Here, overlapping samples are those shared between the misclassified samples and the correctly classified samples; nonoverlapping samples are those in the test set that do not match the characteristics of the already identified overlapping samples. As a result, highly concealed attacks are detected poorly. To address these problems, this paper proposes a multistage intrusion detection method. In the first stage, an existing intrusion detection model with high classification performance (OBLR) predicts the data. In the second stage, for the overlapping data within the confusing data, the method learns the distribution of each feature group from a randomly divided “intermediary set” and predicts overlapping samples from this prior distribution knowledge, which classifies them efficiently without increasing the computational burden of the model. In the third stage, for the nonoverlapping data within the confusing data, kernel principal component analysis (KPCA) raises the feature dimension to capture finer differences between samples, and a Gaussian mixture model (GMM) is combined with the “representative samples” proposed in this paper to assist the classifiers. All base classifiers are then integrated through learning to rank (LTR) to improve classification of the nonoverlapping data within the confusing data. Experiments on the complex intrusion dataset UNSW-NB15 show 99.71% accuracy and a 0.158% false positive rate, which is better than existing methods. In particular, the method improves accuracy by 38.1% on the confusing samples that the existing model cannot detect correctly.
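The second-stage idea described in the abstract (learn label distributions from an intermediary set, then use them as prior knowledge for samples with matching features) can be illustrated with a minimal sketch. This is not the authors' algorithm: the majority-vote lookup is a simple stand-in, and the function names, the feature tuples, and the toy data are all hypothetical. Samples whose features were never seen in the intermediary set are treated here as nonoverlapping and deferred to a later stage, which is one possible reading of the abstract's definition.

```python
from collections import Counter, defaultdict


def learn_group_priors(intermediary_set):
    """Count class labels per feature signature in a labeled intermediary set.

    Assumption: a "feature group" is modeled as an exact tuple of feature values.
    """
    counts = defaultdict(Counter)
    for features, label in intermediary_set:
        counts[tuple(features)][label] += 1
    return counts


def route_confusing_sample(features, priors):
    """Return (stage, label) for one confusing sample.

    Stage 2: the signature was seen before, so predict its most frequent label.
    Stage 3: the signature is unseen, so defer (label is None).
    """
    seen = priors.get(tuple(features))
    if seen:
        label, _ = seen.most_common(1)[0]
        return 2, label
    return 3, None


# Toy, made-up data: (feature tuple, label) pairs.
intermediary = [
    (("tcp", "http", 0), "normal"),
    (("tcp", "http", 0), "attack"),
    (("tcp", "http", 0), "attack"),
    (("udp", "dns", 1), "normal"),
]
priors = learn_group_priors(intermediary)

print(route_confusing_sample(("tcp", "http", 0), priors))  # (2, 'attack')
print(route_confusing_sample(("tcp", "ftp", 1), priors))   # (3, None)
```

A lookup of this kind adds no training cost beyond counting, which is in the spirit of the abstract's claim that overlapping samples are handled without increasing the computational burden of the model.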

Original language	English
Article number	106167
Journal	Neural Computing and Applications
DOIs	https://doi.org/10.1007/s00521-024-10903-x
Publication status	Accepted/In press - 2024

Keywords

Gaussian mixed model
Intrusion detection
Kernel principal component analysis
Learning to rank

Access to Document

10.1007/s00521-024-10903-x

Cite this

Pang, H., Jin, F., Chen, M., Jiang, Y., & Yuan, Y. (Accepted/In press). A multistage intrusion detection method for alleviating class overlapping problem. Neural Computing and Applications, Article 106167. https://doi.org/10.1007/s00521-024-10903-x

@article{a6f7a908292649b9b4ca16cbf73b74de,

title = "A multistage intrusion detection method for alleviating class overlapping problem",

abstract = "Intrusion detection system (IDS) can identify abnormal network traffic and attacks, which is an important means of network security defense. However, some intrusion data are often disguised as normal data for transmission, which increases the difficulty of intrusion data classification. In addition, the existing packet-based or flow-based data feature extraction methods result in low feature dimensions, causing the problem of class overlapping between different categories with the same features. To clarify, overlapping samples are those that overlap between erroneous samples and correct samples. Nonoverlapping samples are those in the test set that do not match the characteristics of the already identified overlapping samples and are therefore considered nonoverlapping samples. Therefore, the detection effect of some attacks with high concealment is poor. In order to solve the above problems, this paper proposes a multistage intrusion detection method: an existing intrusion detection model with higher classification performance (OBLR) is used to predict the data in the first stage. In the second stage, for the overlapping data in the confusing data, the method learns the distribution of each feature group according to the randomly divided “intermediary set,” and realizes the prediction of overlapping samples through the prior distribution knowledge, and achieves efficient classification of overlapping samples without increasing the computational burden of the model. For nonoverlapping data in the confusing data, KPCA (kernel principal component analysis) dimension elevation is used in the third stage to capture more detailed difference information between samples, and GMM (Gaussian mixed model) is combined with the “representative samples” proposed in this paper to assist classifier classification. 
At the same time, all the base classifiers are integrated through LTR (learning to rank) to improve the classification effect of the model for nonoverlapping data in the confusing data. The experimental results show that 99.71% accuracy and 0.158% false positive rate are achieved on the complex intrusion dataset UNSW-NB15, which is better than the existing methods. In particular, this method can increase the accuracy of 38.1% for the confusing samples that cannot be correctly detected by the existing model.",

keywords = "Gaussian mixed model, Intrusion detection, Kernel principal component analysis, Learning to rank",

author = "He Pang and Fusheng Jin and Mengnan Chen and Yutong Jiang and Ye Yuan",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.",

year = "2024",

doi = "10.1007/s00521-024-10903-x",

language = "English",

journal = "Neural Computing and Applications",

issn = "0941-0643",

publisher = "Springer London",

}

TY - JOUR

T1 - A multistage intrusion detection method for alleviating class overlapping problem

AU - Pang, He

AU - Jin, Fusheng

AU - Chen, Mengnan

AU - Jiang, Yutong

AU - Yuan, Ye

N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.

PY - 2024

Y1 - 2024

AB - Intrusion detection system (IDS) can identify abnormal network traffic and attacks, which is an important means of network security defense. However, some intrusion data are often disguised as normal data for transmission, which increases the difficulty of intrusion data classification. In addition, the existing packet-based or flow-based data feature extraction methods result in low feature dimensions, causing the problem of class overlapping between different categories with the same features. To clarify, overlapping samples are those that overlap between erroneous samples and correct samples. Nonoverlapping samples are those in the test set that do not match the characteristics of the already identified overlapping samples and are therefore considered nonoverlapping samples. Therefore, the detection effect of some attacks with high concealment is poor. In order to solve the above problems, this paper proposes a multistage intrusion detection method: an existing intrusion detection model with higher classification performance (OBLR) is used to predict the data in the first stage. In the second stage, for the overlapping data in the confusing data, the method learns the distribution of each feature group according to the randomly divided “intermediary set,” and realizes the prediction of overlapping samples through the prior distribution knowledge, and achieves efficient classification of overlapping samples without increasing the computational burden of the model. For nonoverlapping data in the confusing data, KPCA (kernel principal component analysis) dimension elevation is used in the third stage to capture more detailed difference information between samples, and GMM (Gaussian mixed model) is combined with the “representative samples” proposed in this paper to assist classifier classification. 
At the same time, all the base classifiers are integrated through LTR (learning to rank) to improve the classification effect of the model for nonoverlapping data in the confusing data. The experimental results show that 99.71% accuracy and 0.158% false positive rate are achieved on the complex intrusion dataset UNSW-NB15, which is better than the existing methods. In particular, this method can increase the accuracy of 38.1% for the confusing samples that cannot be correctly detected by the existing model.

KW - Gaussian mixed model

KW - Intrusion detection

KW - Kernel principal component analysis

KW - Learning to rank

UR - http://www.scopus.com/inward/record.url?scp=85213350786&partnerID=8YFLogxK

U2 - 10.1007/s00521-024-10903-x

DO - 10.1007/s00521-024-10903-x

M3 - Article

AN - SCOPUS:85213350786

SN - 0941-0643

JO - Neural Computing and Applications

JF - Neural Computing and Applications

M1 - 106167

ER -
