MaliCage: A packed malware family classification framework based on DNN and GAN

Xianwei Gao; Changzhen Hu; Chun Shan; Weijie Han

doi:10.1016/j.jisa.2022.103267

MaliCage: A packed malware family classification framework based on DNN and GAN

Xianwei Gao^*, Changzhen Hu, Chun Shan, Weijie Han

^*Corresponding author for this work

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

To evade security detection, hackers always add a deceptive packer outside of the original malicious codes. The coexistence of original unpacked samples and packed samples of same family needs special attention in malware detection. The features of packed malware are changed by the packer, which would disturb the prediction results of malware classifier. The state-of-the-art studies of malware detection mainly focus on whether the malware is packed, or which type of packer is used. However, the ability of detecting the family of packed malware is still insufficient. Motivated by the above challenges, a novel packed malware family classification framework called MaliCage is proposed. The goal of the framework is to classify packed malware accurately. MaliCage consists of three core modules: packer detector, malware classifier, and a packer generative adversarial network (GAN). The packer detector is used as the pre-step of the framework to identify whether malware is packed. After distinguishing the packed samples, the dynamic features extracted from the sandbox are fitted to the malware classifier based on deep neural networks (DNN). The malware classifier can classify unpacked and packed malware simultaneously. Furthermore, the packer GAN generates fake packed samples to alleviate the underfitting of the malware classifiers. We built a single-packer dataset and a multi-packer dataset to evaluate the framework. In the single-packer experiment, 10 classes of malware samples packed by UPX were examined objectively. The accuracy of the malware classifier when using only real packed samples was 91.66%. After introducing fake packed samples generated by packer GAN, the accuracy of the packed malware classifier could reach 97.8%. In the multi-packer scenario, our method can also accurately classify benign programs, unpacked malware and malware packed by several common packers. The validation results show that MaliCage can not only effectively solve the impacts of packed malware on machine learning model, but also improve the detection accuracy.

Original language	English
Article number	103267
Journal	Journal of Information Security and Applications
Volume	68
DOIs	https://doi.org/10.1016/j.jisa.2022.103267
Publication status	Published - Aug 2022

Keywords

Classification
DNN
GAN
Packed malware

Access to Document

10.1016/j.jisa.2022.103267

Cite this

Gao, X., Hu, C., Shan, C., & Han, W. (2022). MaliCage: A packed malware family classification framework based on DNN and GAN. Journal of Information Security and Applications, 68, Article 103267. https://doi.org/10.1016/j.jisa.2022.103267

@article{78dc79273833495db06b98207998f7d1,

title = "MaliCage: A packed malware family classification framework based on DNN and GAN",

abstract = "To evade security detection, hackers always add a deceptive packer outside of the original malicious codes. The coexistence of original unpacked samples and packed samples of same family needs special attention in malware detection. The features of packed malware are changed by the packer, which would disturb the prediction results of malware classifier. The state-of-the-art studies of malware detection mainly focus on whether the malware is packed, or which type of packer is used. However, the ability of detecting the family of packed malware is still insufficient. Motivated by the above challenges, a novel packed malware family classification framework called MaliCage is proposed. The goal of the framework is to classify packed malware accurately. MaliCage consists of three core modules: packer detector, malware classifier, and a packer generative adversarial network (GAN). The packer detector is used as the pre-step of the framework to identify whether malware is packed. After distinguishing the packed samples, the dynamic features extracted from the sandbox are fitted to the malware classifier based on deep neural networks (DNN). The malware classifier can classify unpacked and packed malware simultaneously. Furthermore, the packer GAN generates fake packed samples to alleviate the underfitting of the malware classifiers. We built a single-packer dataset and a multi-packer dataset to evaluate the framework. In the single-packer experiment, 10 classes of malware samples packed by UPX were examined objectively. The accuracy of the malware classifier when using only real packed samples was 91.66%. After introducing fake packed samples generated by packer GAN, the accuracy of the packed malware classifier could reach 97.8%. In the multi-packer scenario, our method can also accurately classify benign programs, unpacked malware and malware packed by several common packers. The validation results show that MaliCage can not only effectively solve the impacts of packed malware on machine learning model, but also improve the detection accuracy.",

keywords = "Classification, DNN, GAN, Packed malware",

author = "Xianwei Gao and Changzhen Hu and Chun Shan and Weijie Han",

note = "Publisher Copyright: {\textcopyright} 2022 Elsevier Ltd",

year = "2022",

month = aug,

doi = "10.1016/j.jisa.2022.103267",

language = "English",

volume = "68",

journal = "Journal of Information Security and Applications",

issn = "2214-2134",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - MaliCage

T2 - A packed malware family classification framework based on DNN and GAN

AU - Gao, Xianwei

AU - Hu, Changzhen

AU - Shan, Chun

AU - Han, Weijie

PY - 2022/8

Y1 - 2022/8

N2 - To evade security detection, hackers always add a deceptive packer outside of the original malicious codes. The coexistence of original unpacked samples and packed samples of same family needs special attention in malware detection. The features of packed malware are changed by the packer, which would disturb the prediction results of malware classifier. The state-of-the-art studies of malware detection mainly focus on whether the malware is packed, or which type of packer is used. However, the ability of detecting the family of packed malware is still insufficient. Motivated by the above challenges, a novel packed malware family classification framework called MaliCage is proposed. The goal of the framework is to classify packed malware accurately. MaliCage consists of three core modules: packer detector, malware classifier, and a packer generative adversarial network (GAN). The packer detector is used as the pre-step of the framework to identify whether malware is packed. After distinguishing the packed samples, the dynamic features extracted from the sandbox are fitted to the malware classifier based on deep neural networks (DNN). The malware classifier can classify unpacked and packed malware simultaneously. Furthermore, the packer GAN generates fake packed samples to alleviate the underfitting of the malware classifiers. We built a single-packer dataset and a multi-packer dataset to evaluate the framework. In the single-packer experiment, 10 classes of malware samples packed by UPX were examined objectively. The accuracy of the malware classifier when using only real packed samples was 91.66%. After introducing fake packed samples generated by packer GAN, the accuracy of the packed malware classifier could reach 97.8%. In the multi-packer scenario, our method can also accurately classify benign programs, unpacked malware and malware packed by several common packers. The validation results show that MaliCage can not only effectively solve the impacts of packed malware on machine learning model, but also improve the detection accuracy.

AB - To evade security detection, hackers always add a deceptive packer outside of the original malicious codes. The coexistence of original unpacked samples and packed samples of same family needs special attention in malware detection. The features of packed malware are changed by the packer, which would disturb the prediction results of malware classifier. The state-of-the-art studies of malware detection mainly focus on whether the malware is packed, or which type of packer is used. However, the ability of detecting the family of packed malware is still insufficient. Motivated by the above challenges, a novel packed malware family classification framework called MaliCage is proposed. The goal of the framework is to classify packed malware accurately. MaliCage consists of three core modules: packer detector, malware classifier, and a packer generative adversarial network (GAN). The packer detector is used as the pre-step of the framework to identify whether malware is packed. After distinguishing the packed samples, the dynamic features extracted from the sandbox are fitted to the malware classifier based on deep neural networks (DNN). The malware classifier can classify unpacked and packed malware simultaneously. Furthermore, the packer GAN generates fake packed samples to alleviate the underfitting of the malware classifiers. We built a single-packer dataset and a multi-packer dataset to evaluate the framework. In the single-packer experiment, 10 classes of malware samples packed by UPX were examined objectively. The accuracy of the malware classifier when using only real packed samples was 91.66%. After introducing fake packed samples generated by packer GAN, the accuracy of the packed malware classifier could reach 97.8%. In the multi-packer scenario, our method can also accurately classify benign programs, unpacked malware and malware packed by several common packers. The validation results show that MaliCage can not only effectively solve the impacts of packed malware on machine learning model, but also improve the detection accuracy.

KW - Classification

KW - DNN

KW - GAN

KW - Packed malware

UR - http://www.scopus.com/inward/record.url?scp=85133929006&partnerID=8YFLogxK

U2 - 10.1016/j.jisa.2022.103267

DO - 10.1016/j.jisa.2022.103267

M3 - Article

AN - SCOPUS:85133929006

SN - 2214-2134

VL - 68

JO - Journal of Information Security and Applications

JF - Journal of Information Security and Applications

M1 - 103267

ER -

MaliCage: A packed malware family classification framework based on DNN and GAN

Abstract

Keywords

Access to Document

Other files and links

Fingerprint

Cite this