Malware classification for the cloud via semi-supervised transfer learning

Xianwei Gao; Changzhen Hu; Chun Shan; Baoxu Liu; Zequn Niu; Hui Xie

doi:10.1016/j.jisa.2020.102661

Xianwei Gao, Changzhen Hu, Chun Shan^*, Baoxu Liu, Zequn Niu, Hui Xie

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

47 Citations (Scopus)

Abstract

Malware threats and privacy protection are two of the biggest challenges in the cloud computing environment. Many studies have focused on the accuracy of malware detection, but they did not sufficiently take into account the privacy protection of cloud tenants. This paper proposes a novel malware detection model, based on semi-supervised transfer learning (SSTL) for the cloud, that consists of detection, prediction, and transfer components. To protect the privacy of tenants in the public cloud, a byte classifier based on a recurrent neural network (RNN) for its detection component is designed to detect malware. However, because it is limited by the scarcity of training samples, the accuracy of the byte classifier is only 94.72% after supervised learning. An asm classifier is proposed for the prediction component, and it achieves 99.69% accuracy. The transfer component invokes the prediction component to classify an unlabeled dataset, and it combines the predicted labels and byte features of the unlabeled dataset into a new training dataset. Through the advantages of semi-supervised learning, the new dataset is transferred to the byte classifier for training again. The test results on the Kaggle malware datasets show that semi-supervised transfer learning improved the accuracy of the detection component from 94.72% to 96.9%. The improved malware detection method can not only do a better job of resolving the privacy concerns of tenants in the public cloud than other similar methods, but it can also detect malware more accurately.
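
The transfer step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The `NearestCentroid` class is a toy stand-in for both the RNN byte classifier and the asm classifier, and `transfer_step` and all parameter names are hypothetical. Retraining on the original labeled samples together with the pseudo-labeled ones is also an assumption, since the abstract only says that the new dataset is used to train the byte classifier again.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class NearestCentroid:
    """Toy stand-in classifier (NOT the paper's RNN or asm model)."""

    def __init__(self) -> None:
        self.centroids: Dict[int, List[float]] = {}

    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroid":
        # Average the feature vectors of each class to get one centroid per label.
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X: List[Vector]) -> List[int]:
        # Assign each sample to the class with the closest centroid.
        def closest(x: Vector) -> int:
            return min(
                self.centroids,
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])),
            )

        return [closest(x) for x in X]


def transfer_step(
    byte_clf: NearestCentroid,
    asm_clf: NearestCentroid,
    labeled_bytes: List[Vector],
    labels: List[int],
    unlabeled_bytes: List[Vector],
    unlabeled_asm: List[Vector],
) -> Tuple[NearestCentroid, List[int]]:
    """Pseudo-label unlabeled samples from their asm view, then retrain the byte model."""
    # Prediction component: the (already trained) asm classifier labels the unlabeled set.
    pseudo_labels = asm_clf.predict(unlabeled_asm)
    # Transfer component: pair the predicted labels with the byte features of the same samples.
    # Assumption: the original labeled byte samples are kept in the new training set.
    X = list(labeled_bytes) + list(unlabeled_bytes)
    y = list(labels) + list(pseudo_labels)
    # Detection component: train the byte classifier again on the enlarged dataset.
    return byte_clf.fit(X, y), pseudo_labels
```

In this sketch only the byte features are needed at detection time, which mirrors the privacy motivation in the abstract: the asm view is used to produce labels for training, not to inspect tenant data during detection.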

Original language	English
Article number	102661
Journal	Journal of Information Security and Applications
Volume	55
DOIs	https://doi.org/10.1016/j.jisa.2020.102661
Publication status	Published - Dec 2020

Keywords

Assembly opcode
Cloud
Malware classification
Privacy protection
SSTL
Semi-supervised learning
Transfer learning

Cite this

Gao, X., Hu, C., Shan, C., Liu, B., Niu, Z., & Xie, H. (2020). Malware classification for the cloud via semi-supervised transfer learning. Journal of Information Security and Applications, 55, Article 102661. https://doi.org/10.1016/j.jisa.2020.102661

@article{2bc71d0845d84882b9062709170bca10,

title = "Malware classification for the cloud via semi-supervised transfer learning",

abstract = "Malware threats and privacy protection are two of the biggest challenges in the cloud computing environment. Many studies have focused on the accuracy of malware detection, but they did not sufficiently take into account the privacy protection of cloud tenants. This paper proposes a novel malware detection model, based on semi-supervised transfer learning (SSTL) for the cloud, that consists of detection, prediction, and transfer components. To protect the privacy of tenants in the public cloud, a byte classifier based on a recurrent neural network (RNN) for its detection component is designed to detect malware. However, because it is limited by the scarcity of training samples, the accuracy of the byte classifier is only 94.72% after supervised learning. An asm classifier is proposed for the prediction component, and it achieves 99.69% accuracy. The transfer component invokes the prediction component to classify an unlabeled dataset, and it combines the predicted labels and byte features of the unlabeled dataset into a new training dataset. Through the advantages of semi-supervised learning, the new dataset is transferred to the byte classifier for training again. The test results on the Kaggle malware datasets show that semi-supervised transfer learning improved the accuracy of the detection component from 94.72% to 96.9%. The improved malware detection method can not only do a better job of resolving the privacy concerns of tenants in the public cloud than other similar methods, but it can also detect malware more accurately.",

keywords = "Assembly opcode, Cloud, Malware classification, Privacy protection, SSTL, Semi-supervised learning, Transfer learning",

author = "Xianwei Gao and Changzhen Hu and Chun Shan and Baoxu Liu and Zequn Niu and Hui Xie",

note = "Publisher Copyright: {\textcopyright} 2020 The Author(s)",

year = "2020",

month = dec,

doi = "10.1016/j.jisa.2020.102661",

language = "English",

volume = "55",

journal = "Journal of Information Security and Applications",

issn = "2214-2134",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - Malware classification for the cloud via semi-supervised transfer learning

AU - Gao, Xianwei

AU - Hu, Changzhen

AU - Shan, Chun

AU - Liu, Baoxu

AU - Niu, Zequn

AU - Xie, Hui

PY - 2020/12

Y1 - 2020/12

N2 - Malware threats and privacy protection are two of the biggest challenges in the cloud computing environment. Many studies have focused on the accuracy of malware detection, but they did not sufficiently take into account the privacy protection of cloud tenants. This paper proposes a novel malware detection model, based on semi-supervised transfer learning (SSTL) for the cloud, that consists of detection, prediction, and transfer components. To protect the privacy of tenants in the public cloud, a byte classifier based on a recurrent neural network (RNN) for its detection component is designed to detect malware. However, because it is limited by the scarcity of training samples, the accuracy of the byte classifier is only 94.72% after supervised learning. An asm classifier is proposed for the prediction component, and it achieves 99.69% accuracy. The transfer component invokes the prediction component to classify an unlabeled dataset, and it combines the predicted labels and byte features of the unlabeled dataset into a new training dataset. Through the advantages of semi-supervised learning, the new dataset is transferred to the byte classifier for training again. The test results on the Kaggle malware datasets show that semi-supervised transfer learning improved the accuracy of the detection component from 94.72% to 96.9%. The improved malware detection method can not only do a better job of resolving the privacy concerns of tenants in the public cloud than other similar methods, but it can also detect malware more accurately.

KW - Assembly opcode

KW - Cloud

KW - Malware classification

KW - Privacy protection

KW - SSTL

KW - Semi-supervised learning

KW - Transfer learning

UR - http://www.scopus.com/inward/record.url?scp=85092890618&partnerID=8YFLogxK

U2 - 10.1016/j.jisa.2020.102661

DO - 10.1016/j.jisa.2020.102661

M3 - Article

AN - SCOPUS:85092890618

SN - 2214-2134

VL - 55

JO - Journal of Information Security and Applications

JF - Journal of Information Security and Applications

M1 - 102661

ER -
