TY - JOUR
T1 - Malware classification for the cloud via semi-supervised transfer learning
AU - Gao, Xianwei
AU - Hu, Changzhen
AU - Shan, Chun
AU - Liu, Baoxu
AU - Niu, Zequn
AU - Xie, Hui
N1 - Publisher Copyright: © 2020 The Author(s)
PY - 2020/12
Y1 - 2020/12
AB - Malware threats and privacy protection are two of the biggest challenges in cloud computing environments. Many studies have focused on the accuracy of malware detection but have not sufficiently considered the privacy of cloud tenants. This paper proposes a novel malware detection model for the cloud, based on semi-supervised transfer learning (SSTL), that consists of detection, prediction, and transfer components. To protect tenant privacy in the public cloud, the detection component uses a byte classifier based on a recurrent neural network (RNN) to detect malware. However, because training samples are scarce, the byte classifier reaches only 94.72% accuracy after supervised learning. For the prediction component, an asm classifier is proposed that achieves 99.69% accuracy. The transfer component invokes the prediction component to classify an unlabeled dataset and combines the predicted labels with the byte features of that dataset into a new training dataset. Leveraging semi-supervised learning, this new dataset is transferred to the byte classifier for retraining. Test results on the Kaggle malware datasets show that semi-supervised transfer learning improves the accuracy of the detection component from 94.72% to 96.9%. Compared with similar methods, the improved malware detection method better addresses the privacy concerns of tenants in the public cloud and also detects malware more accurately.
KW - Assembly opcode
KW - Cloud
KW - Malware classification
KW - Privacy protection
KW - SSTL
KW - Semi-supervised learning
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85092890618&partnerID=8YFLogxK
U2 - 10.1016/j.jisa.2020.102661
DO - 10.1016/j.jisa.2020.102661
M3 - Article
AN - SCOPUS:85092890618
SN - 2214-2134
VL - 55
JO - Journal of Information Security and Applications
JF - Journal of Information Security and Applications
M1 - 102661
ER -