TY - JOUR
T1 - EII-MBS
T2 - Malware family classification via enhanced adversarial instruction behavior semantic learning
AU - Hao, Jingwei
AU - Luo, Senlin
AU - Pan, Limin
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2022/11
Y1 - 2022/11
AB - Given the ever-increasing number of malware variants, detecting malware families is crucial. However, the operand semantics of assembly instructions depend strongly on the operating environment and are difficult to extract. This leaves instruction semantics incomplete and makes it hard to classify malware variants correctly. In addition, previous research does not mine the internal structural features of instructions or the contextual relationships between them, which makes it difficult to identify malware variants efficiently. Motivated by this, this article presents a malware family classification method called EII-MBS (enhanced instruction-level behavior semantics learning). By abstracting operand types, the method separates operand semantics from the constraints of the operating environment. The structure, relationship, and context information of the instructions are then mined, and these three aspects of instruction behavior semantics are embedded into a vector representation used to build malware feature images. The method also applies channel attention to capture important features. In addition to the widely used Microsoft Malware Classification Challenge dataset, we take the lead in conducting experiments on the recently released BODMAS dataset. The average accuracy of EII-MBS is 99.40% and 99.26% on the two datasets, respectively. Further experiments with different proportions of training and testing data show that the method achieves state-of-the-art malware family classification performance.
KW - Behavior semantics
KW - Channel attention
KW - Instruction embedding
KW - Malware family classification
KW - Visualization
UR - http://www.scopus.com/inward/record.url?scp=85138462800&partnerID=8YFLogxK
U2 - 10.1016/j.cose.2022.102905
DO - 10.1016/j.cose.2022.102905
M3 - Article
AN - SCOPUS:85138462800
SN - 0167-4048
VL - 122
JO - Computers and Security
JF - Computers and Security
M1 - 102905
ER -