TY - JOUR
T1 - MalDAE
T2 - Detecting and explaining malware based on correlation and fusion of static and dynamic characteristics
AU - Han, Weijie
AU - Xue, Jingfeng
AU - Wang, Yong
AU - Huang, Lu
AU - Kong, Zixiao
AU - Mao, Limin
N1 - Publisher Copyright: © 2019 The Authors
PY - 2019/6
Y1 - 2019/6
AB - Analyzing behavioral characteristics based on API call sequences is a widespread way to detect malware. However, previous studies usually focus on either the static or the dynamic API call sequence alone, neglecting the correlation between them. Our experimental results show that there exists an underlying relation between the dynamic and static API call sequences of malware, which can be described as “the syntax is different, but the semantics is similar”. Based on this discovery, this paper makes a first attempt to explore the difference and relation between the static and dynamic API sequences of malicious programs. We correlate and fuse the dynamic and static API sequences into one hybrid sequence based on semantics mapping and then construct the hybrid feature vector space. Furthermore, we mine and define the malicious behavior types of the programs and provide explainable results for malware detection. Our study addresses a shortcoming of previous approaches, which usually pay attention to detection but neglect explanation. By correlating and fusing the static and dynamic API sequences, we establish an explainable malware detection framework called MalDAE. The evaluation results show that the detection and classification accuracy of MalDAE reach up to 97.89% and 94.39%, respectively, outperforming previous similar studies in a comprehensive comparison. In addition, MalDAE gives an understandable explanation for common types of malware and provides predictive support for understanding and resisting malware.
KW - API call sequence
KW - Behavioral correlation
KW - Behavioral differences
KW - Malicious behavior types
KW - Malware
UR - http://www.scopus.com/inward/record.url?scp=85062293558&partnerID=8YFLogxK
U2 - 10.1016/j.cose.2019.02.007
DO - 10.1016/j.cose.2019.02.007
M3 - Article
AN - SCOPUS:85062293558
SN - 0167-4048
VL - 83
SP - 208
EP - 233
JO - Computers and Security
JF - Computers and Security
ER -