TY - JOUR
T1 - 基于机器学习的内核恶意程序检测研究与实现
TT - Research and implementation of machine-learning-based kernel malware detection
AU - Tian, Dong Hai
AU - Wei, Hang
AU - Zhang, Bo
AU - Yu, Yu Lei
AU - Li, Jia Suo
AU - Ma, Rui
N1 - Publisher Copyright:
© 2020, Editorial Department of Transaction of Beijing Institute of Technology. All rights reserved.
PY - 2020/12
Y1 - 2020/12
N2 - As the world grows more dependent on computers, computer security becomes increasingly important, and malicious code is the greatest threat to it. This paper proposes a machine-learning method with new classification features that identifies malicious programs and makes a preliminary family classification of them. It also points out shortcomings of earlier machine-learning approaches to malicious code detection and classification and screens out features that distinguish samples more effectively. First, an n-gram algorithm is used to optimize the opcode features in the disassembled malicious code. Next, a bag-of-words model and the TF-IDF algorithm are used to optimize the API call features. Finally, a model is implemented, then trained and tested on the data set. In the experiments, classification accuracy reaches 87.41% with the decision tree algorithm and 90.06% with the random forest algorithm. The results show that the features of the proposed method perform better in malicious code detection and classification than those of the other methods compared.
KW - API
KW - Decision tree
KW - Malicious code classification
KW - Opcode
KW - Random forest
UR - http://www.scopus.com/inward/record.url?scp=85099419502&partnerID=8YFLogxK
U2 - 10.15918/j.tbit1001-0645.2019.261
DO - 10.15918/j.tbit1001-0645.2019.261
M3 - Article
AN - SCOPUS:85099419502
SN - 1001-0645
VL - 40
SP - 1295
EP - 1301
JO - Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
JF - Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
IS - 12
ER -