基于机器学习的内核恶意程序检测研究与实现

Dong Hai Tian; Hang Wei; Bo Zhang; Yu Lei Yu; Jia Suo Li; Rui Ma

doi:10.15918/j.tbit1001-0645.2019.261

Translated title of the contribution: Research and Implementation of Kernel Malicious Code Detection Based on Machine Learning

Dong Hai Tian, Hang Wei^*, Bo Zhang, Yu Lei Yu, Jia Suo Li, Rui Ma

^*Corresponding author for this work

School of Computer Science and Technology

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

As the world grows more dependent on computers, computer security becomes increasingly important, and malicious code is one of its most serious threats. This paper proposes a method based on machine learning and new classification features that identifies malicious programs and makes a preliminary family classification of them. It also points out shortcomings of earlier machine learning approaches to malicious code detection and classification, and selects features that discriminate better. First, an n-gram algorithm is used to optimize the opcode features taken from the disassembled malicious code. Then a Bag of Words model and the TF-IDF algorithm are used to optimize the API call features. Finally, a model is implemented, and the data set is used to train and test it. In the experiments, classification accuracy reaches 87.41% with the decision tree algorithm and 90.06% with the random forest algorithm. The results show that the features of the proposed method perform better than those of other published approaches to malicious code detection and classification.
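The two feature-extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of bigrams (n = 2), the textbook unsmoothed TF-IDF formula, and the toy opcode and API call samples are all assumptions made for illustration, since the abstract does not give these details.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Count contiguous n-grams in one program's opcode sequence."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def bag_of_words(api_calls):
    """Bag of Words: map each API name to its call count, ignoring order."""
    return Counter(api_calls)


def tf_idf(bags):
    """Weight each bag's terms by TF-IDF across the whole collection.

    tf = count / total terms in the sample; idf = log(N / document frequency).
    This is the textbook form (an assumption); libraries often smooth it.
    """
    n_docs = len(bags)
    doc_freq = Counter(term for bag in bags for term in bag)
    weighted = []
    for bag in bags:
        total = sum(bag.values())
        weighted.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weighted


# Hypothetical toy samples, not data from the paper.
sample_opcodes = ["push", "mov", "call", "mov", "call", "ret"]
sample_api_calls = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegSetValueExW", "CloseHandle"],
]

bigrams = opcode_ngrams(sample_opcodes, n=2)
weights = tf_idf([bag_of_words(calls) for calls in sample_api_calls])
```

In this sketch an API call that appears in every sample, such as CloseHandle, receives a weight of zero, while calls that are rare across samples are weighted up. The resulting feature dictionaries would then be vectorized and passed to a decision tree or random forest classifier; that step is omitted here.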

Original language	Chinese (Traditional)
Pages (from-to)	1295-1301
Number of pages	7
Journal	Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology
Volume	40
Issue number	12
DOIs	https://doi.org/10.15918/j.tbit1001-0645.2019.261
Publication status	Published - Dec 2020

Cite this

@article{4b49c2ec3e3b479a87de0d7548740a7a,

title = "基于机器学习的内核恶意程序检测研究与实现",

keywords = "API, Decision tree, Malicious code classification, Opcode, Random forest",

author = "Tian, {Dong Hai} and Hang Wei and Bo Zhang and Yu, {Yu Lei} and Li, {Jia Suo} and Rui Ma",

year = "2020",

month = dec,

doi = "10.15918/j.tbit1001-0645.2019.261",

language = "Traditional Chinese",

volume = "40",

pages = "1295--1301",

journal = "Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology",

issn = "1001-0645",

publisher = "Beijing Institute of Technology",

number = "12",

}

TY - JOUR

T1 - 基于机器学习的内核恶意程序检测研究与实现

AU - Tian, Dong Hai

AU - Wei, Hang

AU - Zhang, Bo

AU - Yu, Yu Lei

AU - Li, Jia Suo

AU - Ma, Rui

PY - 2020/12

Y1 - 2020/12

KW - API

KW - Decision tree

KW - Malicious code classification

KW - Opcode

KW - Random forest

UR - http://www.scopus.com/inward/record.url?scp=85099419502&partnerID=8YFLogxK

U2 - 10.15918/j.tbit1001-0645.2019.261

DO - 10.15918/j.tbit1001-0645.2019.261

M3 - Article

AN - SCOPUS:85099419502

SN - 1001-0645

VL - 40

SP - 1295

EP - 1301

JO - Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology

JF - Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology

IS - 12

ER -
