Abstract
Classifying and distinguishing malware is key to predicting malicious attacks, which is essential in self-driving systems. To handle the large number of malware variants, many machine learning methods have been proposed. However, the accuracy and efficiency of multi-class malware classification remain inadequate to meet demand. In this paper, we propose a 4-LFE method to address these issues. We extract multiple features from malicious programs by combining pixel and n-gram features. For feature selection, we apply an L1-L2 penalty to logistic regression and then use LDA to reduce the dimensionality of the malware features. Based on the selected features, we study the classification performance of ten machine learning algorithms. We assess the precision of our approach on a public dataset consisting of 10,868 malware samples. Experimental results show that our method can classify malware into their families with an accuracy of 99.99%.
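
The abstract does not say how the pixel and n-gram features are computed, so the following is only a minimal sketch of one common reading: the raw bytes of a program are treated as grayscale pixel values, and overlapping byte n-grams are counted. The function names (`byte_ngrams`, `to_pixels`, `combined_features`) and the parameters `n`, `width`, and `top_k` are hypothetical and chosen for illustration; this is not the paper's 4-LFE implementation, and the feature selection and classification stages are not shown.

```python
from collections import Counter
from typing import Dict, List


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in a binary blob (assumed feature type)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def to_pixels(data: bytes, width: int = 4) -> List[List[int]]:
    """Reshape bytes into rows of grayscale values (0-255).

    The last row is zero-padded so that every row has `width` entries.
    """
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def combined_features(data: bytes, n: int = 2, width: int = 4,
                      top_k: int = 3) -> Dict[str, list]:
    """Combine the two assumed feature views into one record."""
    grams = byte_ngrams(data, n)
    # Keep the most frequent n-grams; ties are broken by byte order
    # so the result is deterministic.
    top = sorted(grams.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    # Flatten the pixel rows into a single vector.
    pixels = [p for row in to_pixels(data, width) for p in row]
    return {"ngrams": top, "pixels": pixels}


if __name__ == "__main__":
    sample = bytes([1, 2, 1, 2, 3, 4])  # toy stand-in for a malware binary
    print(combined_features(sample))
```

In practice the combined vector would be passed on to a feature selection step and then to a classifier; those stages depend on details the abstract does not give.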
Original language | English |
---|---|
Pages (from-to) | 352-360 |
Number of pages | 9 |
Journal | Neurocomputing |
Volume | 428 |
DOI | |
Publication status | Published - 7 Mar 2021 |