Research on malicious code analysis method based on semi-supervised learning

Tingting He; Jingfeng Xue; Jianwen Fu; Yong Wang; Chun Shan

doi:10.1007/978-981-10-7080-8_17

Tingting He, Jingfeng Xue, Jianwen Fu, Yong Wang^*, Chun Shan

^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Research on methods for classifying malicious code helps researchers understand attack characteristics quickly and helps reduce losses for users and even for states. Most current malware classification methods rely on supervised learning algorithms, which perform poorly when only a small number of labeled samples is available. In this paper, we therefore propose a new malware classification method based on a semi-supervised learning algorithm. First, we extract the most informative static and dynamic features and serialize them to obtain high-dimensional features. Then, we select among them with an Ensemble Feature Grader that combines Information Gain, Random Forest, and Logistic Regression with L₁ and L₂ regularization, and reduce the dimension further with PCA. Finally, we use the Learning with Local and Global Consistency (LLGC) algorithm together with K-means to classify malware. An experimental comparison of SVM, LLGC, and K-means + LLGC shows that, with this feature extraction, feature reduction, and classification method, K-means + LLGC outperforms LLGC in both classification accuracy and efficiency, improving accuracy by 2% to 3%, and is more accurate than SVM when the number of labeled samples is small.
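
For readers unfamiliar with the classification step named in the abstract, the following is a minimal sketch of label propagation in the style of Learning with Local and Global Consistency, written in its iterative form. It is not the authors' implementation: the abstract does not say how K-means is combined with LLGC, so that step is omitted, and the function name `llgc`, the RBF kernel, and the values of `sigma`, `alpha`, and `n_iter` are all assumptions chosen for illustration.

```python
import math


def llgc(points, labels, n_classes, sigma=1.0, alpha=0.99, n_iter=200):
    """Illustrative LLGC-style label propagation (hypothetical helper).

    points: list of feature vectors (lists of floats)
    labels: list of class indices, with None for unlabeled samples
    Returns one predicted class index per sample.
    """
    n = len(points)
    # Affinity matrix W: RBF kernel between samples, zero on the diagonal.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2} (guarding zero degrees).
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    # Initial label matrix Y: one-hot rows for labeled samples, zeros otherwise.
    Y = [[0.0] * n_classes for _ in range(n)]
    for i, lab in enumerate(labels):
        if lab is not None:
            Y[i][lab] = 1.0
    # Repeat F <- alpha * S F + (1 - alpha) * Y so labels spread along the graph.
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    # Each sample takes the class with the largest propagated score.
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Toy data (not from the paper): two well-separated groups, one label each.
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_labels = [0, None, None, 1, None, None]
print(llgc(toy_points, toy_labels, n_classes=2))  # [0, 0, 0, 1, 1, 1]
```

The sketch uses only the standard library to stay self-contained; a real pipeline would more likely use a numerical library and would run on the selected, PCA-reduced features rather than raw coordinates.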

Original language	English
Title of host publication	Trusted Computing and Information Security - 11th Chinese Conference, CTCIS 2017, Proceedings
Editors	Fei Yan, Ming Xu, Shaojing Fu, Zheng Qin
Publisher	Springer Verlag
Pages	227-241
Number of pages	15
ISBN (Print)	9789811070792
DOI	https://doi.org/10.1007/978-981-10-7080-8_17
Publication status	Published - 2017
Event	11th Chinese Conference on Trusted Computing and Information Security, CTCIS 2017 - Changsha, China. Duration: 14 Sep 2017 → 17 Sep 2017

Publication series

Name	Communications in Computer and Information Science
Volume	704
ISSN (Print)	1865-0929

Conference

Conference	11th Chinese Conference on Trusted Computing and Information Security, CTCIS 2017
Country/Territory	China
City	Changsha
Period	14/09/17 → 17/09/17

Cite this

He, T., Xue, J., Fu, J., Wang, Y., & Shan, C. (2017). Research on malicious code analysis method based on semi-supervised learning. In F. Yan, M. Xu, S. Fu, & Z. Qin (Eds.), Trusted Computing and Information Security - 11th Chinese Conference, CTCIS 2017, Proceedings (pp. 227-241). (Communications in Computer and Information Science; Vol. 704). Springer Verlag. https://doi.org/10.1007/978-981-10-7080-8_17

@inproceedings{40d01e83c9a04230b8f5644d6240e339,

title = "Research on malicious code analysis method based on semi-supervised learning",

abstract = "The research on classification method of malicious code is helpful for researchers to understand attack characteristics quickly, and help to reduce the loss of users and even the states. Currently, most of the malware classification methods are based on supervised learning algorithms, but it is powerless for the small number of labeled samples. Therefore, in this paper, we propose a new malware classification method, which is based on semi-supervised learning algorithm. First, we extract the impactful static features and dynamic features to serialize and obtain features of high dimension. Then, we select them with Ensemble Feature Grader consistent with Information Gain, Random Forest and Logistic Regression with L1 and L2, and reduce dimension again with PCA. Finally, we use Learning with local and global consistency algorithm with K-means to classify malwares. The experimental results of comparison among SVM, LLGC and K-means + LLGC show that using of the feature extraction, feature reduction and classification method, K-means + LLGC algorithm is superior to LLGC in both classification accuracy and efficiency, the accuracy is increased by 2% to 3%, and the accuracy is more than SVM when the number of labeled samples is small.",

keywords = "Feature processing, K-means LLGC, Malicious code",

author = "Tingting He and Jingfeng Xue and Jianwen Fu and Yong Wang and Chun Shan",

note = "Publisher Copyright: {\textcopyright} Springer Nature Singapore Pte Ltd. 2017.; 11th Chinese Conference on Trusted Computing and Information Security, CTCIS 2017 ; Conference date: 14-09-2017 Through 17-09-2017",

year = "2017",

doi = "10.1007/978-981-10-7080-8_17",

language = "English",

isbn = "9789811070792",

series = "Communications in Computer and Information Science",

volume = "704",

publisher = "Springer Verlag",

pages = "227--241",

editor = "Fei Yan and Ming Xu and Shaojing Fu and Zheng Qin",

booktitle = "Trusted Computing and Information Security - 11th Chinese Conference, CTCIS 2017, Proceedings",

address = "Germany",

}

He, T, Xue, J, Fu, J, Wang, Y & Shan, C 2017, Research on malicious code analysis method based on semi-supervised learning. in F Yan, M Xu, S Fu & Z Qin (eds), Trusted Computing and Information Security - 11th Chinese Conference, CTCIS 2017, Proceedings. Communications in Computer and Information Science, vol. 704, Springer Verlag, pp. 227-241, 11th Chinese Conference on Trusted Computing and Information Security, CTCIS 2017, Changsha, China, 14/09/17. https://doi.org/10.1007/978-981-10-7080-8_17

Research on malicious code analysis method based on semi-supervised learning. / He, Tingting; Xue, Jingfeng; Fu, Jianwen et al.
Trusted Computing and Information Security - 11th Chinese Conference, CTCIS 2017, Proceedings. ed. / Fei Yan; Ming Xu; Shaojing Fu; Zheng Qin. Springer Verlag, 2017. p. 227-241 (Communications in Computer and Information Science; Vol. 704).

TY - GEN

T1 - Research on malicious code analysis method based on semi-supervised learning

AU - He, Tingting

AU - Xue, Jingfeng

AU - Fu, Jianwen

AU - Wang, Yong

AU - Shan, Chun

N1 - Publisher Copyright: © Springer Nature Singapore Pte Ltd. 2017.

PY - 2017

Y1 - 2017

AB - The research on classification method of malicious code is helpful for researchers to understand attack characteristics quickly, and help to reduce the loss of users and even the states. Currently, most of the malware classification methods are based on supervised learning algorithms, but it is powerless for the small number of labeled samples. Therefore, in this paper, we propose a new malware classification method, which is based on semi-supervised learning algorithm. First, we extract the impactful static features and dynamic features to serialize and obtain features of high dimension. Then, we select them with Ensemble Feature Grader consistent with Information Gain, Random Forest and Logistic Regression with L1 and L2, and reduce dimension again with PCA. Finally, we use Learning with local and global consistency algorithm with K-means to classify malwares. The experimental results of comparison among SVM, LLGC and K-means + LLGC show that using of the feature extraction, feature reduction and classification method, K-means + LLGC algorithm is superior to LLGC in both classification accuracy and efficiency, the accuracy is increased by 2% to 3%, and the accuracy is more than SVM when the number of labeled samples is small.

KW - Feature processing

KW - K-means LLGC

KW - Malicious code

UR - http://www.scopus.com/inward/record.url?scp=85036466164&partnerID=8YFLogxK

U2 - 10.1007/978-981-10-7080-8_17

DO - 10.1007/978-981-10-7080-8_17

M3 - Conference contribution

AN - SCOPUS:85036466164

SN - 9789811070792

T3 - Communications in Computer and Information Science

SP - 227

EP - 241

BT - Trusted Computing and Information Security - 11th Chinese Conference, CTCIS 2017, Proceedings

A2 - Yan, Fei

A2 - Xu, Ming

A2 - Fu, Shaojing

A2 - Qin, Zheng

PB - Springer Verlag

T2 - 11th Chinese Conference on Trusted Computing and Information Security, CTCIS 2017

Y2 - 14 September 2017 through 17 September 2017

ER -

He T, Xue J, Fu J, Wang Y, Shan C. Research on malicious code analysis method based on semi-supervised learning. In Yan F, Xu M, Fu S, Qin Z, editors, Trusted Computing and Information Security - 11th Chinese Conference, CTCIS 2017, Proceedings. Springer Verlag. 2017. p. 227-241. (Communications in Computer and Information Science). doi: 10.1007/978-981-10-7080-8_17
