TY - GEN
T1 - Boosting training for PDF malware classifier via active learning
AU - Wang, Xinxin
AU - Li, Yuanzhang
AU - Zhang, Quanxin
AU - Kuang, Xiaohui
N1 - Publisher Copyright: © 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - Malicious code is a serious threat to network security. PDF (Portable Document Format) is a widely used file format and is often used as a vehicle for malicious behavior. In this paper, we use machine learning algorithms to detect malicious PDF documents and evaluate them on experimental data. The main contribution is a malware detection method that combines static pre-processing with a machine learning classifier. Classification is based on the differences in structure and content between malicious and benign PDF files. In addition, we boost training of the PDF malware classifier via active learning based on mutual agreement analysis. The detector is retrained on the ground-truth labels of the uncertain samples, which not only reduces the detector's training time but also improves its detection performance.
KW - Active learning
KW - Information security
KW - Malware detection
KW - PDF
UR - http://www.scopus.com/inward/record.url?scp=85078541668&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-37352-8_9
DO - 10.1007/978-3-030-37352-8_9
M3 - Conference contribution
AN - SCOPUS:85078541668
SN - 9783030373511
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 101
EP - 110
BT - Cyberspace Safety and Security - 11th International Symposium, CSS 2019, Proceedings
A2 - Vaidya, Jaideep
A2 - Zhang, Xiao
A2 - Li, Jin
PB - Springer
T2 - 11th International Symposium on Cyberspace Safety and Security, CSS 2019
Y2 - 1 December 2019 through 3 December 2019
ER -