Boosting training for PDF malware classifier via active learning

Xinxin Wang*, Yuanzhang Li, Quanxin Zhang, Xiaohui Kuang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Malicious code is a serious threat to network security. PDF (Portable Document Format) is a widely used file format and is often used as a vehicle for malicious behavior. In this paper, a machine learning algorithm is used to detect malicious PDF documents and is evaluated on experimental data. The main contribution is a malware detection method that combines static pre-processing with a machine learning algorithm for classification. Classification is based on the differences in structure and content between malicious and benign PDF files. In addition, we boost training of the PDF malware classifier via active learning based on mutual agreement analysis. The detector is retrained using the true labels of the uncertain samples, which not only reduces the detector's training time but also improves its detection performance.
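
The abstract does not give implementation details, so the following is only a minimal sketch of one way an active learning round driven by mutual agreement could look. Everything in it is an assumption made for illustration: the ensemble of one-feature threshold classifiers, the 25%/75% vote thresholds, the numeric feature vectors (for example, counts of structural keywords extracted by static pre-processing), and the `oracle` callback that stands in for obtaining a sample's true label. It is not the authors' implementation.

```python
import random


def train_stump(samples):
    """Fit a one-feature threshold classifier on (features, label) pairs.

    Label 1 means malicious, 0 means benign. The stump predicts 1 when
    the chosen feature is >= the chosen threshold.
    """
    best = None
    for f in range(len(samples[0][0])):
        for x, _ in samples:
            t = x[f]
            errors = sum((1 if xs[f] >= t else 0) != y for xs, y in samples)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]


def predict_stump(stump, x):
    f, t = stump
    return 1 if x[f] >= t else 0


def train_ensemble(samples, rng, n_members=15):
    """Train each member on a bootstrap resample of the labeled data."""
    return [
        train_stump([rng.choice(samples) for _ in samples])
        for _ in range(n_members)
    ]


def vote_share(ensemble, x):
    """Fraction of ensemble members that vote 'malicious' for x."""
    return sum(predict_stump(s, x) for s in ensemble) / len(ensemble)


def split_by_agreement(ensemble, pool, low=0.25, high=0.75):
    """Separate samples the ensemble agrees on from uncertain ones.

    A sample is uncertain when the malicious vote share lies strictly
    between `low` and `high` (assumed thresholds).
    """
    confident, uncertain = [], []
    for x in pool:
        share = vote_share(ensemble, x)
        (uncertain if low < share < high else confident).append(x)
    return confident, uncertain


def active_learning_round(labeled, pool, oracle, rng):
    """Query true labels only for uncertain samples, then retrain."""
    ensemble = train_ensemble(labeled, rng)
    confident, uncertain = split_by_agreement(ensemble, pool)
    labeled = labeled + [(x, oracle(x)) for x in uncertain]
    return train_ensemble(labeled, rng), labeled, confident
```

In this sketch only the uncertain samples are sent to the oracle, so the labeled set grows by the number of disagreements rather than by the size of the whole pool. That is the intuition behind the claim that retraining on uncertain samples can reduce training cost.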

Original language: English
Title of host publication: Cyberspace Safety and Security - 11th International Symposium, CSS 2019, Proceedings
Editors: Jaideep Vaidya, Xiao Zhang, Jin Li
Publisher: Springer
Pages: 101-110
Number of pages: 10
ISBN (Print): 9783030373511
DOIs
Publication status: Published - 2019
Event: 11th International Symposium on Cyberspace Safety and Security, CSS 2019 - Guangzhou, China
Duration: 1 Dec 2019 to 3 Dec 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11983 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th International Symposium on Cyberspace Safety and Security, CSS 2019
Country/Territory: China
City: Guangzhou
Period: 1/12/19 to 3/12/19

Keywords

  • Active learning
  • Information security
  • Malware detection
  • PDF
