LM-cAPI: A Lite Model Based on API Core Semantic Information for Malware Classification

Yifan Zhou, Zhenyan Liu*, Jingfeng Xue, Yong Wang, Ji Zhang

doi:10.1007/978-3-031-61486-6_3

*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Malware continues to evolve and grow in complexity, posing a significant threat to network security. As new types of malware keep emerging in growing numbers and dissemination methods are continually updated, identifying malware quickly and accurately, and providing precise support for the corresponding warning and defense measures, has become a crucial challenge in maintaining network security. This article treats the API call sequences of malware, which characterize its behavior, as text and applies recent text classification techniques to classify malware. It proposes a flexible, lightweight malicious code classification model based on API core semantic information. To address the long training times and low accuracy caused by excessive noise and redundant data in API call sequences, the model adopts an intimacy analysis method based on a self-attention mechanism to extract key information. To better capture the semantic information in malware API call sequences, a self-attention-based feature extraction model transforms the unstructured key API sequences into vector representations and extracts core features, which are then fed to a TextCNN model for multi-class classification. On the dataset of the “Alibaba Cloud Security Malicious Program Detection” competition, the model reached an F1 score of 90% on the eight-category classification task. The experimental results show that the proposed model achieves better results in malware detection and multi-class classification.
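Treating an API call sequence as text means mapping each API name to a token id, much as words are mapped in ordinary text classification. The sketch below assumes a simple scheme that the abstract does not specify: a vocabulary built from training traces with reserved padding and unknown ids, plus fixed-length truncation and right-padding. The names `build_vocab`, `encode`, `PAD_ID`, `UNK_ID`, and the example traces are hypothetical.

```python
from collections import Counter

PAD_ID, UNK_ID = 0, 1  # reserved ids (an assumed convention, not from the paper)

def build_vocab(traces, min_count=1):
    """Assign an integer id to every API name seen at least min_count times."""
    counts = Counter(api for trace in traces for api in trace)
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for api in sorted(counts):  # sorted so ids are reproducible across runs
        if counts[api] >= min_count:
            vocab[api] = len(vocab)
    return vocab

def encode(trace, vocab, max_len):
    """Map API names to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(api, UNK_ID) for api in trace[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))

# Hypothetical traces; real ones would come from sandbox execution logs.
traces = [["LdrLoadDll", "NtCreateFile", "NtWriteFile"],
          ["NtCreateFile", "RegOpenKeyExW"]]
vocab = build_vocab(traces)
print(encode(["NtCreateFile", "CryptEncrypt"], vocab, max_len=4))  # [3, 1, 0, 0]
```

An API name never seen in training (`CryptEncrypt` here) falls back to the unknown id, so the encoder never fails on new malware samples.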
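The abstract does not detail the self-attention-based intimacy analysis, so the following is only an illustrative stand-in, not the authors' method. It computes scaled dot-product self-attention over (untrained) API embeddings, scores each call by the total attention it receives from all calls in the sequence, and keeps the k highest-scoring calls in their original order. The embedding size, the value of k, the use of Q = K = X without learned projections, the example trace, and the helper names are all assumptions.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention_weights(x):
    """Scaled dot-product self-attention weights with Q = K = x (no learned projections)."""
    d = len(x[0])
    return [softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x])
            for q in x]

def select_key_apis(seq, emb, k):
    """Keep the k calls that receive the most total attention, preserving call order."""
    attn = self_attention_weights([emb[api] for api in seq])
    n = len(seq)
    received = [sum(attn[i][j] for i in range(n)) for j in range(n)]  # column sums
    top = sorted(range(n), key=lambda j: received[j], reverse=True)[:k]
    return [seq[j] for j in sorted(top)]

# Hypothetical trace and random, untrained 8-dimensional embeddings.
trace = ["LdrLoadDll", "NtCreateFile", "NtWriteFile", "NtCreateFile",
         "RegOpenKeyExW", "NtClose", "NtWriteFile", "NtClose"]
rng = random.Random(0)
emb = {api: [rng.gauss(0.0, 1.0) for _ in range(8)] for api in sorted(set(trace))}
key_calls = select_key_apis(trace, emb, k=4)
```

With Q = K and no projections, each call tends to attend strongly to itself; a trained model would normally add learned projection matrices and learn the embeddings end to end.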
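The final stage feeds the extracted features to a TextCNN classifier. A minimal, inference-only sketch of a TextCNN-style forward pass follows. It assumes random untrained weights, 4-dimensional inputs, two filters each of width 2, 3, and 4, no convolution bias, and eight output classes to match the eight-category task; the paper's actual layer sizes are not given in the abstract, and the function names are hypothetical.

```python
import math
import random

def conv_maxpool(x, kernel):
    """Slide one filter (h x dim) over x (seq_len x dim), apply ReLU, max-pool over time."""
    h, dim = len(kernel), len(x[0])
    feats = [max(0.0, sum(kernel[r][c] * x[t + r][c]
                          for r in range(h) for c in range(dim)))
             for t in range(len(x) - h + 1)]
    return max(feats, default=0.0)  # 0.0 if the sequence is shorter than the filter

def textcnn_forward(x, filters, w_out, b_out):
    """One pooled feature per filter, concatenated, then a linear layer and softmax."""
    pooled = [conv_maxpool(x, f) for f in filters]
    logits = [sum(w * p for w, p in zip(row, pooled)) + b
              for row, b in zip(w_out, b_out)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Random, untrained weights: 10 calls x 4-dim vectors, 2 filters each of width 2, 3, 4.
rng = random.Random(1)
dim, num_classes = 4, 8
x = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(10)]
filters = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(h)]
           for h in (2, 3, 4) for _ in range(2)]
w_out = [[rng.uniform(-1, 1) for _ in filters] for _ in range(num_classes)]
probs = textcnn_forward(x, filters, w_out, [0.0] * num_classes)
```

Max-over-time pooling yields one feature per filter regardless of sequence length, which lets a fixed-size classifier head handle API sequences of varying length.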
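The abstract reports an F1 value of 90% on the eight-category task but does not say how per-class scores are averaged. Macro-averaging, sketched below under that assumption, is one common choice for multi-class malware classification. It computes F1 = 2TP / (2TP + FP + FN) for each class and takes the unweighted mean, so rare families count as much as common ones. Here the mean runs over the classes that appear in either the true or the predicted labels; `macro_f1` and the toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes present in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Denominator is positive because c occurs in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)

# Toy labels covering three of the eight classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(y_true, y_pred), 4))  # 0.6556
```

Per class, this gives F1 values of 0.5, 0.8, and 2/3, whose mean is 59/90.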

Original language	English
Title of host publication	Applied Cryptography and Network Security Workshops - ACNS 2024 Satellite Workshops, AIBlock, AIHWS, AIoTS, SCI, AAC, SiMLA, LLE, and CIMSS, Proceedings
Editors	Martin Andreoni
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	29-42
Number of pages	14
ISBN (Print)	9783031614859
DOIs	https://doi.org/10.1007/978-3-031-61486-6_3
Publication status	Published - 2024
Event	Satellite Workshops held in parallel with the 22nd International Conference on Applied Cryptography and Network Security, ACNS 2024, Abu Dhabi, United Arab Emirates, 5 Mar 2024 → 8 Mar 2024

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	14586 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	Satellite Workshops held in parallel with the 22nd International Conference on Applied Cryptography and Network Security, ACNS 2024
Country/Territory	United Arab Emirates
City	Abu Dhabi
Period	5 Mar 2024 → 8 Mar 2024

Keywords

API call sequence
Malware Classification
Network Security

Cite this

Zhou, Y., Liu, Z., Xue, J., Wang, Y., & Zhang, J. (2024). LM-cAPI: A Lite Model Based on API Core Semantic Information for Malware Classification. In M. Andreoni (Ed.), Applied Cryptography and Network Security Workshops - ACNS 2024 Satellite Workshops, AIBlock, AIHWS, AIoTS, SCI, AAC, SiMLA, LLE, and CIMSS, Proceedings (pp. 29-42). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14586 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-61486-6_3

@inproceedings{2d3579a0508a4d15a6258735ef9f1300,

title = "{LM-cAPI}: A Lite Model Based on {API} Core Semantic Information for Malware Classification",

keywords = "API call sequence, Malware Classification, Network Security",

author = "Yifan Zhou and Zhenyan Liu and Jingfeng Xue and Yong Wang and Ji Zhang",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.; Satellite Workshops held in parallel with the 22nd International Conference on Applied Cryptography and Network Security, ACNS 2024 ; Conference date: 05-03-2024 Through 08-03-2024",

year = "2024",

doi = "10.1007/978-3-031-61486-6_3",

language = "English",

isbn = "9783031614859",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "29--42",

editor = "Martin Andreoni",

booktitle = "Applied Cryptography and Network Security Workshops - ACNS 2024 Satellite Workshops, AIBlock, AIHWS, AIoTS, SCI, AAC, SiMLA, LLE, and CIMSS, Proceedings",

address = "Germany",

}

TY - GEN

T1 - LM-cAPI: A Lite Model Based on API Core Semantic Information for Malware Classification

AU - Zhou, Yifan

AU - Liu, Zhenyan

AU - Xue, Jingfeng

AU - Wang, Yong

AU - Zhang, Ji

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

PY - 2024

Y1 - 2024

KW - API call sequence

KW - Malware Classification

KW - Network Security

UR - http://www.scopus.com/inward/record.url?scp=85198439117&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-61486-6_3

DO - 10.1007/978-3-031-61486-6_3

M3 - Conference contribution

AN - SCOPUS:85198439117

SN - 9783031614859

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 29

EP - 42

BT - Applied Cryptography and Network Security Workshops - ACNS 2024 Satellite Workshops, AIBlock, AIHWS, AIoTS, SCI, AAC, SiMLA, LLE, and CIMSS, Proceedings

A2 - Andreoni, Martin

PB - Springer Science and Business Media Deutschland GmbH

T2 - Satellite Workshops held in parallel with the 22nd International Conference on Applied Cryptography and Network Security, ACNS 2024

Y2 - 5 March 2024 through 8 March 2024

ER -
