Hybrid sequence-based Android malware detection using natural language processing

Nan Zhang, Jingfeng Xue, Yuxi Ma, Ruyun Zhang, Tiancai Liang*, Yu an Tan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

53 Citations (Scopus)

Abstract

The Android platform has been a target of attackers because of its openness and growing popularity. Android malware has increased explosively in recent years, posing serious threats to Android security. Proposing efficient Android malware detection methods is therefore crucial to defeating malware. Machine learning over features extracted by static or dynamic analysis has recently played an important role in malware detection. However, code obfuscation, code encryption, and dynamic code loading can hinder systems based solely on static analysis, while purely dynamic analysis systems cannot explore all potential code execution paths. To address these issues, we propose CoDroid, a sequence-based hybrid Android malware detection method that uses sequences of static opcodes and dynamic system calls. We treat each sequence as a sentence in natural language processing and construct a CNN–BiLSTM–Attention classifier, which consists of Convolutional Neural Networks (CNNs) and a Bidirectional Long Short-Term Memory (BiLSTM) network with an attention language model. We extensively evaluate CoDroid on a real-world data set and perform a comprehensive comparison against existing related detection methods. The evaluations show the effectiveness and flexibility of CoDroid across a variety of experimental settings.
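
The idea of treating an opcode or system call sequence as a sentence can be illustrated with a minimal tokenization sketch. This is not CoDroid's implementation: the helper names (`build_vocab`, `encode`), the shared vocabulary, the padding length, and the toy sequences are all assumptions made for illustration. It only shows how such sequences could be mapped to fixed-length integer ids of the kind an embedding layer in a CNN–BiLSTM–Attention model would consume.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # reserved tokens (assumed convention, not from the paper)


def build_vocab(sequences, min_count=1):
    """Map each token (opcode or system call name) to an integer id."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = {PAD: 0, UNK: 1}
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(seq, vocab, max_len):
    """Turn one sequence into a fixed-length list of ids (truncate, then pad)."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in seq[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Hypothetical toy "sentences"; real inputs would come from static
# disassembly and dynamic tracing, which this sketch does not perform.
static_opcodes = ["const", "invoke-virtual", "move-result", "return-void"]
dynamic_syscalls = ["open", "read", "sendto", "close"]

# Assumption: one vocabulary shared by both sequence types.
vocab = build_vocab([static_opcodes, dynamic_syscalls])
opcode_ids = encode(static_opcodes, vocab, max_len=6)
syscall_ids = encode(dynamic_syscalls, vocab, max_len=6)
```

Tokens that never appeared when the vocabulary was built map to the `<unk>` id, so previously unseen opcodes or system calls do not break encoding.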

Original language: English
Pages (from-to): 5770-5784
Number of pages: 15
Journal: International Journal of Intelligent Systems
Volume: 36
Issue number: 10
Publication status: Published - Oct 2021

Keywords

  • Android malware detection
  • attention
  • deep learning
  • hybrid analysis
  • machine learning
  • natural language processing
  • text classification
