Detecting malicious URLs using a deep learning approach based on stacked denoising autoencoder

Huaizhi Yan*, Xin Zhang, Jiangwei Xie, Changzhen Hu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

As a source of spam, phishing, malware, and many other attacks, malicious URLs are a chronic and complicated problem on the Internet. Machine learning approaches have proven effective and achieve high accuracy in detecting malicious URLs. However, the tedious process of extracting features from URLs and the high dimensionality of the resulting feature vectors make implementation time-consuming. This paper presents a deep learning method that uses a stacked denoising autoencoder (SdA) model to learn and detect intrinsic malicious features. We employ an SdA network to analyze URLs and extract features automatically. A logistic regression classifier then distinguishes malicious from benign URLs, so detection models can be generated without manual feature engineering. We implemented our network model using Keras, a high-level neural network API, with TensorFlow, an open-source deep learning library, as the backend. Five datasets were used, and our model was compared with four other methods. Our architecture achieves an accuracy of 98.25% and a micro-averaged F1 score of 0.98 when tested on a mixed dataset containing around 2 million samples.
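The abstract reports both an accuracy and a micro-averaged F1 score, and the two figures nearly coincide. One plausible reading, assuming the F1 score is pooled over both classes (malicious and benign) of a single-label task, is that every misclassification counts once as a false positive for the predicted class and once as a false negative for the true class, which makes micro-averaged F1 equal to accuracy. The sketch below illustrates this with made-up labels; the function names and the data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1 once."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted but is wrong
            fn[t] += 1          # class t was the truth but was missed
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    if total_tp == 0:
        return 0.0              # no correct predictions at all
    precision = total_tp / (total_tp + total_fp)
    recall = total_tp / (total_tp + total_fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 1 = malicious, 0 = benign (illustrative only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(y_true, y_pred), micro_f1(y_true, y_pred))
```

If the paper instead computed F1 for the malicious class only, the two numbers would generally differ, so this equivalence should be read as an assumption about the evaluation setup.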

Original language: English
Title of host publication: Trusted Computing and Information Security - 12th Chinese Conference, CTCIS 2018, Revised Selected Papers
Editors: Huanguo Zhang, Bo Zhao, Fei Yan
Publisher: Springer Verlag
Pages: 372-388
Number of pages: 17
ISBN (Print): 9789811359125
DOIs
Publication status: Published - 2019
Event: 12th Chinese Conference on Trusted Computing and Information Security, CTCIS 2018 - Wuhan, China
Duration: 18 Oct 2018 → 18 Oct 2018

Publication series

Name: Communications in Computer and Information Science
Volume: 960
ISSN (Print): 1865-0929

Conference

Conference: 12th Chinese Conference on Trusted Computing and Information Security, CTCIS 2018
Country/Territory: China
City: Wuhan
Period: 18/10/18 → 18/10/18

Keywords

  • Deep learning
  • Malicious URL detection
  • Network security
  • Stacked denoising autoencoder
