TY - GEN
T1 - Detecting malicious URLs using a deep learning approach based on stacked denoising autoencoder
AU - Yan, Huaizhi
AU - Zhang, Xin
AU - Xie, Jiangwei
AU - Hu, Changzhen
N1 - Publisher Copyright:
© 2019, Springer Nature Singapore Pte Ltd.
PY - 2019
Y1 - 2019
N2 - As a source of spam, phishing, malware, and many other attacks, malicious URLs are a chronic and complicated problem on the Internet. Machine learning approaches have proven effective and achieved high accuracy in detecting malicious URLs. However, the tedious process of extracting features from URLs and the high dimensionality of the feature vectors make implementation time-consuming. This paper presents a deep learning method that uses a stacked denoising autoencoder (SdA) model to learn and detect intrinsic malicious features. We employ an SdA network to analyze URLs and extract features automatically. A logistic regression classifier then distinguishes malicious from benign URLs, so detection models can be generated without manual feature engineering. We implemented our network model using Keras, a high-level neural network API, on top of TensorFlow, an open-source deep learning library. Five datasets were used, and our model was compared with four other methods. Tested on a mixed dataset of about 2 million samples, our architecture achieves an accuracy of 98.25% and a micro-averaged F1 score of 0.98.
AB - As a source of spam, phishing, malware, and many other attacks, malicious URLs are a chronic and complicated problem on the Internet. Machine learning approaches have proven effective and achieved high accuracy in detecting malicious URLs. However, the tedious process of extracting features from URLs and the high dimensionality of the feature vectors make implementation time-consuming. This paper presents a deep learning method that uses a stacked denoising autoencoder (SdA) model to learn and detect intrinsic malicious features. We employ an SdA network to analyze URLs and extract features automatically. A logistic regression classifier then distinguishes malicious from benign URLs, so detection models can be generated without manual feature engineering. We implemented our network model using Keras, a high-level neural network API, on top of TensorFlow, an open-source deep learning library. Five datasets were used, and our model was compared with four other methods. Tested on a mixed dataset of about 2 million samples, our architecture achieves an accuracy of 98.25% and a micro-averaged F1 score of 0.98.
KW - Deep learning
KW - Malicious URL detection
KW - Network security
KW - Stacked denoising autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85060258728&partnerID=8YFLogxK
U2 - 10.1007/978-981-13-5913-2_23
DO - 10.1007/978-981-13-5913-2_23
M3 - Conference contribution
AN - SCOPUS:85060258728
SN - 9789811359125
T3 - Communications in Computer and Information Science
SP - 372
EP - 388
BT - Trusted Computing and Information Security - 12th Chinese Conference, CTCIS 2018, Revised Selected Papers
A2 - Zhang, Huanguo
A2 - Zhao, Bo
A2 - Yan, Fei
PB - Springer Verlag
T2 - 12th Chinese Conference on Trusted Computing and Information Security, CTCIS 2018
Y2 - 18 October 2018 through 18 October 2018
ER -