Detecting malicious URLs using a deep learning approach based on stacked denoising autoencoder

Huaizhi Yan; Xin Zhang; Jiangwei Xie; Changzhen Hu

doi:10.1007/978-981-13-5913-2_23

Huaizhi Yan^*, Xin Zhang, Jiangwei Xie, Changzhen Hu

^*Corresponding author for this work

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

5 Citations (Scopus)

Abstract

Malicious URLs, the source of spam, phishing, malware and many other attacks, are a chronic and complicated problem on the Internet. Machine learning approaches have proven effective and achieve high accuracy in detecting malicious URLs. However, the tedious process of extracting features from URLs and the high dimensionality of the resulting feature vectors make implementation time-consuming. This paper presents a deep learning method that uses a stacked denoising autoencoder (SdA) model to learn and detect intrinsic malicious features. We employ an SdA network to analyze URLs and extract features automatically. A logistic regression classifier then separates malicious from benign URLs, so detection models can be generated without manual feature engineering. We implemented our network model in Keras, a high-level neural network API, with TensorFlow, an open-source deep learning library, as its backend. Five datasets were used, and our model was compared with four other methods. Tested on a mixed dataset of around 2 million samples, our architecture achieves an accuracy of 98.25% and a micro-averaged F1 score of 0.98.
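The pipeline described in the abstract (unsupervised SdA feature learning followed by a logistic-regression classifier) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' Keras/TensorFlow implementation. The character-presence URL encoding (`encode_url`, `ALPHABET`), the masking-noise corruption, the tied weights, the layer sizes, learning rates and epoch counts are all hypothetical choices, and the supervised fine-tuning of the whole stack that SdA models often include is omitted. `micro_f1` shows how a micro-averaged F1 score is computed when true/false positives and false negatives are pooled over both classes.

```python
import math
import random
import string

# HYPOTHETICAL input encoding, not taken from the paper: one binary feature per
# character of a fixed alphabet, 1.0 if that character occurs in the URL.
ALPHABET = string.ascii_lowercase + string.digits + "./-_@?=&%:"


def encode_url(url):
    chars = set(url.lower())
    return [1.0 if c in chars else 0.0 for c in ALPHABET]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))  # clamped to avoid overflow


class DenoisingAutoencoder:
    """One dA layer: masking noise, sigmoid units, tied weights (decoder uses W^T), cross-entropy."""

    def __init__(self, n_in, n_hidden, corruption=0.3, lr=0.1, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b_h = [0.0] * n_hidden  # encoder (hidden) bias
        self.b_v = [0.0] * n_in  # decoder (visible) bias
        self.corruption, self.lr = corruption, lr

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.W, self.b_h)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + self.b_v[i])
                for i in range(len(self.b_v))]

    def train_step(self, x):
        x_t = [0.0 if self.rng.random() < self.corruption else xi for xi in x]  # masking noise
        h = self.encode(x_t)
        z = self.decode(h)
        # Reconstruct the CLEAN input; sigmoid + cross-entropy gives dL/da = z - x.
        d_v = [zi - xi for zi, xi in zip(z, x)]
        d_h = [sum(self.W[j][i] * d_v[i] for i in range(len(x))) * h[j] * (1.0 - h[j])
               for j in range(len(h))]
        # Each tied weight receives both the decoder (d_v*h) and encoder (d_h*x_t) gradient.
        for j in range(len(h)):
            for i in range(len(x)):
                self.W[j][i] -= self.lr * (d_v[i] * h[j] + d_h[j] * x_t[i])
            self.b_h[j] -= self.lr * d_h[j]
        for i in range(len(x)):
            self.b_v[i] -= self.lr * d_v[i]


def pretrain_stack(X, layer_sizes, epochs=20):
    """Greedy layer-wise, unsupervised pretraining of the stacked dA."""
    layers, data = [], X
    for k, n_hidden in enumerate(layer_sizes):
        dA = DenoisingAutoencoder(len(data[0]), n_hidden, seed=k)
        for _ in range(epochs):
            for x in data:
                dA.train_step(x)
        layers.append(dA)
        data = [dA.encode(x) for x in data]  # clean codes feed the next layer
    return layers


def features(layers, x):
    for dA in layers:
        x = dA.encode(x)
    return x


class LogisticRegression:
    """Binary logistic-regression output layer trained with per-sample SGD."""

    def __init__(self, n_in, lr=0.5):
        self.w, self.b, self.lr = [0.0] * n_in, 0.0, lr

    def prob(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def train_step(self, x, y):
        g = self.prob(x) - y  # gradient of the log-loss w.r.t. the logit
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * g

    def predict(self, x):
        return 1 if self.prob(x) >= 0.5 else 0


def micro_f1(y_true, y_pred, labels=(0, 1)):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then F1 = 2TP / (2TP + FP + FN)."""
    tp = fp = fn = 0
    for c in labels:
        for t, p in zip(y_true, y_pred):
            tp += t == c and p == c
            fp += t != c and p == c
            fn += t == c and p != c
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

A typical use would be `stack = pretrain_stack([encode_url(u) for u in urls], [16, 8])`, then repeated `clf.train_step(features(stack, encode_url(u)), y)` calls on a `LogisticRegression(8)`, so that only the output layer ever sees the labels. When micro-averaging is taken over both classes of a single-label problem, pooled precision and recall both equal accuracy, so micro-F1 coincides with accuracy.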

Original language	English
Host publication title	Trusted Computing and Information Security - 12th Chinese Conference, CTCIS 2018, Revised Selected Papers
Editors	Huanguo Zhang, Bo Zhao, Fei Yan
Publisher	Springer Verlag
Pages	372-388
Number of pages	17
ISBN (Print)	9789811359125
DOI	https://doi.org/10.1007/978-981-13-5913-2_23
Publication status	Published - 2019
Event	12th Chinese Conference on Trusted Computing and Information Security, CTCIS 2018 - Wuhan, China. Duration: 18 Oct 2018 → 18 Oct 2018

Publication series

Name	Communications in Computer and Information Science
Volume	960
ISSN (Print)	1865-0929

Conference

Conference	12th Chinese Conference on Trusted Computing and Information Security, CTCIS 2018
Country/Territory	China
City	Wuhan
Period	18/10/18 → 18/10/18

Cite this

Yan, H., Zhang, X., Xie, J., & Hu, C. (2019). Detecting malicious URLs using a deep learning approach based on stacked denoising autoencoder. In H. Zhang, B. Zhao, & F. Yan (Eds.), Trusted Computing and Information Security - 12th Chinese Conference, CTCIS 2018, Revised Selected Papers (pp. 372-388). (Communications in Computer and Information Science; Vol. 960). Springer Verlag. https://doi.org/10.1007/978-981-13-5913-2_23

Yan, Huaizhi ; Zhang, Xin ; Xie, Jiangwei et al. / Detecting malicious URLs using a deep learning approach based on stacked denoising autoencoder. Trusted Computing and Information Security - 12th Chinese Conference, CTCIS 2018, Revised Selected Papers. Editors / Huanguo Zhang ; Bo Zhao ; Fei Yan. Springer Verlag, 2019. pp. 372-388 (Communications in Computer and Information Science).

@inproceedings{8e229d72e57d4e4397b77c694df8501f,
title = "Detecting malicious {URLs} using a deep learning approach based on stacked denoising autoencoder",
keywords = "Deep learning, Malicious URL detection, Network security, Stacked denoising autoencoder",
author = "Huaizhi Yan and Xin Zhang and Jiangwei Xie and Changzhen Hu",
note = "Publisher Copyright: {\textcopyright} 2019, Springer Nature Singapore Pte Ltd.; 12th Chinese Conference on Trusted Computing and Information Security, CTCIS 2018 ; Conference date: 18-10-2018 Through 18-10-2018",
year = "2019",
doi = "10.1007/978-981-13-5913-2_23",
language = "English",
isbn = "9789811359125",
series = "Communications in Computer and Information Science",
volume = "960",
publisher = "Springer Verlag",
pages = "372--388",
editor = "Huanguo Zhang and Bo Zhao and Fei Yan",
booktitle = "Trusted Computing and Information Security - 12th Chinese Conference, CTCIS 2018, Revised Selected Papers",
address = "Germany",
}

Yan, H, Zhang, X, Xie, J & Hu, C 2019, Detecting malicious URLs using a deep learning approach based on stacked denoising autoencoder. in H Zhang, B Zhao & F Yan (eds), Trusted Computing and Information Security - 12th Chinese Conference, CTCIS 2018, Revised Selected Papers. Communications in Computer and Information Science, vol. 960, Springer Verlag, pp. 372-388, 12th Chinese Conference on Trusted Computing and Information Security, CTCIS 2018, Wuhan, China, 18/10/18. https://doi.org/10.1007/978-981-13-5913-2_23

TY  - GEN
T1  - Detecting malicious URLs using a deep learning approach based on stacked denoising autoencoder
AU  - Yan, Huaizhi
AU  - Zhang, Xin
AU  - Xie, Jiangwei
AU  - Hu, Changzhen
PY  - 2019
Y1  - 2019
KW  - Deep learning
KW  - Malicious URL detection
KW  - Network security
KW  - Stacked denoising autoencoder
UR  - http://www.scopus.com/inward/record.url?scp=85060258728&partnerID=8YFLogxK
U2  - 10.1007/978-981-13-5913-2_23
DO  - 10.1007/978-981-13-5913-2_23
M3  - Conference contribution
AN  - SCOPUS:85060258728
SN  - 9789811359125
T3  - Communications in Computer and Information Science
SP  - 372
EP  - 388
BT  - Trusted Computing and Information Security - 12th Chinese Conference, CTCIS 2018, Revised Selected Papers
A2  - Zhang, Huanguo
A2  - Zhao, Bo
A2  - Yan, Fei
PB  - Springer Verlag
T2  - 12th Chinese Conference on Trusted Computing and Information Security, CTCIS 2018
Y2  - 18 October 2018 through 18 October 2018
ER  - 

Yan H, Zhang X, Xie J, Hu C. Detecting malicious URLs using a deep learning approach based on stacked denoising autoencoder. In Zhang H, Zhao B, Yan F, editors, Trusted Computing and Information Security - 12th Chinese Conference, CTCIS 2018, Revised Selected Papers. Springer Verlag. 2019. p. 372-388. (Communications in Computer and Information Science). doi: 10.1007/978-981-13-5913-2_23
