Detecting adversarial examples via prediction difference for deep neural networks

Feng Guo; Qingjie Zhao; Xuan Li; Xiaohui Kuang; Jianwei Zhang; Yahong Han; Yu an Tan

doi:10.1016/j.ins.2019.05.084

Feng Guo, Qingjie Zhao^*, Xuan Li, Xiaohui Kuang, Jianwei Zhang, Yahong Han, Yu an Tan

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

42 citations (Scopus)

Abstract

Deep neural networks (DNNs) perform effectively in many computer vision tasks. However, DNNs are vulnerable to adversarial examples, which are generated by adding imperceptible perturbations to original images. To address this problem, we propose a novel defense method, transferability prediction difference (TPD), which drastically improves the adversarial robustness of DNNs while sacrificing little verified accuracy. We find that adversarial examples have a larger prediction difference across various DNN models because of the models' differing, complicated decision boundaries; this difference can be used to identify adversarial examples by converging the decision boundaries to a prediction difference threshold. We apply the K-means clustering algorithm to benign data to determine the transferability prediction difference threshold, with which we can detect adversarial examples accurately and efficiently. Furthermore, the TPD method neither modifies the target model nor requires knowledge of adversarial attacks. We use four state-of-the-art adversarial attacks (FGSM, BIM, JSMA and C&W) to evaluate TPD models trained on MNIST and CIFAR-10; the average detection accuracy is 96.74% and 86.61%, respectively. The results show that the TPD model achieves a high detection ratio on the demonstrably advanced white-box adversarial examples while keeping a low false positive rate on benign examples.
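The abstract does not give the exact formulas, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes (hypothetically) that the prediction difference is the mean L1 distance between the target model's class probabilities and those of several auxiliary models, and that the threshold is the midpoint between two K-means centroids fitted on benign scores. The function names `prediction_difference`, `kmeans_1d`, `fit_threshold`, and `is_adversarial` are illustrative and do not come from the paper.

```python
from statistics import mean


def prediction_difference(target_probs, other_probs):
    """Assumed score: mean L1 distance between the target model's class
    probabilities and the probabilities from each auxiliary model."""
    return mean(
        sum(abs(t - o) for t, o in zip(target_probs, probs))
        for probs in other_probs
    )


def kmeans_1d(values, k=2, iters=100):
    """Plain Lloyd's algorithm on scalar values (requires k >= 2).
    Returns the centroids in ascending order."""
    vs = sorted(values)
    # Deterministic initialisation: spread centroids over the sorted data.
    centroids = [vs[(len(vs) - 1) * i // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vs:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def fit_threshold(benign_scores):
    """Assumed rule: threshold halfway between the two benign clusters."""
    low, high = kmeans_1d(benign_scores, k=2)
    return (low + high) / 2


def is_adversarial(score, threshold):
    """Flag an input whose prediction difference exceeds the threshold."""
    return score > threshold
```

In this sketch the threshold is fitted on benign data only, which mirrors the abstract's statement that the method needs no knowledge of the attacks; how the paper actually maps clusters to a threshold may differ.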

Original language	English
Pages (from-to)	182-192
Number of pages	11
Journal	Information Sciences
Volume	501
DOI	https://doi.org/10.1016/j.ins.2019.05.084
Publication status	Published - Oct 2019

Cite this

@article{cd6dd75e24eb4cf9b06abeee97d7ef24,

title = "Detecting adversarial examples via prediction difference for deep neural networks",

abstract = "Deep neural networks (DNNs) perform effectively in many computer vision tasks. However, DNNs are vulnerable to adversarial examples, which are generated by adding imperceptible perturbations to original images. To address this problem, we propose a novel defense method, transferability prediction difference (TPD), which drastically improves the adversarial robustness of DNNs while sacrificing little verified accuracy. We find that adversarial examples have a larger prediction difference across various DNN models because of the models' differing, complicated decision boundaries; this difference can be used to identify adversarial examples by converging the decision boundaries to a prediction difference threshold. We apply the K-means clustering algorithm to benign data to determine the transferability prediction difference threshold, with which we can detect adversarial examples accurately and efficiently. Furthermore, the TPD method neither modifies the target model nor requires knowledge of adversarial attacks. We use four state-of-the-art adversarial attacks (FGSM, BIM, JSMA and C\&W) to evaluate TPD models trained on MNIST and CIFAR-10; the average detection accuracy is 96.74\% and 86.61\%, respectively. The results show that the TPD model achieves a high detection ratio on the demonstrably advanced white-box adversarial examples while keeping a low false positive rate on benign examples.",

keywords = "Adversarial example, Deep neural network, Image recognition, Prediction difference",

author = "Feng Guo and Qingjie Zhao and Xuan Li and Xiaohui Kuang and Jianwei Zhang and Yahong Han and Tan, {Yu an}",

note = "Publisher Copyright: {\textcopyright} 2019 Elsevier Inc.",

year = "2019",

month = oct,

doi = "10.1016/j.ins.2019.05.084",

language = "English",

volume = "501",

pages = "182--192",

journal = "Information Sciences",

issn = "0020-0255",

publisher = "Elsevier Inc.",

}

TY - JOUR

T1 - Detecting adversarial examples via prediction difference for deep neural networks

AU - Guo, Feng

AU - Zhao, Qingjie

AU - Li, Xuan

AU - Kuang, Xiaohui

AU - Zhang, Jianwei

AU - Han, Yahong

AU - Tan, Yu an

PY - 2019/10

Y1 - 2019/10

N2 - Deep neural networks (DNNs) perform effectively in many computer vision tasks. However, DNNs are vulnerable to adversarial examples, which are generated by adding imperceptible perturbations to original images. To address this problem, we propose a novel defense method, transferability prediction difference (TPD), which drastically improves the adversarial robustness of DNNs while sacrificing little verified accuracy. We find that adversarial examples have a larger prediction difference across various DNN models because of the models' differing, complicated decision boundaries; this difference can be used to identify adversarial examples by converging the decision boundaries to a prediction difference threshold. We apply the K-means clustering algorithm to benign data to determine the transferability prediction difference threshold, with which we can detect adversarial examples accurately and efficiently. Furthermore, the TPD method neither modifies the target model nor requires knowledge of adversarial attacks. We use four state-of-the-art adversarial attacks (FGSM, BIM, JSMA and C&W) to evaluate TPD models trained on MNIST and CIFAR-10; the average detection accuracy is 96.74% and 86.61%, respectively. The results show that the TPD model achieves a high detection ratio on the demonstrably advanced white-box adversarial examples while keeping a low false positive rate on benign examples.

AB - Deep neural networks (DNNs) perform effectively in many computer vision tasks. However, DNNs are vulnerable to adversarial examples, which are generated by adding imperceptible perturbations to original images. To address this problem, we propose a novel defense method, transferability prediction difference (TPD), which drastically improves the adversarial robustness of DNNs while sacrificing little verified accuracy. We find that adversarial examples have a larger prediction difference across various DNN models because of the models' differing, complicated decision boundaries; this difference can be used to identify adversarial examples by converging the decision boundaries to a prediction difference threshold. We apply the K-means clustering algorithm to benign data to determine the transferability prediction difference threshold, with which we can detect adversarial examples accurately and efficiently. Furthermore, the TPD method neither modifies the target model nor requires knowledge of adversarial attacks. We use four state-of-the-art adversarial attacks (FGSM, BIM, JSMA and C&W) to evaluate TPD models trained on MNIST and CIFAR-10; the average detection accuracy is 96.74% and 86.61%, respectively. The results show that the TPD model achieves a high detection ratio on the demonstrably advanced white-box adversarial examples while keeping a low false positive rate on benign examples.

KW - Adversarial example

KW - Deep neural network

KW - Image recognition

KW - Prediction difference

UR - http://www.scopus.com/inward/record.url?scp=85066954252&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2019.05.084

DO - 10.1016/j.ins.2019.05.084

M3 - Article

AN - SCOPUS:85066954252

SN - 0020-0255

VL - 501

SP - 182

EP - 192

JO - Information Sciences

JF - Information Sciences

ER -
