Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score

Shuhai Zhang, Feng Liu, Jiahao Yang, Yifan Yang, Changsheng Li*, Bo Han*, Mingkui Tan*

*Corresponding author for this work

Research output: Contribution to journal, conference article, peer-reviewed

5 citations (Scopus)

Abstract

Adversarial detection aims to determine whether a given sample is an adversarial one based on the discrepancy between natural and adversarial distributions. Unfortunately, estimating or comparing two data distributions is extremely difficult, especially in high-dimensional spaces. Recently, the gradient of the log probability density (a.k.a. the score) w.r.t. the sample has been used as an alternative statistic to compute. However, we find that the score is sensitive in identifying adversarial samples due to insufficient information with one sample only. In this paper, we propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations. Specifically, to obtain adequate information regarding one sample, we perturb it by adding various noises to capture its multi-view observations. We theoretically prove that EPS is a proper statistic to compute the discrepancy between two samples under mild conditions. In practice, we can use a pre-trained diffusion model to estimate EPS for each sample. Finally, we propose an EPS-based adversarial detection (EPS-AD) method, in which we develop EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples. We also prove that the EPS-based MMD between natural and adversarial samples is larger than that among natural samples. Extensive experiments show the superior adversarial detection performance of our EPS-AD.
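
The two ingredients named in the abstract, averaging the score of a sample over several perturbations and comparing the resulting statistics with an MMD, can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes that natural data follow a standard Gaussian, so that the score of the noised distribution is known in closed form and no pre-trained diffusion model is needed. It also uses a mean-shifted Gaussian as a crude stand-in for adversarial samples. The function names, noise levels, kernel bandwidth, and sample sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def toy_score(x, sigma):
    # Assumption: natural data ~ N(0, I), so data noised at level sigma
    # follows N(0, (1 + sigma^2) I), whose score is -x / (1 + sigma^2).
    # In the paper's setting a pre-trained diffusion model estimates this.
    return -x / (1.0 + sigma ** 2)


def expected_perturbation_score(x, sigmas, rng):
    # Perturb every sample at several noise levels and average the scores.
    scores = []
    for sigma in sigmas:
        x_noisy = x + sigma * rng.standard_normal(x.shape)
        scores.append(toy_score(x_noisy, sigma))
    return np.mean(scores, axis=0)


def gaussian_kernel(a, b, bandwidth):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(a, b, bandwidth=1.0):
    # Biased (V-statistic) estimate of the squared MMD between two sets.
    return (
        gaussian_kernel(a, a, bandwidth).mean()
        + gaussian_kernel(b, b, bandwidth).mean()
        - 2.0 * gaussian_kernel(a, b, bandwidth).mean()
    )


rng = np.random.default_rng(0)
sigmas = [0.1, 0.5, 1.0]  # hypothetical noise levels

natural_ref = rng.standard_normal((200, 2))
natural_test = rng.standard_normal((200, 2))
shifted_test = rng.standard_normal((200, 2)) + 3.0  # stand-in for adversarial data

eps_ref = expected_perturbation_score(natural_ref, sigmas, rng)
eps_nat = expected_perturbation_score(natural_test, sigmas, rng)
eps_shift = expected_perturbation_score(shifted_test, sigmas, rng)

mmd_nat = mmd_squared(eps_ref, eps_nat)
mmd_adv = mmd_squared(eps_ref, eps_shift)
print("natural vs natural:", mmd_nat)
print("natural vs shifted:", mmd_adv)
```

In this toy setting the MMD between the reference set and the shifted set comes out larger than the MMD between two natural sets, which mirrors the qualitative claim in the abstract. It says nothing about real adversarial attacks or real diffusion models.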

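Original language: English
Pages (from-to): 41429-41451
Number of pages: 23
Journal: Proceedings of Machine Learning Research
Volume: 202
Publication status: Published - 2023
Event: 40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: 23 Jul 2023 to 29 Jul 2023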
