TY - JOUR
T1 - Making models more secure
T2 - An efficient model stealing detection method
AU - Zhang, Chenlong
AU - Luo, Senlin
AU - Pan, Limin
AU - Lu, Chuan
AU - Zhang, Zhao
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2024/7
Y1 - 2024/7
N2 - Reduced distinguishability significantly challenges the detection of Model Stealing (MS). Existing methods for identifying MS attacks exhibit key limitations: (1) sample-level detection methods that use fixed feature thresholds often inadvertently include benign samples or overlook malicious ones; (2) distribution-level detection methods with static divergence benchmarks may misclassify benign query samples that deviate from those benchmarks. This paper introduces GuardNet, a novel model stealing detection method. By combining boundary features with inter-sample distance features, GuardNet identifies malicious sample pairs more precisely and uses distribution divergences to adjust its decision thresholds, enhancing detection capability. The method employs a variational autoencoder to reconstruct query samples and uses the Wasserstein distance between pre- and post-reconstruction samples as a measure of distribution divergence, effectively minimizing the influence of distribution shifts on benign query samples. Experimental results indicate that this approach significantly reduces the number of adversarial queries required for detection and markedly decreases false positives.
AB - Reduced distinguishability significantly challenges the detection of Model Stealing (MS). Existing methods for identifying MS attacks exhibit key limitations: (1) sample-level detection methods that use fixed feature thresholds often inadvertently include benign samples or overlook malicious ones; (2) distribution-level detection methods with static divergence benchmarks may misclassify benign query samples that deviate from those benchmarks. This paper introduces GuardNet, a novel model stealing detection method. By combining boundary features with inter-sample distance features, GuardNet identifies malicious sample pairs more precisely and uses distribution divergences to adjust its decision thresholds, enhancing detection capability. The method employs a variational autoencoder to reconstruct query samples and uses the Wasserstein distance between pre- and post-reconstruction samples as a measure of distribution divergence, effectively minimizing the influence of distribution shifts on benign query samples. Experimental results indicate that this approach significantly reduces the number of adversarial queries required for detection and markedly decreases false positives.
KW - Machine Learning as a Service
KW - Model stealing attack
KW - Model stealing detection
KW - Security and privacy
UR - http://www.scopus.com/inward/record.url?scp=85191940463&partnerID=8YFLogxK
U2 - 10.1016/j.compeleceng.2024.109266
DO - 10.1016/j.compeleceng.2024.109266
M3 - Article
AN - SCOPUS:85191940463
SN - 0045-7906
VL - 117
JO - Computers and Electrical Engineering
JF - Computers and Electrical Engineering
M1 - 109266
ER -