Abstract
Reduced distinguishability between benign and malicious queries significantly challenges the detection of Model Stealing (MS). Existing methods for identifying MS attacks exhibit two key limitations: (1) sample-level detection methods that rely on fixed feature thresholds often flag benign samples or overlook malicious ones; (2) distribution-level detection methods with static divergence benchmarks may misclassify benign query samples that deviate from those benchmarks. This paper introduces GuardNet, an innovative model stealing detection method. By combining boundary features with inter-sample distance features, GuardNet identifies malicious sample pairs more precisely, and it employs distribution divergences to adjust decision thresholds, further enhancing its detection capabilities. The method incorporates a variational autoencoder to reconstruct query samples and uses the Wasserstein distance between pre- and post-reconstruction samples as the measure of distribution divergence, effectively minimizing the influence of distribution shifts on benign query samples. Experimental results indicate that this approach significantly reduces the number of adversarial queries and markedly decreases false positives.
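The divergence measure described in the abstract can be sketched as follows. This is an illustrative sketch, not GuardNet's actual implementation: the function names (`reconstruction_divergence`, `wasserstein_1d`) are hypothetical, and a noisy identity map stands in for the trained variational autoencoder.

```python
import numpy as np

def wasserstein_1d(u, v):
    """1-D Wasserstein (W1) distance between two equal-size empirical
    samples: the mean absolute difference of their sorted values."""
    return float(np.mean(np.abs(np.sort(u) - np.sort(v))))

def reconstruction_divergence(queries, reconstruct):
    """Average per-feature W1 distance between query samples and their
    reconstructions; `reconstruct` is a stand-in for a VAE's
    encode-decode pass (hypothetical here)."""
    recon = reconstruct(queries)
    return float(np.mean([wasserstein_1d(queries[:, j], recon[:, j])
                          for j in range(queries.shape[1])]))

rng = np.random.default_rng(0)
benign = rng.normal(0.0, 1.0, size=(512, 4))  # in-distribution queries

# Stand-in for a trained VAE: identity plus small reconstruction noise,
# so in-distribution (benign) queries yield a small divergence.
stand_in_vae = lambda x: x + rng.normal(0.0, 0.05, size=x.shape)

div = reconstruction_divergence(benign, stand_in_vae)
print(div)
```

A small divergence would indicate queries the reconstructor handles well (benign-like), while out-of-distribution query batches would yield larger values; the decision threshold would then be adjusted from such divergences, per the abstract.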
| Original language | English |
|---|---|
| Article number | 109266 |
| Journal | Computers and Electrical Engineering |
| Volume | 117 |
| DOIs | |
| Publication status | Published - Jul 2024 |
Keywords
- Machine Learning as a Service
- Model stealing attack
- Model stealing detection
- Security and privacy
Fingerprint
Dive into the research topics of 'Making models more secure: An efficient model stealing detection method'. Together they form a unique fingerprint.