TY - JOUR
T1 - A new model stealing defense based on DNN retraining for decision boundary protection
AU - Zhang, Chenlong
AU - Luo, Senlin
AU - Pan, Limin
AU - Gu, Dujuan
AU - Yuan, Jun
N1 - Publisher Copyright: © 2026 Elsevier B.V.
PY - 2026/9/7
Y1 - 2026/9/7
N2 - Deep neural networks (DNNs) are vulnerable to input transformations, posing challenges in thwarting model stealing attacks. Existing methods predominantly analyze the distribution differences of attack samples; however, attacks based on decision boundary approximation often mimic the distributions of benign samples, thereby circumventing defenses. Furthermore, the addition of deceptive perturbations to the output posterior by complex defense processing modules external to the victim model increases both computational costs and processing latency. In response, this paper proposes a novel training technique named PDB (Protecting Decision Boundaries) that robustly counters model stealing without relying on presumptions about the distribution of attack samples. Instead, PDB secures the primary targets of these attacks: the decision boundaries. It integrates an input gradient penalty into the loss function to displace the decision boundaries away from benign samples. To further enhance protection, samples near these boundaries, referred to as transition samples, are explicitly recategorized into a new, dedicated class. This recategorization is implemented by adding a corresponding neuron to the output layer, thereby fortifying the defense mechanism. Crucially, PDB discards the requirement for complex defense processing modules by employing straightforward mechanisms such as normal prediction processes and selective label flipping for a minimal number of cases. Experimental evidence confirms that PDB surpasses leading methods and marks a pioneering advance in safeguarding decision boundaries against potential breaches.
AB - Deep neural networks (DNNs) are vulnerable to input transformations, posing challenges in thwarting model stealing attacks. Existing methods predominantly analyze the distribution differences of attack samples; however, attacks based on decision boundary approximation often mimic the distributions of benign samples, thereby circumventing defenses. Furthermore, the addition of deceptive perturbations to the output posterior by complex defense processing modules external to the victim model increases both computational costs and processing latency. In response, this paper proposes a novel training technique named PDB (Protecting Decision Boundaries) that robustly counters model stealing without relying on presumptions about the distribution of attack samples. Instead, PDB secures the primary targets of these attacks: the decision boundaries. It integrates an input gradient penalty into the loss function to displace the decision boundaries away from benign samples. To further enhance protection, samples near these boundaries, referred to as transition samples, are explicitly recategorized into a new, dedicated class. This recategorization is implemented by adding a corresponding neuron to the output layer, thereby fortifying the defense mechanism. Crucially, PDB discards the requirement for complex defense processing modules by employing straightforward mechanisms such as normal prediction processes and selective label flipping for a minimal number of cases. Experimental evidence confirms that PDB surpasses leading methods and marks a pioneering advance in safeguarding decision boundaries against potential breaches.
KW - Deep neural networks training method
KW - Model stealing attack
KW - Model stealing defense
KW - Security and privacy
UR - https://www.scopus.com/pages/publications/105037917861
U2 - 10.1016/j.neucom.2026.133816
DO - 10.1016/j.neucom.2026.133816
M3 - Article
AN - SCOPUS:105037917861
SN - 0925-2312
VL - 693
JO - Neurocomputing
JF - Neurocomputing
M1 - 133816
ER -