Abstract
Identifying feature-concealed backdoor samples, whose features are entangled with the benign semantics of the target class or which carry dynamic triggers, is a challenge for backdoor attack detection. Existing methods focus on differences between sample distributions in the latent space of the victim model. However, backdoor samples that contain discriminative semantics of the target class tend to produce distributions that overlap with those of benign samples, which makes them prone to missed detection. Moreover, these methods assume that the distribution of backdoor samples is continuous and clustered. They therefore struggle to precisely fit the fractured distribution produced by dynamic-trigger backdoor samples, which results in a high rate of false detections among benign samples located between distribution sub-clusters. A Layer Frozen Multi-Net and Multi-Latent Space backdoor sample detection method, LFMN-LS, is proposed. Trigger-Net and Benign-Net are constructed by knowledge refinement from the victim model and capture the distributions of backdoor and benign samples separately, lessening the adverse effects of inter-class distributional overlap. Furthermore, a Relative Cosine Distance is designed to jointly measure the distribution difference between backdoor and benign samples across multiple latent spaces, mitigating distributional fractures within any single latent space. Experimental results demonstrate that LFMN-LS outperforms state-of-the-art methods. LFMN-LS innovatively integrates hidden-layer freezing into knowledge refinement, adequately preserving the high-order features of samples.
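The abstract does not state the formula for the Relative Cosine Distance, so the following Python sketch only illustrates the general idea of scoring a sample jointly across several latent spaces, under assumed definitions. For each latent space it compares the victim model's feature for a sample with the features produced by Benign-Net and Trigger-Net, converts the two cosine distances into a relative score, and averages the scores over all spaces. The function names, the ratio `d_benign / (d_benign + d_trigger)`, the plain averaging, and the 0.5 threshold are hypothetical choices made for illustration and are not taken from LFMN-LS.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: the scoring rule below is an assumption,
# not the published LFMN-LS Relative Cosine Distance.
Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cos(u, v), which lies in [0, 2]."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    # Clamp so floating-point rounding cannot produce a tiny negative distance.
    return max(0.0, 1.0 - dot / (norm_u * norm_v))


def relative_cosine_score(
    victim_feats: Sequence[Vector],
    benign_feats: Sequence[Vector],
    trigger_feats: Sequence[Vector],
    eps: float = 1e-12,
) -> float:
    """Hypothetical multi-latent-space score in [0, 1]; higher = more backdoor-like.

    Element l of each argument is the sample's feature vector in latent
    space l (e.g. hidden layer l) of the victim model, Benign-Net and
    Trigger-Net, respectively.
    """
    if not victim_feats or not (
        len(victim_feats) == len(benign_feats) == len(trigger_feats)
    ):
        raise ValueError("need one feature vector per latent space for every network")
    per_space: List[float] = []
    for v, b, t in zip(victim_feats, benign_feats, trigger_feats):
        d_benign = cosine_distance(v, b)   # victim vs. Benign-Net in this space
        d_trigger = cosine_distance(v, t)  # victim vs. Trigger-Net in this space
        # Near 1 when the victim's feature is closer to Trigger-Net than to Benign-Net.
        per_space.append(d_benign / (d_benign + d_trigger + eps))
    # Joint measure: averaging dilutes a misleading score from any single space.
    return sum(per_space) / len(per_space)


def is_backdoor(score: float, threshold: float = 0.5) -> bool:
    """Flag a sample as backdoor; the 0.5 threshold is an illustrative assumption."""
    return score > threshold


# Toy two-space example: the victim's features match Trigger-Net in both spaces.
score = relative_cosine_score(
    victim_feats=[[1.0, 0.0], [0.0, 1.0]],
    benign_feats=[[0.0, 1.0], [1.0, 0.0]],
    trigger_feats=[[1.0, 0.0], [0.0, 1.0]],
)
print(round(score, 6), is_backdoor(score))  # 1.0 True
```

Averaging the per-space ratios is one simple way to make the decision depend on several latent spaces at once: a benign sample that falls between sub-clusters in one space can still receive a low overall score if the other spaces place it near Benign-Net. In a real pipeline, the feature vectors would be taken from intermediate layers of the three networks rather than written by hand.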
Original language | English |
---|---|
Article number | 107497 |
Journal | Neural Networks |
Volume | 188 |
DOIs | |
Publication status | Published - Aug 2025 |
Keywords
- Backdoor attack
- Backdoor samples detection
- Deep learning
- Security and reliability