Layer Frozen Multi-Net & Latent Space Feature-Concealed Backdoor Samples Detection

Jiawei Li, Senlin Luo, Limin Pan*, Chenlong Zhang, Zhao Zhang, Chuan Lu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Identifying feature-concealed backdoor samples, which either entangle with the benign semantics of the target class or carry dynamic triggers, remains a challenge for backdoor attack detection. Existing methods focus on differences in sample distribution in the latent space of the victim model. However, backdoor samples that contain target-class discriminative semantics tend to produce distributions that overlap with those of benign samples, which makes them prone to missed detection. Moreover, these methods assume that the distribution of backdoor samples is continuous and clustered, so they have difficulty precisely fitting the fractured distribution that arises from dynamic-trigger backdoor samples, resulting in a high false detection rate for benign samples located between distribution sub-clusters. A Layer Frozen Multi-Net and Multi-Latent Space backdoor sample detection method, LFMN-LS, is proposed. Trigger-Net and Benign-Net are constructed by knowledge refinement of the victim model and capture the distributions of backdoor and benign samples separately, lessening the adverse effects of inter-class distributional overlap. Furthermore, a Relative Cosine Distance is designed to jointly measure the distribution difference between backdoor and benign samples across multiple latent spaces, mitigating distributional fractures in any single latent space. Experimental results demonstrate that LFMN-LS outperforms state-of-the-art methods. LFMN-LS integrates hidden-layer freezing into knowledge refinement, which adequately preserves the high-order features of samples.

Original language: English
Article number: 107497
Journal: Neural Networks
Volume: 188
Publication status: Published - Aug 2025

Keywords

  • Backdoor attack
  • Backdoor samples detection
  • Deep learning
  • Security and reliability
