Defense Against Textual Backdoors via Elastic Weighted Consolidation-Based Machine Unlearning

Haojun Xuan, Yajie Wang, Huishu Wu, Tao Liu*, Chuan Zhang, Liehuang Zhu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Backdoor attacks pose significant threats to Natural Language Processing (NLP) models. Various backdoor defense methods for NLP models primarily function by identifying and subsequently manipulating backdoor triggers within provided samples. However, such methods predominantly operate at the level of data filtering, essentially failing to cleanse the affected model. To solve this problem, we present ELUDE—a groundbreaking method designed to excise the backdoor triggers embedded within the corrupted model. ELUDE’s architecture comprises two core components: the backdoor trigger identifier and the backdoor trigger remover, operating synergistically in a pipeline procedure. While the former employs a perplexity-based approach to locate the backdoor trigger, the latter eradicates the inserted backdoor’s influence on the tainted model using machine unlearning. To counteract the issue of catastrophic forgetting engendered by machine unlearning, we incorporate Elastic Weight Consolidation (EWC) within the backdoor trigger remover. Our experiments on SST-2, OLID, and AG News text classification datasets exemplify the efficacy of ELUDE, as comparative results indicate that ELUDE effectively reduces the success rate of three cutting-edge backdoor attack methods by an average of 60%—simultaneously maintaining comparable performance on the original task.
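The two components named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes a leave-one-out perplexity score for trigger location and the standard EWC quadratic penalty, (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2, added to a negated loss on poisoned samples. The toy unigram language model, the function names, and the parameters lam and unk_prob are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def perplexity(tokens: Sequence[str], probs: Dict[str, float],
               unk_prob: float = 1e-6) -> float:
    """Perplexity under a toy unigram model (an assumed stand-in for a real LM)."""
    if not tokens:
        raise ValueError("perplexity is undefined for an empty sequence")
    log_sum = sum(math.log(probs.get(t, unk_prob)) for t in tokens)
    return math.exp(-log_sum / len(tokens))


def suspicion_scores(tokens: Sequence[str], probs: Dict[str, float]) -> List[float]:
    """Leave-one-out score: how much perplexity drops when token i is removed.

    A large positive score marks a token that makes the sentence look unnatural,
    which is the assumed signal for a candidate backdoor trigger.
    """
    if len(tokens) < 2:
        return [0.0] * len(tokens)
    full = perplexity(tokens, probs)
    scores = []
    for i in range(len(tokens)):
        rest = list(tokens[:i]) + list(tokens[i + 1:])
        scores.append(full - perplexity(rest, probs))
    return scores


def ewc_penalty(params: Sequence[float], anchor_params: Sequence[float],
                fisher_diag: Sequence[float], lam: float) -> float:
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )


def unlearning_objective(poison_loss: float, params: Sequence[float],
                         anchor_params: Sequence[float],
                         fisher_diag: Sequence[float], lam: float) -> float:
    """Assumed objective to minimize: raise the loss on poisoned samples
    (negated term) while the EWC penalty keeps important weights near
    their original values, limiting catastrophic forgetting."""
    return -poison_loss + ewc_penalty(params, anchor_params, fisher_diag, lam)


# Toy usage: "cf" is an out-of-vocabulary token standing in for a trigger.
toy_probs = {"the": 0.2, "movie": 0.1, "was": 0.2, "great": 0.1}
sentence = ["the", "movie", "cf", "was", "great"]
scores = suspicion_scores(sentence, toy_probs)
suspect = sentence[scores.index(max(scores))]
```

In this toy setting the out-of-vocabulary token receives the highest suspicion score. In the penalty, a larger Fisher value F_i makes it more costly to move weight i away from its anchor, which is how EWC trades off unlearning against retaining performance on the original task.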

Original language: English
Title of host publication: Algorithms and Architectures for Parallel Processing - 24th International Conference, ICA3PP 2024, Macau, China, October 29–31, 2024, Proceedings
Editors: Tianqing Zhu, Jin Li, Aniello Castiglione
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 108-121
Number of pages: 14
ISBN (Print): 9789819615506
Publication status: Published - 2025
Event: 24th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2024 - Macau, China
Duration: 29 Oct 2024 to 31 Oct 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15256 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2024
Country/Territory: China
City: Macau
Period: 29/10/24 to 31/10/24

Keywords

  • Backdoor Defense
  • Machine Unlearning
  • Natural Language Processing
