TY - GEN
T1 - Defense Against Textual Backdoors via Elastic Weighted Consolidation-Based Machine Unlearning
AU - Xuan, Haojun
AU - Wang, Yajie
AU - Wu, Huishu
AU - Liu, Tao
AU - Zhang, Chuan
AU - Zhu, Liehuang
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Backdoor attacks pose significant threats to Natural Language Processing (NLP) models. Most existing backdoor defenses for NLP models work by identifying and then manipulating backdoor triggers within provided samples. However, such methods operate mainly at the level of data filtering and fail to cleanse the affected model itself. To address this problem, we present ELUDE, a method designed to excise backdoor triggers embedded in a corrupted model. ELUDE comprises two core components operating in a pipeline: a backdoor trigger identifier and a backdoor trigger remover. The former uses a perplexity-based approach to locate the backdoor trigger, while the latter removes the inserted backdoor's influence on the tainted model via machine unlearning. To counteract the catastrophic forgetting induced by machine unlearning, we incorporate Elastic Weight Consolidation (EWC) into the backdoor trigger remover. Experiments on the SST-2, OLID, and AG News text classification datasets demonstrate the efficacy of ELUDE: it reduces the attack success rate of three state-of-the-art backdoor attack methods by an average of 60% while maintaining comparable performance on the original task.
AB - Backdoor attacks pose significant threats to Natural Language Processing (NLP) models. Most existing backdoor defenses for NLP models work by identifying and then manipulating backdoor triggers within provided samples. However, such methods operate mainly at the level of data filtering and fail to cleanse the affected model itself. To address this problem, we present ELUDE, a method designed to excise backdoor triggers embedded in a corrupted model. ELUDE comprises two core components operating in a pipeline: a backdoor trigger identifier and a backdoor trigger remover. The former uses a perplexity-based approach to locate the backdoor trigger, while the latter removes the inserted backdoor's influence on the tainted model via machine unlearning. To counteract the catastrophic forgetting induced by machine unlearning, we incorporate Elastic Weight Consolidation (EWC) into the backdoor trigger remover. Experiments on the SST-2, OLID, and AG News text classification datasets demonstrate the efficacy of ELUDE: it reduces the attack success rate of three state-of-the-art backdoor attack methods by an average of 60% while maintaining comparable performance on the original task.
KW - Backdoor Defense
KW - Machine Unlearning
KW - Natural Language Processing
UR - http://www.scopus.com/inward/record.url?scp=85218920695&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-1551-3_9
DO - 10.1007/978-981-96-1551-3_9
M3 - Conference contribution
AN - SCOPUS:85218920695
SN - 9789819615506
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 108
EP - 121
BT - Algorithms and Architectures for Parallel Processing - 24th International Conference, ICA3PP 2024, Macau, China, October 29–31, 2024, Proceedings
A2 - Zhu, Tianqing
A2 - Li, Jin
A2 - Castiglione, Aniello
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2024
Y2 - 29 October 2024 through 31 October 2024
ER -