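A Robust Aggregated Algorithms against a Large Group Backdoor Clients in Federated Learning System (联邦学习系统中针对大群体后门客户端的鲁棒聚合算法)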

Yong Kang Wang; Di Hua Zhai; Yuan Qing Xia

doi:10.11897/SP.J.1016.2023.01302

Yong Kang Wang, Di Hua Zhai^*, Yuan Qing Xia

^*Corresponding author for this work

School of Automation

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

With the explosion of data and growing privacy concerns among businesses and individuals, traditional centralized machine learning can no longer meet existing needs. Federated learning (FL) is an emerging distributed machine learning framework in which multiple diverse clients collaboratively train a global model without sharing their private data, thereby addressing the data-silo and privacy problems. However, existing studies have demonstrated that FL is extremely vulnerable to many kinds of attacks because of its inherently distributed and privacy-preserving nature. The backdoor attack is one of the most prominent attacks on FL systems. To defend against backdoor attacks in FL systems, a large number of robust aggregation algorithms have been proposed. Nevertheless, these robust aggregation algorithms rely on strong assumptions about, for example, the number of malicious clients and the data distribution across clients. Our study shows that the existing robust aggregation algorithms fail completely under a large group of malicious backdoor clients or in non-independent and identically distributed (Non-IID) scenarios. To address this problem, we propose a robust aggregation algorithm called Poly, which contains two crucial components: the first uses a similarity matrix and a clustering algorithm to process the gradients of all clients; the second selects, based on the cosine similarity metric, the optimal clusters containing benign clients and aggregates them into the global model. Poly can completely remove all malicious backdoor clients from the aggregation process, thereby preventing the backdoor from being inserted into the global model.
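The two components described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm Poly uses or how the benign clusters are scored, so the threshold-based connected-components clustering, the `threshold` value, and the trusted `reference` update used for cluster selection are all assumptions made for illustration, as are the function names.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two flattened client updates (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(updates: Sequence[Vector]) -> List[List[float]]:
    """Pairwise cosine similarity between all client updates."""
    n = len(updates)
    return [[cosine(updates[i], updates[j]) for j in range(n)] for i in range(n)]


def cluster_by_threshold(sim: List[List[float]], threshold: float) -> List[List[int]]:
    """ASSUMED clustering rule: connected components of the graph that links
    two clients whenever their similarity is at least `threshold`."""
    n = len(sim)
    labels = [-1] * n
    clusters: List[List[int]] = []
    for start in range(n):
        if labels[start] != -1:
            continue
        cluster_id = len(clusters)
        labels[start] = cluster_id
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = cluster_id
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


def select_and_aggregate(
    updates: Sequence[Vector], reference: Vector, threshold: float = 0.5
) -> Tuple[List[int], List[float]]:
    """ASSUMED selection rule: keep the cluster whose members are, on average,
    most cosine-similar to a hypothetical trusted `reference` update, then
    average only that cluster's updates."""
    clusters = cluster_by_threshold(similarity_matrix(updates), threshold)

    def score(cluster: List[int]) -> float:
        return sum(cosine(updates[i], reference) for i in cluster) / len(cluster)

    best = max(clusters, key=score)
    dim = len(reference)
    aggregated = [sum(updates[i][d] for i in best) / len(best) for d in range(dim)]
    return best, aggregated
```

Because selection here depends on similarity to the reference rather than on cluster size, the sketch can still pick the benign cluster when malicious clients form the majority, which is the setting the abstract targets. How Poly itself identifies the benign clusters is described in the paper, not here.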
To test how effectively Poly defends against backdoor attacks, we conduct a series of experiments on the MNIST, Fashion-MNIST, CIFAR-10 and Reddit datasets under both data-imbalance and class-imbalance Non-IID scenarios, as well as the independent and identically distributed (IID) scenario. In addition, we consider a large-group scenario in which the proportion of malicious backdoor clients ranges from 50% to 90% in steps of 10%, as well as the scenario in which malicious backdoor clients are fewer than benign clients. Our experimental results indicate that Poly outperforms the existing robust aggregation algorithms and effectively defends against backdoor attacks, with an attack success rate of only about 1% (even 0% in some scenarios) across the tested scenarios, including the data-imbalance and class-imbalance Non-IID scenarios and the large-group scenario. Poly also achieves satisfactory primary task accuracy, which indicates that it does not degrade performance on the primary task while defending against the backdoor attack. By contrast, the existing robust aggregation algorithms can hardly defend against the backdoor attack under Non-IID scenarios or a large group of malicious backdoor clients, reaching an attack success rate of nearly 100%.
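The two metrics reported above can be computed as in the following sketch. The abstract does not define them, so the definitions used here are the conventional ones and should be read as assumptions: primary task accuracy is plain accuracy on clean test inputs, and attack success rate is the fraction of trigger-carrying inputs, excluding those whose true label already equals the attacker's target, that the model assigns to the target label. The function names are hypothetical.

```python
from typing import Sequence


def primary_task_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean test inputs classified correctly."""
    if not labels:
        return 0.0
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def attack_success_rate(
    triggered_predictions: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered inputs classified as `target_label`.

    Inputs whose true label is already the target are skipped (an assumed
    convention), since predicting the target for them is not evidence of a backdoor.
    """
    eligible = [
        p for p, y in zip(triggered_predictions, true_labels) if y != target_label
    ]
    if not eligible:
        return 0.0
    return sum(1 for p in eligible if p == target_label) / len(eligible)
```

Under these definitions, a defense that works keeps the attack success rate near 0 while leaving primary task accuracy close to that of a model trained without attackers.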

Translated title of the contribution	A Robust Aggregated Algorithms against a Large Group Backdoor Clients in Federated Learning System
Original language	Chinese (Traditional)
Pages (from-to)	1302-1314
Number of pages	13
Journal	Jisuanji Xuebao/Chinese Journal of Computers
Volume	46
Issue number	6
DOI	https://doi.org/10.11897/SP.J.1016.2023.01302
Publication status	Published - Jun 2023

Keywords

backdoor attacks
clustering
federated learning
heterogeneous
robust

Access to Document

10.11897/SP.J.1016.2023.01302

Other files and links

Link to publication in Scopus

Cite this

@article{db11b48a036949ffbcbd83cd84b0af64,

title = "联邦学习系统中针对大群体后门客户端的鲁棒聚合算法",

keywords = "backdoor attacks, clustering, federated learning, heterogeneous, robust",

author = "Wang, {Yong Kang} and Zhai, {Di Hua} and Xia, {Yuan Qing}",

year = "2023",

month = jun,

doi = "10.11897/SP.J.1016.2023.01302",

language = "繁体中文",

volume = "46",

pages = "1302--1314",

journal = "Jisuanji Xuebao/Chinese Journal of Computers",

issn = "0254-4164",

publisher = "Science Press",

number = "6",

}

TY - JOUR

T1 - 联邦学习系统中针对大群体后门客户端的鲁棒聚合算法

AU - Wang, Yong Kang

AU - Zhai, Di Hua

AU - Xia, Yuan Qing

PY - 2023/6

Y1 - 2023/6

KW - backdoor attacks

KW - clustering

KW - federated learning

KW - heterogeneous

KW - robust

UR - http://www.scopus.com/inward/record.url?scp=85166738450&partnerID=8YFLogxK

U2 - 10.11897/SP.J.1016.2023.01302

DO - 10.11897/SP.J.1016.2023.01302

M3 - Article

AN - SCOPUS:85166738450

SN - 0254-4164

VL - 46

SP - 1302

EP - 1314

JO - Jisuanji Xuebao/Chinese Journal of Computers

JF - Jisuanji Xuebao/Chinese Journal of Computers

IS - 6

ER -
