TY - JOUR
T1 - Robust Aggregation Algorithm Against Large Groups of Backdoor Clients in Federated Learning Systems
AU - Wang, Yong Kang
AU - Zhai, Di Hua
AU - Xia, Yuan Qing
N1 - Publisher Copyright:
© 2023 Science Press. All rights reserved.
PY - 2023/6
Y1 - 2023/6
N2 - With the explosion of data and growing privacy concerns among businesses and individuals, traditional centralized machine learning can no longer meet current needs. Federated learning (FL) is an emerging distributed machine learning framework in which multiple diverse clients collaboratively train a global model without sharing their private data, thereby addressing the problems of data silos and privacy. However, existing studies have shown that FL is highly vulnerable to a wide range of attacks because of its inherently distributed and privacy-preserving nature. Backdoor attacks are among the most prominent attacks on FL systems. To defend against them, a large number of robust aggregation algorithms have been proposed. Nevertheless, these algorithms rely on strong assumptions, for example about the number of malicious clients and the distribution of data across clients. Our study shows that existing robust aggregation algorithms fail completely when there is a large group of malicious backdoor clients or when the data are non-independently and identically distributed (Non-IID). To address this problem, we propose a robust aggregation algorithm called Poly, which consists of two key components: the first uses a similarity matrix and a clustering algorithm to process the gradients of all clients; the second selects the optimal clusters containing benign clients, based on the cosine similarity metric, and aggregates them into the global model. Poly removes all malicious backdoor clients from the aggregation process, thereby preventing the backdoor from being implanted in the global model.
To evaluate how effectively Poly defends against backdoor attacks, we conduct a series of experiments on the MNIST, Fashion-MNIST, CIFAR-10 and Reddit datasets under both data-imbalance and class-imbalance Non-IID scenarios, as well as under the independently and identically distributed (IID) scenario. In addition, we consider a scenario with a large group of malicious backdoor clients, in which the proportion of malicious backdoor clients ranges from 50% to 90% in steps of 10%, and a scenario in which malicious backdoor clients are outnumbered by benign clients. Our experimental results show that Poly outperforms existing robust aggregation algorithms and effectively defends against backdoor attacks in all tested scenarios, with an attack success rate of only about 1% (and 0% in some scenarios), even under the data-imbalance and class-imbalance Non-IID scenarios and with a large group of malicious backdoor clients. Moreover, Poly achieves satisfactory primary-task accuracy, which indicates that it does not degrade performance on the primary task while defending against backdoor attacks. By contrast, existing robust aggregation algorithms can hardly defend against backdoor attacks in Non-IID scenarios or with a large group of malicious backdoor clients, reaching attack success rates of nearly 100%.
KW - backdoor attacks
KW - clustering
KW - federated learning
KW - heterogeneous
KW - robust
UR - http://www.scopus.com/inward/record.url?scp=85166738450&partnerID=8YFLogxK
U2 - 10.11897/SP.J.1016.2023.01302
DO - 10.11897/SP.J.1016.2023.01302
M3 - Article
AN - SCOPUS:85166738450
SN - 0254-4164
VL - 46
SP - 1302
EP - 1314
JO - Jisuanji Xuebao/Chinese Journal of Computers
JF - Jisuanji Xuebao/Chinese Journal of Computers
IS - 6
ER -