TY - GEN
T1 - Hwamei: A Learning-based Synchronization Scheme for Hierarchical Federated Learning
T2 - 43rd IEEE International Conference on Distributed Computing Systems, ICDCS 2023
AU - Qi, Tianyu
AU - Zhan, Yufeng
AU - Li, Peng
AU - Guo, Jingcai
AU - Xia, Yuanqing
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Federated learning (FL) enables collaborative model training among distributed devices without data sharing, but existing FL suffers from poor scalability because of global model synchronization. To address this issue, hierarchical federated learning (HFL) has recently been proposed to let edge servers aggregate the models of nearby devices while synchronizing with the cloud periodically. However, a critical open challenge remains unsolved: how to design a good synchronization scheme, i.e., when devices and edges should be synchronized. Devices are heterogeneous in computing and communication capability, and their data can be non-IID. No existing work synchronizes the various roles (e.g., devices and edges) in HFL well enough to guarantee both high learning efficiency and accuracy. In this paper, we propose Hwamei, a learning-based synchronization scheme for HFL systems. By collecting data such as edge models, CPU usage, and communication time, we design a deep reinforcement learning-based approach that decides the frequencies of cloud aggregation and edge aggregation, respectively. The proposed scheme accounts for device heterogeneity, non-IID data, and device mobility to maximize training model accuracy while minimizing energy overhead. We build an HFL testbed and conduct experiments using real data obtained from Raspberry Pi devices and Alibaba Cloud. Extensive experimental results confirm the effectiveness of Hwamei.
KW - deep reinforcement learning
KW - hierarchical federated learning
KW - statistical heterogeneity
KW - system heterogeneity
UR - http://www.scopus.com/inward/record.url?scp=85175073512&partnerID=8YFLogxK
U2 - 10.1109/ICDCS57875.2023.00047
DO - 10.1109/ICDCS57875.2023.00047
M3 - Conference contribution
AN - SCOPUS:85175073512
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 534
EP - 544
BT - Proceedings - 2023 IEEE 43rd International Conference on Distributed Computing Systems, ICDCS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 July 2023 through 21 July 2023
ER -