TY - JOUR
T1 - Anchor Model-Based Hybrid Hierarchical Federated Learning with Overlap SGD
AU - Manjang, Ousman
AU - Zhai, Yanlong
AU - Shen, Jun
AU - Tchaye-Kondi, Jude
AU - Zhu, Liehuang
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Federated learning (FL) is a distributed machine learning framework in which multiple clients collaboratively train a model without sharing their data. Despite recent advances, traditional FL methods face challenges including communication overhead, extended latency, and slow convergence. To address these issues, this paper introduces Anchor-HHFL, a novel approach that combines the strengths of synchronous and asynchronous FL. Anchor-HHFL employs multi-tier edge servers that perform partial model aggregation and reduce the frequency of communication with the central server. Anchor-HHFL implements a novel divergence control method through hierarchical pullback: it orchestrates the sequence of each client's stochastic gradient descent (SGD) updates to pull the locally trained models towards an anchor model, ensuring alignment and minimizing divergence. Simultaneously, a secondary process collects client models without disrupting their ongoing local computations and transmits them to edge servers, thereby overlapping computation with communication and substantially increasing training speed. Additionally, to handle asynchronous updates across clusters effectively, Anchor-HHFL uses a heuristic weight assignment for global aggregation, weighting clients' updates according to the degree of their divergence from the global model. Extensive experiments on the MNIST and CIFAR-10 datasets demonstrate Anchor-HHFL's superiority, achieving up to 3× faster convergence and higher test accuracy than the baselines.
AB - Federated learning (FL) is a distributed machine learning framework in which multiple clients collaboratively train a model without sharing their data. Despite recent advances, traditional FL methods face challenges including communication overhead, extended latency, and slow convergence. To address these issues, this paper introduces Anchor-HHFL, a novel approach that combines the strengths of synchronous and asynchronous FL. Anchor-HHFL employs multi-tier edge servers that perform partial model aggregation and reduce the frequency of communication with the central server. Anchor-HHFL implements a novel divergence control method through hierarchical pullback: it orchestrates the sequence of each client's stochastic gradient descent (SGD) updates to pull the locally trained models towards an anchor model, ensuring alignment and minimizing divergence. Simultaneously, a secondary process collects client models without disrupting their ongoing local computations and transmits them to edge servers, thereby overlapping computation with communication and substantially increasing training speed. Additionally, to handle asynchronous updates across clusters effectively, Anchor-HHFL uses a heuristic weight assignment for global aggregation, weighting clients' updates according to the degree of their divergence from the global model. Extensive experiments on the MNIST and CIFAR-10 datasets demonstrate Anchor-HHFL's superiority, achieving up to 3× faster convergence and higher test accuracy than the baselines.
KW - Anchor model
KW - divergence control
KW - federated learning
KW - hierarchical pullback
KW - overlap computation and communication
UR - http://www.scopus.com/inward/record.url?scp=85196719892&partnerID=8YFLogxK
U2 - 10.1109/TMC.2024.3414999
DO - 10.1109/TMC.2024.3414999
M3 - Article
AN - SCOPUS:85196719892
SN - 1536-1233
VL - 23
SP - 12540
EP - 12557
JO - IEEE Transactions on Mobile Computing
JF - IEEE Transactions on Mobile Computing
IS - 12
ER -