TY - JOUR
T1 - DetFed
T2 - Dynamic Resource Scheduling for Deterministic Federated Learning Over Time-Sensitive Networks
AU - Yang, Dong
AU - Zhang, Weiting
AU - Ye, Qiang
AU - Zhang, Chuan
AU - Zhang, Ning
AU - Huang, Chuan
AU - Zhang, Hongke
AU - Shen, Xuemin
N1 - Publisher Copyright:
© 2002-2012 IEEE.
PY - 2024/5/1
Y1 - 2024/5/1
N2 - In this paper, we present a three-layer (i.e., device, field, and factory layers) deterministic federated learning (FL) framework, named DetFed, which accelerates the collaborative learning process for ultra-reliable and low-latency industrial Internet of Things (IoT) by integrating 6G-oriented Time-sensitive Networks (TSN). Utilizing dispersive local data, industrial IoT devices distributively train a deep neural network (DNN) model, and the updated model parameters are aggregated at their associated field servers every round or at a centralized factory server every few rounds. Aiming at optimizing the learning accuracy of FL without affecting the co-transmission of burst traffic (e.g., safety-critical traffic), an integrated TSN is considered to establish connections among the three layers, where a cyclic queuing and forwarding mechanism is deployed in each switch to support deterministic model parameter transmission with microsecond-level delay and near-zero packet loss requirements. To improve the FL performance, we formulate a multi-objective stochastic optimization problem to simultaneously maximize the scheduling success ratio and learning accuracy while satisfying the deterministic requirements of delay, jitter, and packet loss. Since the objective function is implicit and the available time slots of the considered TSN in each FL round are temporally correlated, the problem is difficult to solve in real time. Therefore, we transform the problem into a Markov decision process formulation and propose a dynamic resource scheduling algorithm, based on deep reinforcement learning, to make optimal resource scheduling decisions while adapting to device heterogeneity and network dynamics. Experimental results based on a real-world dataset demonstrate that the proposed DetFed significantly accelerates FL convergence and improves learning accuracy compared to state-of-the-art benchmarks.
AB - In this paper, we present a three-layer (i.e., device, field, and factory layers) deterministic federated learning (FL) framework, named DetFed, which accelerates the collaborative learning process for ultra-reliable and low-latency industrial Internet of Things (IoT) by integrating 6G-oriented Time-sensitive Networks (TSN). Utilizing dispersive local data, industrial IoT devices distributively train a deep neural network (DNN) model, and the updated model parameters are aggregated at their associated field servers every round or at a centralized factory server every few rounds. Aiming at optimizing the learning accuracy of FL without affecting the co-transmission of burst traffic (e.g., safety-critical traffic), an integrated TSN is considered to establish connections among the three layers, where a cyclic queuing and forwarding mechanism is deployed in each switch to support deterministic model parameter transmission with microsecond-level delay and near-zero packet loss requirements. To improve the FL performance, we formulate a multi-objective stochastic optimization problem to simultaneously maximize the scheduling success ratio and learning accuracy while satisfying the deterministic requirements of delay, jitter, and packet loss. Since the objective function is implicit and the available time slots of the considered TSN in each FL round are temporally correlated, the problem is difficult to solve in real time. Therefore, we transform the problem into a Markov decision process formulation and propose a dynamic resource scheduling algorithm, based on deep reinforcement learning, to make optimal resource scheduling decisions while adapting to device heterogeneity and network dynamics. Experimental results based on a real-world dataset demonstrate that the proposed DetFed significantly accelerates FL convergence and improves learning accuracy compared to state-of-the-art benchmarks.
KW - Co-transmission
KW - deep reinforcement learning
KW - deterministic federated learning
KW - industrial Internet of Things
KW - resource scheduling
UR - http://www.scopus.com/inward/record.url?scp=85167814698&partnerID=8YFLogxK
U2 - 10.1109/TMC.2023.3303017
DO - 10.1109/TMC.2023.3303017
M3 - Article
AN - SCOPUS:85167814698
SN - 1536-1233
VL - 23
SP - 5162
EP - 5178
JO - IEEE Transactions on Mobile Computing
JF - IEEE Transactions on Mobile Computing
IS - 5
ER -