TY - JOUR
T1 - Accelerating Gossip-Based Deep Learning in Heterogeneous Edge Computing Platforms
AU - Han, Rui
AU - Li, Shilin
AU - Wang, Xiangwei
AU - Liu, Chi Harold
AU - Xin, Gaofeng
AU - Chen, Lydia Y.
N1 - Publisher Copyright:
© 1990-2012 IEEE.
PY - 2021/7/1
Y1 - 2021/7/1
N2 - With the exponential growth of data created at the network edge, decentralized and Gossip-based training of deep learning (DL) models on edge computing (EC) platforms has gained tremendous research momentum, owing to its capability to learn from resource-constrained edge nodes with limited network connectivity. Today's edge devices are extremely heterogeneous, e.g., in hardware and software stacks, resulting in high variation in training time and inducing extra delay to synchronize and converge. The large body of prior art accelerates DL, whether via data or model parallelism, through a centralized server, e.g., the parameter server scheme, which may easily become the system bottleneck or a single point of failure. In this article, we propose EdgeGossip, a framework specifically designed to accelerate decentralized and Gossip-based DL training on heterogeneous EC platforms. EdgeGossip features: (i) low performance variation among multiple EC platforms during iterative training, and (ii) accuracy-aware training to quickly obtain the best possible model accuracy. We implement EdgeGossip on top of popular Gossip algorithms and demonstrate its effectiveness using real-world DL workloads, reducing model training time by an average of 2.70 times while incurring accuracy losses of only 0.78 percent.
AB - With the exponential growth of data created at the network edge, decentralized and Gossip-based training of deep learning (DL) models on edge computing (EC) platforms has gained tremendous research momentum, owing to its capability to learn from resource-constrained edge nodes with limited network connectivity. Today's edge devices are extremely heterogeneous, e.g., in hardware and software stacks, resulting in high variation in training time and inducing extra delay to synchronize and converge. The large body of prior art accelerates DL, whether via data or model parallelism, through a centralized server, e.g., the parameter server scheme, which may easily become the system bottleneck or a single point of failure. In this article, we propose EdgeGossip, a framework specifically designed to accelerate decentralized and Gossip-based DL training on heterogeneous EC platforms. EdgeGossip features: (i) low performance variation among multiple EC platforms during iterative training, and (ii) accuracy-aware training to quickly obtain the best possible model accuracy. We implement EdgeGossip on top of popular Gossip algorithms and demonstrate its effectiveness using real-world DL workloads, reducing model training time by an average of 2.70 times while incurring accuracy losses of only 0.78 percent.
KW - Deep learning
KW - decentralized training
KW - edge computing
KW - gossip
UR - http://www.scopus.com/inward/record.url?scp=85098757067&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2020.3046440
DO - 10.1109/TPDS.2020.3046440
M3 - Article
AN - SCOPUS:85098757067
SN - 1045-9219
VL - 32
SP - 1591
EP - 1602
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 7
M1 - 9303468
ER -