On Scheduling Early-exit Layers for Model Pipeline in 6G-based Edge Inference

Yuxiao Liu, Rui Han*, Qinglong Zhang, Haiting Hou, Chi Harold Liu, Lydia Y. Chen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

When running edge intelligence applications over 6G networks, model pipelining effectively reduces inference latency by parallelizing layers across multiple edge devices. Today's edge inference systems usually employ a static layer architecture for pipeline parallelism while dynamically skipping some layers through early exit, which can significantly degrade system throughput. In this paper, we introduce DensePipe, an online layer scheduling approach that optimally allocates early-exit layers to edge devices to maximize their throughput in the model pipeline. To this end, DensePipe profiles the early-exit skipping probabilities of all network layers. At run time, DensePipe maximizes pipeline throughput by balancing the processing of all unskipped layers across devices according to the current loads and device resource utilizations. We implement DensePipe with Transformer models and demonstrate its effectiveness against state-of-the-art pipeline methods. Comparative experiments show that DensePipe successfully finds the best device for most of the layers and significantly improves throughput by 3.09x.
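
The abstract does not give DensePipe's actual scheduling algorithm. As a rough illustration of the general idea it describes, weighting each layer's compute cost by its probability of surviving early exit and balancing the resulting expected load across heterogeneous devices, the Python sketch below uses a simple greedy heuristic. The function name schedule_layers, the cost and probability inputs, and the device capacities are all hypothetical, and the sketch ignores the layer-contiguity constraints a real pipeline partitioner would enforce.

```python
# Illustrative sketch only: greedily assign early-exit layers to edge devices,
# weighting each layer's compute cost by the probability that it is NOT
# skipped (i.e., that inference reaches it before an early exit fires).
# All inputs below are hypothetical placeholders, not values from the paper.
from typing import Dict, List


def schedule_layers(
    layer_costs: List[float],           # profiled per-layer compute cost
    skip_probs: List[float],            # profiled probability each layer is skipped
    device_capacity: Dict[str, float],  # relative capacity of each edge device
) -> Dict[str, List[int]]:
    """Assign each layer to the device with the lowest projected relative load."""
    expected_cost = [c * (1.0 - p) for c, p in zip(layer_costs, skip_probs)]
    load = {dev: 0.0 for dev in device_capacity}
    plan: Dict[str, List[int]] = {dev: [] for dev in device_capacity}
    # Place the heaviest expected work first (longest-processing-time heuristic).
    for layer in sorted(range(len(expected_cost)), key=lambda i: -expected_cost[i]):
        dev = min(
            load,
            key=lambda d: (load[d] + expected_cost[layer]) / device_capacity[d],
        )
        load[dev] += expected_cost[layer]
        plan[dev].append(layer)
    return plan


if __name__ == "__main__":
    plan = schedule_layers(
        layer_costs=[1.0, 1.2, 0.8, 1.5],
        skip_probs=[0.0, 0.3, 0.6, 0.8],
        device_capacity={"edge-A": 1.0, "edge-B": 0.5},
    )
    print(plan)
```

Layers with high skip probabilities contribute little expected work in this sketch, so they can be packed onto slower devices without hurting the balanced throughput; the paper's online scheduler additionally reacts to current loads and device resource utilizations at run time.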

Original language: English
Journal: IEEE Network
DOI: 10.1109/MNET.2024.3520555
Publication status: Accepted/In press - 2024

Keywords

  • 6G
  • early exit
  • edge devices
  • inference
  • pipeline parallelism


Cite this

Liu, Y., Han, R., Zhang, Q., Hou, H., Liu, C. H., & Chen, L. Y. (Accepted/In press). On Scheduling Early-exit Layers for Model Pipeline in 6G-based Edge Inference. IEEE Network. https://doi.org/10.1109/MNET.2024.3520555