On Scheduling Early-exit Layers for Model Pipeline in 6G-based Edge Inference

Yuxiao Liu, Rui Han*, Qinglong Zhang, Haiting Hou, Chi Harold Liu, Lydia Y. Chen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

When running edge intelligence applications over 6G networks, model pipelining effectively reduces inference latency by parallelizing layers across multiple edge devices. Today's edge inference systems usually employ a static layer architecture for pipeline parallelism while dynamically skipping some layers through early exit, which can significantly degrade system throughput. In this paper, we introduce DensePipe, an online layer scheduling approach that optimally allocates early-exit layers to edge devices to maximize their throughput in the model pipeline. To this end, DensePipe profiles the early-exit skipping probabilities of all network layers. At run time, DensePipe maximizes pipeline throughput by balancing the processing of all unskipped layers across devices according to the current loads and device resource utilizations. We implement DensePipe with Transformer models and demonstrate its effectiveness against state-of-the-art pipeline methods. Comparative experiments show that DensePipe successfully finds the best device for most of the layers and significantly improves throughput by 3.09x.
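
The abstract does not give DensePipe's actual scheduling algorithm. As a rough illustration of the general idea it describes, weighting each layer's compute cost by its probability of surviving early exit and balancing the resulting expected load across heterogeneous devices, the Python sketch below uses a simple greedy heuristic. The function name schedule_layers, the cost and probability inputs, and the device capacities are all hypothetical, and the sketch ignores the layer-contiguity constraints a real pipeline partitioner would enforce.

```python
# Illustrative sketch only: greedily assign early-exit layers to edge devices,
# weighting each layer's compute cost by the probability that it is NOT
# skipped (i.e., that inference reaches it before an early exit fires).
# All inputs below are hypothetical placeholders, not values from the paper.
from typing import Dict, List


def schedule_layers(
    layer_costs: List[float],           # profiled per-layer compute cost
    skip_probs: List[float],            # profiled probability each layer is skipped
    device_capacity: Dict[str, float],  # relative capacity of each edge device
) -> Dict[str, List[int]]:
    """Assign each layer to the device with the lowest projected relative load."""
    expected_cost = [c * (1.0 - p) for c, p in zip(layer_costs, skip_probs)]
    load = {dev: 0.0 for dev in device_capacity}
    plan: Dict[str, List[int]] = {dev: [] for dev in device_capacity}
    # Place the heaviest expected work first (longest-processing-time heuristic).
    for layer in sorted(range(len(expected_cost)), key=lambda i: -expected_cost[i]):
        dev = min(
            load,
            key=lambda d: (load[d] + expected_cost[layer]) / device_capacity[d],
        )
        load[dev] += expected_cost[layer]
        plan[dev].append(layer)
    return plan


if __name__ == "__main__":
    plan = schedule_layers(
        layer_costs=[1.0, 1.2, 0.8, 1.5],
        skip_probs=[0.0, 0.3, 0.6, 0.8],
        device_capacity={"edge-A": 1.0, "edge-B": 0.5},
    )
    print(plan)
```

Layers with high skip probabilities contribute little expected work in this sketch, so they can be packed onto slower devices without hurting the balanced throughput; the paper's online scheduler additionally reacts to current loads and device resource utilizations at run time.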

Original language: English
Journal: IEEE Network
DOI: 10.1109/MNET.2024.3520555
Publication status: Accepted/In press - 2024

Keywords

  • 6G
  • early exit
  • edge devices
  • inference
  • pipeline parallelism


Cite this

Liu, Y., Han, R., Zhang, Q., Hou, H., Liu, C. H., & Chen, L. Y. (Accepted/In press). On Scheduling Early-exit Layers for Model Pipeline in 6G-based Edge Inference. IEEE Network. https://doi.org/10.1109/MNET.2024.3520555