@inproceedings{69714cf7409c418f97f5ef5389922b4c,
  title         = {Partitioned Scheduling and Parallelism Assignment for Real-Time {DNN} Inference Tasks on {Multi-TPU}},
  abstract      = {Pipelining on Edge Tensor Processing Units (TPUs) optimizes the deep neural network (DNN) inference by breaking it down into multiple stages processed concurrently on multiple accelerators. Such DNN inference tasks can be modeled as sporadic non-preemptive gangs with execution times that vary with their parallelism levels. This paper proposes a strict partitioning strategy for deploying DNN inferences in real-time systems. The strategy determines tasks' parallelism levels and assigns tasks to disjoint processor partitions. Configuring the tasks in the same partition with a uniform parallelism level avoids scheduling anomalies and enables schedulability verification using well-understood uniprocessor analyses. Evaluation using real-world Edge TPU benchmarks demonstrated that the proposed method achieves a higher schedulability ratio than state-of-the-art gang scheduling techniques.},
  author        = {Sun, Binqi and Kloda, Tomasz and Wu, Chu Ge and Caccamo, Marco},
  note          = {Publisher Copyright: {\textcopyright} 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.; 61st ACM/IEEE Design Automation Conference, DAC 2024 ; Conference date: 23-06-2024 Through 27-06-2024},
  year          = {2024},
  month         = nov,
  day           = {7},
  doi           = {10.1145/3649329.3655979},
  language      = {English},
  series        = {Proceedings - Design Automation Conference},
  publisher     = {Institute of Electrical and Electronics Engineers Inc.},
  booktitle     = {Proceedings of the 61st {ACM/IEEE} Design Automation Conference, {DAC} 2024},
  address       = {United States},
  internal-note = {NOTE(review): address holds a country, not the publisher's city -- confirm the correct city or drop the field},
}