Partitioned Scheduling and Parallelism Assignment for Real-Time DNN Inference Tasks on Multi-TPU

Binqi Sun*, Tomasz Kloda, Chu Ge Wu, Marco Caccamo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Pipelining on Edge Tensor Processing Units (TPUs) optimizes deep neural network (DNN) inference by breaking it into multiple stages processed concurrently on multiple accelerators. Such DNN inference tasks can be modeled as sporadic non-preemptive gang tasks whose execution times vary with their parallelism levels. This paper proposes a strict partitioning strategy for deploying DNN inference in real-time systems. The strategy determines each task's parallelism level and assigns tasks to disjoint processor partitions. Configuring all tasks in the same partition with a uniform parallelism level avoids scheduling anomalies and enables schedulability verification using well-understood uniprocessor analyses. Evaluation on real-world Edge TPU benchmarks demonstrates that the proposed method achieves a higher schedulability ratio than state-of-the-art gang scheduling techniques.
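The core idea in the abstract (disjoint TPU partitions, a uniform parallelism level per partition, and a per-partition uniprocessor schedulability check) can be illustrated with a small sketch. This is not the paper's actual algorithm: the first-fit placement heuristic, the `Task` structure, and the simplified utilization-based test standing in for a proper non-preemptive uniprocessor analysis are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Task:
    # exec_time[m] = worst-case execution time when the task runs
    # at parallelism level m (i.e., as a gang of m TPUs)
    exec_time: dict
    period: float  # minimum inter-arrival time, implicit deadline

def utilization(tasks, m):
    """Total utilization of a task set when every task runs at level m."""
    return sum(t.exec_time[m] / t.period for t in tasks)

def fits(tasks, m):
    # Simplified sufficient test (placeholder for a real non-preemptive
    # uniprocessor analysis): every task must support level m and the
    # partition's total utilization must not exceed 1.
    return all(m in t.exec_time for t in tasks) and utilization(tasks, m) <= 1.0

def strict_partition(tasks, total_tpus):
    """First-fit style sketch of strict partitioning: each partition owns a
    disjoint set of TPUs, and every task assigned to it runs at that
    partition's uniform parallelism level."""
    partitions = []  # list of (parallelism_level, [assigned tasks])
    for t in sorted(tasks, key=lambda t: t.period):
        placed = False
        # Try an existing partition first.
        for i, (m, assigned) in enumerate(partitions):
            if fits(assigned + [t], m):
                partitions[i] = (m, assigned + [t])
                placed = True
                break
        if not placed:
            # Open a new partition if enough TPUs remain, preferring
            # smaller parallelism levels to conserve accelerators.
            used = sum(m for m, _ in partitions)
            for m in sorted(t.exec_time):
                if used + m <= total_tpus and fits([t], m):
                    partitions.append((m, [t]))
                    placed = True
                    break
        if not placed:
            return None  # deemed unschedulable by this heuristic
    return partitions
```

Because partitions are disjoint and each one runs a single parallelism level, no gang ever spans partitions, which is what rules out the scheduling anomalies mentioned in the abstract and lets each partition be analyzed in isolation as a uniprocessor.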

Original language: English
Title of host publication: Proceedings of the 61st ACM/IEEE Design Automation Conference, DAC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798400706011
DOI
Publication status: Published - 7 Nov 2024
Event: 61st ACM/IEEE Design Automation Conference, DAC 2024 - San Francisco, United States
Duration: 23 Jun 2024 → 27 Jun 2024

Publication series

Name: Proceedings - Design Automation Conference
ISSN (print): 0738-100X

Conference

Conference: 61st ACM/IEEE Design Automation Conference, DAC 2024
Country/Territory: United States
City: San Francisco
Period: 23/06/24 → 27/06/24
