Partitioned Scheduling and Parallelism Assignment for Real-Time DNN Inference Tasks on Multi-TPU

Binqi Sun*, Tomasz Kloda, Chu-ge Wu, Marco Caccamo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Pipelining on Edge Tensor Processing Units (TPUs) speeds up deep neural network (DNN) inference by breaking it into multiple stages processed concurrently on multiple accelerators. Such DNN inference tasks can be modeled as sporadic non-preemptive gangs whose execution times vary with their parallelism levels. This paper proposes a strict partitioning strategy for deploying DNN inferences in real-time systems. The strategy determines each task's parallelism level and assigns tasks to disjoint processor partitions. Configuring all tasks in the same partition with a uniform parallelism level avoids scheduling anomalies and enables schedulability verification using well-understood uniprocessor analyses. Evaluation using real-world Edge TPU benchmarks demonstrates that the proposed method achieves a higher schedulability ratio than state-of-the-art gang scheduling techniques.
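To make the idea concrete, here is a minimal Python sketch of strict partitioning as the abstract describes it: tasks go into disjoint TPU partitions, every task in a partition runs at that partition's uniform parallelism level, and each partition is checked with a uniprocessor-style test. The task fields, the first-fit heuristic, the utilization-plus-blocking check, and all names (InferenceTask, np_schedulable, strict_partition) are illustrative assumptions, not the paper's actual algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class InferenceTask:
    name: str
    period: float            # minimum inter-arrival time (sporadic model)
    deadline: float          # relative deadline
    wcet: Dict[int, float]   # parallelism level -> worst-case execution time

def np_schedulable(tasks: List[InferenceTask], level: int) -> bool:
    """Sufficient (not exact) uniprocessor test for non-preemptive scheduling.

    With a uniform parallelism level, a partition behaves like one virtual
    processor, so a utilization-plus-blocking check applies. This stands in
    for the well-understood uniprocessor analyses the abstract refers to.
    """
    if not tasks:
        return True
    util = sum(t.wcet[level] / t.period for t in tasks)
    longest = max(t.wcet[level] for t in tasks)          # blocking term
    shortest_deadline = min(t.deadline for t in tasks)
    return util <= 1.0 and longest <= shortest_deadline

def strict_partition(tasks: List[InferenceTask], num_tpus: int,
                     levels: Tuple[int, ...] = (1, 2, 4)) -> Optional[List[dict]]:
    """Hypothetical first-fit heuristic for strict partitioning.

    For each task, try an existing partition at its fixed level; otherwise
    open a new disjoint partition at the smallest level that passes the
    test and still fits in the remaining TPUs.
    """
    partitions: List[dict] = []   # each: {"level": k, "tasks": [...]}
    tpus_left = num_tpus
    # heavier tasks first tends to help first-fit packing
    for task in sorted(tasks, key=lambda t: -min(t.wcet.values()) / t.period):
        placed = False
        for part in partitions:
            k = part["level"]
            if k in task.wcet and np_schedulable(part["tasks"] + [task], k):
                part["tasks"].append(task)
                placed = True
                break
        if not placed:
            for k in levels:
                if k <= tpus_left and k in task.wcet and np_schedulable([task], k):
                    partitions.append({"level": k, "tasks": [task]})
                    tpus_left -= k
                    placed = True
                    break
        if not placed:
            return None           # deemed unschedulable by this heuristic
    return partitions

if __name__ == "__main__":
    # toy task set: WCET shrinks sub-linearly as pipeline stages spread over TPUs
    taskset = [
        InferenceTask("resnet",    period=30.0, deadline=30.0, wcet={1: 12.0, 2: 7.0, 4: 5.0}),
        InferenceTask("mobilenet", period=20.0, deadline=20.0, wcet={1: 5.0,  2: 3.0, 4: 2.5}),
        InferenceTask("yolo",      period=50.0, deadline=40.0, wcet={1: 20.0, 2: 11.0, 4: 7.0}),
    ]
    result = strict_partition(taskset, num_tpus=4)
    if result is None:
        print("not schedulable under this heuristic")
    else:
        for i, part in enumerate(result):
            print(f"partition {i}: level={part['level']}, "
                  f"tasks={[t.name for t in part['tasks']]}")
```

The key property the sketch illustrates: because every gang in a partition occupies exactly the partition's TPUs at the same level, no two gangs in a partition can overlap partially, which is what lets each partition be analyzed as a single virtual processor, as the abstract claims.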

Original language: English
Title of host publication: Proceedings of the 61st ACM/IEEE Design Automation Conference, DAC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798400706011
Publication status: Published - 7 Nov 2024
Event: 61st ACM/IEEE Design Automation Conference, DAC 2024 - San Francisco, United States
Duration: 23 Jun 2024 - 27 Jun 2024

Publication series

Name: Proceedings - Design Automation Conference
ISSN (Print): 0738-100X

Conference

Conference: 61st ACM/IEEE Design Automation Conference, DAC 2024
Country/Territory: United States
City: San Francisco
Period: 23/06/24 - 27/06/24
