Interference-Aware Latency Prediction With Kernels For Deep Neural Network

Pei Jie Huang*, Xiufeng Sui, Dawei Liu, Liyue Zhu

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    With the growing popularity of artificial intelligence applications, deep neural network (DNN) inference workloads are becoming increasingly common on cloud servers. To improve GPU utilization, a GPU executes multiple workloads simultaneously, which inevitably leads to resource contention and increases inference latency. We propose a kernel-based latency prediction method that predicts latency more accurately when multiple workloads interfere with one another. The method uses the kernel parameters decomposed during DNN inference to predict the latency of each kernel, and it captures the impact of interference on each model from the amount of data exchanged between the L1 cache, the L2 cache, and GPU memory during each model's execution. We conduct experiments on popular models. The results show that, compared with the state-of-the-art multi-model co-location prediction method, our method reduces the average error by 52% when predicting the latency of a single model, and by 62%, 51%, and 58% when predicting the co-location of two, three, and four models, respectively.
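    The abstract's idea can be illustrated with a small sketch: sum a model's per-kernel solo latencies, then scale by a contention factor derived from the memory traffic that co-located models place on shared caches and GPU memory. This is a hypothetical linear interference model for illustration only, not the paper's actual formula; the `Kernel` fields, the sensitivity coefficients, and the function names are all assumptions (in practice such coefficients would be fitted offline from profiling data).

    ```python
    from dataclasses import dataclass

    @dataclass
    class Kernel:
        # Hypothetical per-kernel features; the paper derives analogous
        # quantities from the kernels decomposed out of a DNN inference pass.
        solo_latency_ms: float   # latency measured when the kernel runs alone
        l2_traffic_mb: float     # data exchanged with the L2 cache
        dram_traffic_mb: float   # data exchanged with GPU memory

    def predict_latency(model, co_runners=(),
                        l2_sens=0.002, dram_sens=0.005):
        """Sum per-kernel solo latencies and scale by a contention factor
        driven by the co-runners' cache/memory traffic (illustrative linear
        model; l2_sens and dram_sens are made-up fitted sensitivities)."""
        co_l2 = sum(k.l2_traffic_mb for m in co_runners for k in m)
        co_dram = sum(k.dram_traffic_mb for m in co_runners for k in m)
        slowdown = 1.0 + l2_sens * co_l2 + dram_sens * co_dram
        return sum(k.solo_latency_ms for k in model) * slowdown

    # Two toy "models" as kernel lists (numbers are invented).
    model_a = [Kernel(1.0, 10.0, 5.0), Kernel(2.0, 20.0, 8.0)]
    model_b = [Kernel(3.0, 30.0, 12.0)]

    solo = predict_latency(model_a)              # no interference
    colocated = predict_latency(model_a, [model_b])
    ```

    Running alone, the prediction reduces to the sum of solo kernel latencies; adding a co-runner inflates it in proportion to the co-runner's L2 and DRAM traffic, mirroring the abstract's use of cache/memory data exchange as the interference signal.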

    Original language: English
    Title of host publication: 2022 4th International Academic Exchange Conference on Science and Technology Innovation, IAECST 2022
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 1232-1238
    Number of pages: 7
    ISBN (Electronic): 9798350320008
    DOIs
    Publication status: Published - 2022
    Event: 4th International Academic Exchange Conference on Science and Technology Innovation, IAECST 2022 - Virtual, Online, China
    Duration: 9 Dec 2022 - 11 Dec 2022

    Publication series

    Name: 2022 4th International Academic Exchange Conference on Science and Technology Innovation, IAECST 2022

    Conference

    Conference: 4th International Academic Exchange Conference on Science and Technology Innovation, IAECST 2022
    Country/Territory: China
    City: Virtual, Online
    Period: 9/12/22 - 11/12/22

    Keywords

    • DNN
    • Deep learning
    • GPU
    • Kernel
    • Latency prediction
    • MPS
