@inproceedings{b9920741e6f5439bb5c515cfc851c912,
title = "Interference-Aware Latency Prediction With Kernels For Deep Neural Network",
abstract = "With the growing popularity of artificial intelligence applications, deep neural network (DNN) inference workloads are becoming increasingly common on cloud servers. To improve GPU utilization, a GPU executes multiple workloads simultaneously, which inevitably leads to resource contention and increased inference latency. We propose a kernel-based latency prediction method that predicts latency more accurately when multiple workloads interfere with one another. The method decomposes DNN inference into kernels and uses each kernel's parameters to predict its latency. It estimates the impact of interference on each model from the amount of data exchanged among the L1 cache, L2 cache, and GPU memory during that model's execution. We conduct experiments on popular models. The results show that, compared with the state-of-the-art multi-model coexistence prediction method, our method reduces the average error by 52% when predicting the latency of a single model, and by 62%, 51%, and 58% when predicting the co-location of two, three, and four models, respectively.",
keywords = "DNN, Deep learning, GPU, Kernel, Latency prediction, MPS",
author = "Huang, {Pei Jie} and Xiufeng Sui and Dawei Liu and Liyue Zhu",
note = "Publisher Copyright: {\textcopyright} 2022 IEEE. 4th International Academic Exchange Conference on Science and Technology Innovation, IAECST 2022; Conference date: 9--11 December 2022",
year = "2022",
doi = "10.1109/IAECST57965.2022.10062171",
language = "English",
series = "2022 4th International Academic Exchange Conference on Science and Technology Innovation, IAECST 2022",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "1232--1238",
booktitle = "2022 4th International Academic Exchange Conference on Science and Technology Innovation, IAECST 2022",
address = "United States",
}