LegoDNN: Block-grained scaling of deep neural networks for mobile vision

Rui Han; Qinglong Zhang; Chi Harold Liu; Guoren Wang; Jian Tang; Lydia Y. Chen

doi:10.1145/3447993.3483249

School of Computer Science and Technology

Research output: Contribution to conference › Paper › peer-review

33 Citations (Scopus)

Abstract

Deep neural networks (DNNs) have become ubiquitous techniques in mobile and embedded systems for applications such as image/object recognition and classification. The trend of executing multiple DNNs simultaneously exacerbates the existing limitations of meeting stringent latency/accuracy requirements on resource-constrained mobile devices. The prior art sheds light on exploring the accuracy-resource tradeoff by scaling the model sizes in accordance with resource dynamics. However, such model scaling approaches face two imminent challenges: (i) large space exploration of model sizes, and (ii) prohibitively long training time for different model combinations. In this paper, we present LegoDNN, a lightweight, block-grained scaling solution for running multi-DNN workloads in mobile vision systems. LegoDNN guarantees short model training times by only extracting and training a small number of common blocks (e.g. 5 in VGG and 8 in ResNet) in a DNN. At run-time, LegoDNN optimally combines the descendant models of these blocks to maximize accuracy under specific resource and latency constraints, while reducing switching overhead via smart block-level scaling of the DNN. We implement LegoDNN in TensorFlow Lite and extensively evaluate it against state-of-the-art techniques (FLOP scaling, knowledge distillation and model compression) using a set of 12 popular DNN models. Evaluation results show that LegoDNN provides 1,296x to 279,936x more options in model sizes without increasing training time, thus achieving as much as 31.74% improvement in inference accuracy and 71.07% reduction in scaling energy consumption.
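The reported range of 1,296x to 279,936x is consistent with simple counting, under an assumption the abstract does not state: if each of n blocks can take any of k variants independently, and a model-grained baseline offers only k whole-model variants, the gain in options is k^n / k. Taking k = 6 (a hypothetical value chosen because it reproduces the figures) with n = 5 and n = 8, the block counts given for VGG and ResNet, yields exactly the two endpoints. A minimal sketch of that arithmetic, not the authors' derivation:

```python
def option_gain(num_blocks: int, variants_per_block: int) -> int:
    """Ratio of block-grained to model-grained model-size options.

    Assumption (not from the abstract): every block has the same number
    of variants, and the baseline has one whole-model variant per size.
    """
    model_grained = variants_per_block                # k whole-model options
    block_grained = variants_per_block ** num_blocks  # k choices for each of n blocks
    return block_grained // model_grained


# Hypothetical k = 6; n = 5 (VGG) and n = 8 (ResNet) come from the abstract.
vgg_gain = option_gain(num_blocks=5, variants_per_block=6)
resnet_gain = option_gain(num_blocks=8, variants_per_block=6)
print(vgg_gain, resnet_gain)
```

The point of the sketch is that the option space grows exponentially in the number of blocks while the number of trained pieces grows only linearly (n times k block variants).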

Original language	English
Pages	406-419
Number of pages	14
DOIs	https://doi.org/10.1145/3447993.3483249
Publication status	Published - 2021
Event	27th ACM Annual International Conference On Mobile Computing And Networking, MobiCom 2021 - New Orleans, United States Duration: 25 Oct 2021 → 29 Oct 2021

Conference

Conference	27th ACM Annual International Conference On Mobile Computing And Networking, MobiCom 2021
Country/Territory	United States
City	New Orleans
Period	25/10/21 → 29/10/21

Keywords

block-grained scaling
mobile vision
neural networks

Access to Document

10.1145/3447993.3483249

Cite this

Han, R., Zhang, Q., Liu, C. H., Wang, G., Tang, J., & Chen, L. Y. (2021). LegoDNN: Block-grained scaling of deep neural networks for mobile vision. 406-419. Paper presented at 27th ACM Annual International Conference On Mobile Computing And Networking, MobiCom 2021, New Orleans, United States. https://doi.org/10.1145/3447993.3483249

@conference{b73c8990c6b64420bcd49c9692827dc7,

title = "LegoDNN: Block-grained scaling of deep neural networks for mobile vision",

abstract = "Deep neural networks (DNNs) have become ubiquitous techniques in mobile and embedded systems for applications such as image/object recognition and classification. The trend of executing multiple DNNs simultaneously exacerbates the existing limitations of meeting stringent latency/accuracy requirements on resource-constrained mobile devices. The prior art sheds light on exploring the accuracy-resource tradeoff by scaling the model sizes in accordance with resource dynamics. However, such model scaling approaches face two imminent challenges: (i) large space exploration of model sizes, and (ii) prohibitively long training time for different model combinations. In this paper, we present LegoDNN, a lightweight, block-grained scaling solution for running multi-DNN workloads in mobile vision systems. LegoDNN guarantees short model training times by only extracting and training a small number of common blocks (e.g. 5 in VGG and 8 in ResNet) in a DNN. At run-time, LegoDNN optimally combines the descendant models of these blocks to maximize accuracy under specific resource and latency constraints, while reducing switching overhead via smart block-level scaling of the DNN. We implement LegoDNN in TensorFlow Lite and extensively evaluate it against state-of-the-art techniques (FLOP scaling, knowledge distillation and model compression) using a set of 12 popular DNN models. Evaluation results show that LegoDNN provides 1,296x to 279,936x more options in model sizes without increasing training time, thus achieving as much as 31.74% improvement in inference accuracy and 71.07% reduction in scaling energy consumption.",

keywords = "block-grained scaling, mobile vision, neural networks",

author = "Rui Han and Qinglong Zhang and Liu, {Chi Harold} and Guoren Wang and Jian Tang and Chen, {Lydia Y.}",

note = "Publisher Copyright: {\textcopyright} 2021 ACM.; 27th ACM Annual International Conference On Mobile Computing And Networking, MobiCom 2021 ; Conference date: 25-10-2021 Through 29-10-2021",

year = "2021",

doi = "10.1145/3447993.3483249",

language = "English",

pages = "406--419",

}

TY - CONF

T1 - LegoDNN: Block-grained scaling of deep neural networks for mobile vision

T2 - 27th ACM Annual International Conference On Mobile Computing And Networking, MobiCom 2021

AU - Han, Rui

AU - Zhang, Qinglong

AU - Liu, Chi Harold

AU - Wang, Guoren

AU - Tang, Jian

AU - Chen, Lydia Y.

PY - 2021

Y1 - 2021

N2 - Deep neural networks (DNNs) have become ubiquitous techniques in mobile and embedded systems for applications such as image/object recognition and classification. The trend of executing multiple DNNs simultaneously exacerbates the existing limitations of meeting stringent latency/accuracy requirements on resource-constrained mobile devices. The prior art sheds light on exploring the accuracy-resource tradeoff by scaling the model sizes in accordance with resource dynamics. However, such model scaling approaches face two imminent challenges: (i) large space exploration of model sizes, and (ii) prohibitively long training time for different model combinations. In this paper, we present LegoDNN, a lightweight, block-grained scaling solution for running multi-DNN workloads in mobile vision systems. LegoDNN guarantees short model training times by only extracting and training a small number of common blocks (e.g. 5 in VGG and 8 in ResNet) in a DNN. At run-time, LegoDNN optimally combines the descendant models of these blocks to maximize accuracy under specific resource and latency constraints, while reducing switching overhead via smart block-level scaling of the DNN. We implement LegoDNN in TensorFlow Lite and extensively evaluate it against state-of-the-art techniques (FLOP scaling, knowledge distillation and model compression) using a set of 12 popular DNN models. Evaluation results show that LegoDNN provides 1,296x to 279,936x more options in model sizes without increasing training time, thus achieving as much as 31.74% improvement in inference accuracy and 71.07% reduction in scaling energy consumption.

AB - Deep neural networks (DNNs) have become ubiquitous techniques in mobile and embedded systems for applications such as image/object recognition and classification. The trend of executing multiple DNNs simultaneously exacerbates the existing limitations of meeting stringent latency/accuracy requirements on resource-constrained mobile devices. The prior art sheds light on exploring the accuracy-resource tradeoff by scaling the model sizes in accordance with resource dynamics. However, such model scaling approaches face two imminent challenges: (i) large space exploration of model sizes, and (ii) prohibitively long training time for different model combinations. In this paper, we present LegoDNN, a lightweight, block-grained scaling solution for running multi-DNN workloads in mobile vision systems. LegoDNN guarantees short model training times by only extracting and training a small number of common blocks (e.g. 5 in VGG and 8 in ResNet) in a DNN. At run-time, LegoDNN optimally combines the descendant models of these blocks to maximize accuracy under specific resource and latency constraints, while reducing switching overhead via smart block-level scaling of the DNN. We implement LegoDNN in TensorFlow Lite and extensively evaluate it against state-of-the-art techniques (FLOP scaling, knowledge distillation and model compression) using a set of 12 popular DNN models. Evaluation results show that LegoDNN provides 1,296x to 279,936x more options in model sizes without increasing training time, thus achieving as much as 31.74% improvement in inference accuracy and 71.07% reduction in scaling energy consumption.

KW - block-grained scaling

KW - mobile vision

KW - neural networks

UR - http://www.scopus.com/inward/record.url?scp=85133003483&partnerID=8YFLogxK

U2 - 10.1145/3447993.3483249

DO - 10.1145/3447993.3483249

M3 - Paper

AN - SCOPUS:85133003483

SP - 406

EP - 419

Y2 - 25 October 2021 through 29 October 2021

ER -
