Towards efficient full 8-bit integer DNN online training on resource-limited devices without batch normalization

Yukuan Yang; Xiaowei Chi; Lei Deng; Tianyi Yan; Feng Gao; Guoqi Li

doi:10.1016/j.neucom.2022.08.045

Yukuan Yang, Xiaowei Chi, Lei Deng, Tianyi Yan, Feng Gao, Guoqi Li*

*Corresponding author for this work

School of Medical Technology

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

The huge computational costs of convolution and batch normalization (BN) pose great challenges for the online training and corresponding applications of deep neural networks (DNNs), especially on resource-limited devices. Existing works focus on accelerating either convolution or BN, and no solution alleviates both problems with satisfactory performance. Online training has gradually become a trend on resource-limited devices such as mobile phones, yet there is still no complete technical scheme with acceptable model performance, processing speed, and computational cost. In this research, an efficient online-training quantization framework, termed EOQ, is proposed by combining Fixup initialization with a novel quantization scheme for online training on resource-limited devices. Based on the proposed framework, we have realized full 8-bit integer network training and removed BN in large-scale DNNs. In particular, weight updates are quantized to 8-bit integers for the first time. A theoretical analysis of how EOQ uses Fixup initialization to remove BN is further given using a novel Block Dynamical Isometry theory with weaker assumptions. Benefiting from rational quantization strategies and the absence of BN, the full 8-bit networks based on EOQ achieve state-of-the-art accuracy and immense advantages in computational cost and processing speed. Experiments show that the 8-bit EOQ networks achieve accuracy improvements of 2.78%, 3.85%, and 4.31% over existing full 8-bit integer networks on ResNet-18, ResNet-34, and ResNet-50, respectively. At the same time, the 8-bit EOQ networks greatly improve computing speed and reduce power consumption and circuit area by about an order of magnitude compared with 32-bit floating-point vanilla networks.
In addition to the huge advantages that quantization brings to convolution operations, 8-bit networks based on EOQ without BN achieve >66× lower power and >13× faster processing speed than traditional 32-bit floating-point BN in the inference process. Moreover, the design of deep learning chips can be profoundly simplified in the absence of the hardware-unfriendly square root operations in BN. Beyond this, EOQ is shown to be more advantageous in small-batch online training. In summary, the EOQ framework is specially designed to reduce the high cost of convolution and BN in network training, demonstrating a broad application prospect for online training on resource-limited devices.
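
For readers unfamiliar with 8-bit integer arithmetic, the following is a minimal sketch of generic symmetric per-tensor int8 quantization. It is not the EOQ quantization scheme, whose details are not given in this abstract. The function names (`quantize_int8`, `dequantize`, `int8_dot`) and the sample values are hypothetical and chosen only for illustration.

```python
# Generic symmetric int8 quantization (illustrative only, NOT the EOQ scheme).

def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integers and the scale."""
    return [x * scale for x in q]


def int8_dot(qa, scale_a, qb, scale_b):
    """Dot product accumulated in integers, rescaled to a float at the end."""
    acc = sum(x * y for x, y in zip(qa, qb))  # integer-only multiply-accumulate
    return acc * scale_a * scale_b


# Hypothetical sample data.
weights = [0.5, -1.0, 0.25, 0.75]
activations = [1.0, 2.0, -0.5, 0.0]

qw, sw = quantize_int8(weights)
qa, sa = quantize_int8(activations)

approx = int8_dot(qw, sw, qa, sa)
exact = sum(w * a for w, a in zip(weights, activations))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The multiply-accumulate step uses only integers, which is where low-bit hardware typically saves power and area. The rounding error per element is bounded by half the scale.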

Original language	English
Pages (from-to)	175-186
Number of pages	12
Journal	Neurocomputing
Volume	511
DOI	https://doi.org/10.1016/j.neucom.2022.08.045
Publication status	Published - 28 Oct 2022


Cite this

@article{c44a2877dd3249898d3ac22647567bb6,

title = "Towards efficient full 8-bit integer DNN online training on resource-limited devices without batch normalization",

abstract = "Huge computational costs brought by convolution and batch normalization (BN) have caused great challenges for the online training and corresponding applications of deep neural networks (DNNs), especially in resource-limited devices. Existing works only focus on the convolution or BN acceleration and no solution can alleviate both problems with satisfactory performance. Online training has gradually become a trend in resource-limited devices like mobile phones while there is still no complete technical scheme with acceptable model performance, processing speed, and computational cost. In this research, an efficient online-training quantization framework termed EOQ for abbreviation is proposed by combining Fixup initialization and a novel quantization scheme for the online training in resource-limited devices. Based on the proposed framework, we have successfully realized full 8-bit integer network training and removed BN in large-scale DNNs. Especially, weight updates are quantized to 8-bit integers for the first time. Theoretical analyses of EOQ utilizing Fixup initialization for removing BN have been further given using a novel Block Dynamical Isometry theory with weaker assumptions. Benefiting from rational quantization strategies and the absence of BN, the full 8-bit networks based on EOQ can achieve state-of-the-art accuracy and immense advantages in computational cost and processing speed. Experiments show that the 8-bit EOQ networks achieve 2.78\%, 3.85\%, and 4.31\% accuracy improvements compared with existing full 8-bit integer networks in ResNet-18/34/50. At the same time, the 8-bit EOQ networks can improve the computing speed greatly, and decrease the power consumption and circuit area by about an order of magnitude compared with 32-bit floating-point vanilla networks. 
In addition to the huge advantages brought by quantization in convolution operations, 8-bit networks based on EOQ without BN can realize >66× lower in power, >13× faster in the processing speed compared with the traditional 32-bit floating-point BN in the inference process. What's more, the design of deep learning chips can be profoundly simplified in the absence of unfriendly square root operations in BN. Beyond this, EOQ has been evidenced to be more advantageous in small-batch online training with fewer batch samples. In summary, the EOQ framework is specially designed for reducing the high cost of convolution and BN in network training, demonstrating a broad application prospect of online training in resource-limited devices.",

keywords = "Full 8-bit quantization, Network without batch normalization, Online training, Resource-limited devices, Small batch",

author = "Yukuan Yang and Xiaowei Chi and Lei Deng and Tianyi Yan and Feng Gao and Guoqi Li",

note = "Publisher Copyright: {\textcopyright} 2022",

year = "2022",

month = oct,

day = "28",

doi = "10.1016/j.neucom.2022.08.045",

language = "English",

volume = "511",

pages = "175--186",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Towards efficient full 8-bit integer DNN online training on resource-limited devices without batch normalization

AU - Yang, Yukuan

AU - Chi, Xiaowei

AU - Deng, Lei

AU - Yan, Tianyi

AU - Gao, Feng

AU - Li, Guoqi

PY - 2022/10/28

Y1 - 2022/10/28

N2 - Huge computational costs brought by convolution and batch normalization (BN) have caused great challenges for the online training and corresponding applications of deep neural networks (DNNs), especially in resource-limited devices. Existing works only focus on the convolution or BN acceleration and no solution can alleviate both problems with satisfactory performance. Online training has gradually become a trend in resource-limited devices like mobile phones while there is still no complete technical scheme with acceptable model performance, processing speed, and computational cost. In this research, an efficient online-training quantization framework termed EOQ for abbreviation is proposed by combining Fixup initialization and a novel quantization scheme for the online training in resource-limited devices. Based on the proposed framework, we have successfully realized full 8-bit integer network training and removed BN in large-scale DNNs. Especially, weight updates are quantized to 8-bit integers for the first time. Theoretical analyses of EOQ utilizing Fixup initialization for removing BN have been further given using a novel Block Dynamical Isometry theory with weaker assumptions. Benefiting from rational quantization strategies and the absence of BN, the full 8-bit networks based on EOQ can achieve state-of-the-art accuracy and immense advantages in computational cost and processing speed. Experiments show that the 8-bit EOQ networks achieve 2.78%, 3.85%, and 4.31% accuracy improvements compared with existing full 8-bit integer networks in ResNet-18/34/50. At the same time, the 8-bit EOQ networks can improve the computing speed greatly, and decrease the power consumption and circuit area by about an order of magnitude compared with 32-bit floating-point vanilla networks. 
In addition to the huge advantages brought by quantization in convolution operations, 8-bit networks based on EOQ without BN can realize >66× lower in power, >13× faster in the processing speed compared with the traditional 32-bit floating-point BN in the inference process. What's more, the design of deep learning chips can be profoundly simplified in the absence of unfriendly square root operations in BN. Beyond this, EOQ has been evidenced to be more advantageous in small-batch online training with fewer batch samples. In summary, the EOQ framework is specially designed for reducing the high cost of convolution and BN in network training, demonstrating a broad application prospect of online training in resource-limited devices.

KW - Full 8-bit quantization

KW - Network without batch normalization

KW - Online training

KW - Resource-limited devices

KW - Small batch

UR - http://www.scopus.com/inward/record.url?scp=85138023955&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2022.08.045

DO - 10.1016/j.neucom.2022.08.045

M3 - Article

AN - SCOPUS:85138023955

SN - 0925-2312

VL - 511

SP - 175

EP - 186

JO - Neurocomputing

JF - Neurocomputing

ER -
