TY - JOUR
T1 - Training high-performance and large-scale deep neural networks with full 8-bit integers
AU - Yang, Yukuan
AU - Deng, Lei
AU - Wu, Shuang
AU - Yan, Tianyi
AU - Xie, Yuan
AU - Li, Guoqi
N1 - Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2020/5
Y1 - 2020/5
N2 - Deep neural network (DNN) quantization, which converts floating-point (FP) data in the network to integers (INT), is an effective way to shrink the model size for memory saving and to simplify the operations for compute acceleration. Recently, research on DNN quantization has developed from inference to training, laying a foundation for online training on accelerators. However, existing schemes that leave batch normalization (BN) untouched during training are mostly incomplete quantizations that still adopt high-precision FP in some parts of the data paths. Currently, there is no solution that can use only low-bit-width INT data during the whole training process of large-scale DNNs with acceptable accuracy. In this work, by decomposing all the computation steps in DNNs and fusing three special quantization functions to satisfy the different precision requirements, we propose a unified, complete quantization framework termed “WAGEUBN” to quantize DNNs across all data paths, including W (Weights), A (Activation), G (Gradient), E (Error), U (Update), and BN. Moreover, the Momentum optimizer is also quantized to realize a completely quantized framework. Experiments on ResNet18/34/50 models demonstrate that WAGEUBN can achieve competitive accuracy on the ImageNet dataset. For the first time, the study of quantization in large-scale DNNs is advanced to the full 8-bit INT level. In this way, all operations in training and inference can be bit-wise operations, pushing towards faster processing speed, decreased memory cost, and higher energy efficiency. Our thorough quantization framework has great potential for future efficient portable devices with online learning ability.
AB - Deep neural network (DNN) quantization, which converts floating-point (FP) data in the network to integers (INT), is an effective way to shrink the model size for memory saving and to simplify the operations for compute acceleration. Recently, research on DNN quantization has developed from inference to training, laying a foundation for online training on accelerators. However, existing schemes that leave batch normalization (BN) untouched during training are mostly incomplete quantizations that still adopt high-precision FP in some parts of the data paths. Currently, there is no solution that can use only low-bit-width INT data during the whole training process of large-scale DNNs with acceptable accuracy. In this work, by decomposing all the computation steps in DNNs and fusing three special quantization functions to satisfy the different precision requirements, we propose a unified, complete quantization framework termed “WAGEUBN” to quantize DNNs across all data paths, including W (Weights), A (Activation), G (Gradient), E (Error), U (Update), and BN. Moreover, the Momentum optimizer is also quantized to realize a completely quantized framework. Experiments on ResNet18/34/50 models demonstrate that WAGEUBN can achieve competitive accuracy on the ImageNet dataset. For the first time, the study of quantization in large-scale DNNs is advanced to the full 8-bit INT level. In this way, all operations in training and inference can be bit-wise operations, pushing towards faster processing speed, decreased memory cost, and higher energy efficiency. Our thorough quantization framework has great potential for future efficient portable devices with online learning ability.
KW - 8-bit training
KW - Full quantization
KW - Neural network quantization
KW - Online learning device
UR - http://www.scopus.com/inward/record.url?scp=85079516063&partnerID=8YFLogxK
U2 - 10.1016/j.neunet.2019.12.027
DO - 10.1016/j.neunet.2019.12.027
M3 - Article
C2 - 32070857
AN - SCOPUS:85079516063
SN - 0893-6080
VL - 125
SP - 70
EP - 82
JO - Neural Networks
JF - Neural Networks
ER -
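
Editor's note: the abstract above describes full 8-bit INT quantization of the W/A/G/E/U/BN data paths. The following is a minimal, generic sketch of symmetric per-tensor 8-bit quantization for orientation only; it is not the paper's WAGEUBN quantization functions, and the function names and scale rule are assumptions introduced here for illustration.

import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor 8-bit quantization (illustrative sketch,
    not the WAGEUBN scheme from the paper)."""
    # Map the largest magnitude onto the signed 8-bit range [-127, 127].
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clip to the INT8 range.
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover a floating-point approximation of the original tensor."""
    return q.astype(np.float32) * scale

# Usage: quantize a small weight tensor and inspect the round-trip error.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
print("max abs error:", np.max(np.abs(w - w_hat)))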