TY - JOUR
T1 - Space Efficient Quantization for Deep Convolutional Neural Networks
AU - Zhao, Dong Di
AU - Li, Fan
AU - Sharif, Kashif
AU - Xia, Guang Min
AU - Wang, Yu
N1 - Publisher Copyright: © 2019, Springer Science+Business Media, LLC & Science Press, China.
PY - 2019/3/1
Y1 - 2019/3/1
N2 - Deep convolutional neural networks (DCNNs) have shown outstanding performance in the fields of computer vision, natural language processing, and complex system analysis. As performance improves with deeper layers, DCNNs incur higher computational complexity and larger storage requirements, making it extremely difficult to deploy them on resource-limited embedded systems (such as mobile devices or Internet of Things devices). Network quantization efficiently reduces the storage space required by DCNNs. However, the performance of DCNNs often drops rapidly as the quantization bit width decreases. In this article, we propose a space efficient quantization scheme which uses eight or fewer bits to represent the original 32-bit weights. We adopt the singular value decomposition (SVD) method to decrease the parameter size of fully-connected layers for further compression. Additionally, we propose a weight clipping method based on a dynamic boundary to improve the performance when using lower precision. Experimental results demonstrate that our approach can achieve up to approximately 14x compression while preserving almost the same accuracy compared with the full-precision models. The proposed weight clipping method can also significantly improve the performance of DCNNs when lower precision is required.
AB - Deep convolutional neural networks (DCNNs) have shown outstanding performance in the fields of computer vision, natural language processing, and complex system analysis. As performance improves with deeper layers, DCNNs incur higher computational complexity and larger storage requirements, making it extremely difficult to deploy them on resource-limited embedded systems (such as mobile devices or Internet of Things devices). Network quantization efficiently reduces the storage space required by DCNNs. However, the performance of DCNNs often drops rapidly as the quantization bit width decreases. In this article, we propose a space efficient quantization scheme which uses eight or fewer bits to represent the original 32-bit weights. We adopt the singular value decomposition (SVD) method to decrease the parameter size of fully-connected layers for further compression. Additionally, we propose a weight clipping method based on a dynamic boundary to improve the performance when using lower precision. Experimental results demonstrate that our approach can achieve up to approximately 14x compression while preserving almost the same accuracy compared with the full-precision models. The proposed weight clipping method can also significantly improve the performance of DCNNs when lower precision is required.
KW - convolutional neural network
KW - memory compression
KW - network quantization
UR - http://www.scopus.com/inward/record.url?scp=85063640640&partnerID=8YFLogxK
U2 - 10.1007/s11390-019-1912-1
DO - 10.1007/s11390-019-1912-1
M3 - Article
AN - SCOPUS:85063640640
SN - 1000-9000
VL - 34
SP - 305
EP - 317
JO - Journal of Computer Science and Technology
JF - Journal of Computer Science and Technology
IS - 2
ER -