Space Efficient Quantization for Deep Convolutional Neural Networks

Dong Di Zhao; Fan Li; Kashif Sharif; Guang Min Xia; Yu Wang

doi:10.1007/s11390-019-1912-1

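Dong Di Zhao, Fan Li*, Kashif Sharif, Guang Min Xia, Yu Wang

*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract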

Deep convolutional neural networks (DCNNs) have shown outstanding performance in the fields of computer vision, natural language processing, and complex system analysis. With the improvement of performance with deeper layers, DCNNs incur higher computational complexity and larger storage requirement, making it extremely difficult to deploy DCNNs on resource-limited embedded systems (such as mobile devices or Internet of Things devices). Network quantization efficiently reduces storage space required by DCNNs. However, the performance of DCNNs often drops rapidly as the quantization bit reduces. In this article, we propose a space efficient quantization scheme which uses eight or less bits to represent the original 32-bit weights. We adopt singular value decomposition (SVD) method to decrease the parameter size of fully-connected layers for further compression. Additionally, we propose a weight clipping method based on dynamic boundary to improve the performance when using lower precision. Experimental results demonstrate that our approach can achieve up to approximately 14x compression while preserving almost the same accuracy compared with the full-precision models. The proposed weight clipping method can also significantly improve the performance of DCNNs when lower precision is required.
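The abstract names three ingredients: low-bit quantization of 32-bit weights, SVD-based reduction of fully-connected layers, and weight clipping. The sketch below illustrates only the general ideas and is not the authors' method. It assumes a generic uniform symmetric quantizer with a caller-supplied clipping boundary (the paper's dynamic boundary rule is not given in the abstract). The names `quantize`, `dequantize`, and `low_rank_params` are hypothetical helpers introduced here. The low-rank helper only counts parameters for a rank-k factorization and does not compute an SVD.

```python
def quantize(weights, bits=8, clip=None):
    """Uniform symmetric quantization of floats to signed `bits`-bit codes.

    `clip` is an assumed, caller-supplied boundary: weights are clamped to
    [-clip, clip] before quantization. If omitted, the largest magnitude is used.
    Returns (codes, scale) such that weight is approximately code * scale.
    """
    if clip is None:
        clip = max(abs(w) for w in weights)
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 7 for 4 bits
    scale = clip / levels if clip > 0 else 1.0
    codes = []
    for w in weights:
        w = max(-clip, min(clip, w))      # weight clipping
        codes.append(int(round(w / scale)))
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def low_rank_params(m, n, k):
    """Parameter count when an m x n matrix is replaced by rank-k factors
    of shapes m x k and k x n (as a truncated SVD would produce)."""
    return k * (m + n)


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 2.0]
    codes, scale = quantize(weights, bits=8)
    print(codes, dequantize(codes, scale))
    # Storage drops from 32 bits to 8 bits per weight, i.e. 4x from quantization alone.
    print(4096 * 4096, low_rank_params(4096, 4096, 256))
```

Clipping to a tighter boundary trades saturation of outlier weights for a finer step size over the remaining range, which is one plausible reason clipping helps at lower precision.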

Original language	English
Pages (from-to)	305-317
Number of pages	13
Journal	Journal of Computer Science and Technology
Volume	34
Issue number	2
DOI	https://doi.org/10.1007/s11390-019-1912-1
Publication status	Published - 1 Mar 2019


Cite this

@article{109f8e2d3a6b4702ac859047a87f8cd2,

title = "Space Efficient Quantization for Deep Convolutional Neural Networks",

abstract = "Deep convolutional neural networks (DCNNs) have shown outstanding performance in the fields of computer vision, natural language processing, and complex system analysis. With the improvement of performance with deeper layers, DCNNs incur higher computational complexity and larger storage requirement, making it extremely difficult to deploy DCNNs on resource-limited embedded systems (such as mobile devices or Internet of Things devices). Network quantization efficiently reduces storage space required by DCNNs. However, the performance of DCNNs often drops rapidly as the quantization bit reduces. In this article, we propose a space efficient quantization scheme which uses eight or less bits to represent the original 32-bit weights. We adopt singular value decomposition (SVD) method to decrease the parameter size of fully-connected layers for further compression. Additionally, we propose a weight clipping method based on dynamic boundary to improve the performance when using lower precision. Experimental results demonstrate that our approach can achieve up to approximately 14x compression while preserving almost the same accuracy compared with the full-precision models. The proposed weight clipping method can also significantly improve the performance of DCNNs when lower precision is required.",

keywords = "convolutional neural network, memory compression, network quantization",

author = "Zhao, {Dong Di} and Fan Li and Kashif Sharif and Xia, {Guang Min} and Yu Wang",

note = "Publisher Copyright: {\textcopyright} 2019, Springer Science+Business Media, LLC \& Science Press, China.",

year = "2019",

month = mar,

day = "1",

doi = "10.1007/s11390-019-1912-1",

language = "English",

volume = "34",

pages = "305--317",

journal = "Journal of Computer Science and Technology",

issn = "1000-9000",

publisher = "Springer New York",

number = "2",

}

TY - JOUR

T1 - Space Efficient Quantization for Deep Convolutional Neural Networks

AU - Zhao, Dong Di

AU - Li, Fan

AU - Sharif, Kashif

AU - Xia, Guang Min

AU - Wang, Yu

PY - 2019/3/1

Y1 - 2019/3/1

AB - Deep convolutional neural networks (DCNNs) have shown outstanding performance in the fields of computer vision, natural language processing, and complex system analysis. With the improvement of performance with deeper layers, DCNNs incur higher computational complexity and larger storage requirement, making it extremely difficult to deploy DCNNs on resource-limited embedded systems (such as mobile devices or Internet of Things devices). Network quantization efficiently reduces storage space required by DCNNs. However, the performance of DCNNs often drops rapidly as the quantization bit reduces. In this article, we propose a space efficient quantization scheme which uses eight or less bits to represent the original 32-bit weights. We adopt singular value decomposition (SVD) method to decrease the parameter size of fully-connected layers for further compression. Additionally, we propose a weight clipping method based on dynamic boundary to improve the performance when using lower precision. Experimental results demonstrate that our approach can achieve up to approximately 14x compression while preserving almost the same accuracy compared with the full-precision models. The proposed weight clipping method can also significantly improve the performance of DCNNs when lower precision is required.

KW - convolutional neural network

KW - memory compression

KW - network quantization

UR - http://www.scopus.com/inward/record.url?scp=85063640640&partnerID=8YFLogxK

U2 - 10.1007/s11390-019-1912-1

DO - 10.1007/s11390-019-1912-1

M3 - Article

AN - SCOPUS:85063640640

SN - 1000-9000

VL - 34

SP - 305

EP - 317

JO - Journal of Computer Science and Technology

JF - Journal of Computer Science and Technology

IS - 2

ER -
