TY - JOUR
T1 - A 28-nm Computing-in-Memory-Based Super-Resolution Accelerator Incorporating Macro-Level Pipeline and Texture/Algebraic Sparsity
AU - Wu, Hao
AU - Chen, Yong
AU - Yuan, Yiyang
AU - Yue, Jinshan
AU - Fu, Xiangqu
AU - Ren, Qirui
AU - Luo, Qing
AU - Mak, Pui In
AU - Wang, Xinghua
AU - Zhang, Feng
N1 - Publisher Copyright:
© 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
PY - 2024/2/1
Y1 - 2024/2/1
N2 - Super-resolution (SR) using convolutional neural networks is crucial for improving image and video quality. The introduction of the residual block (RB) increases the depth of the algorithm for better reconstruction, but processing the RB reduces hardware utilization and causes frequent off-chip communication, making such algorithms hard to deploy on performance-limited edge devices. Computing-in-memory (CiM) is a promising method to reduce the high power caused by massive data movement in multiply-accumulate computation. Algebraic sparsity (AS) is a structured-sparsity (SS) optimization for image computing. However, simultaneously realizing the texture sparsity (TS) of the image and the SS of the algorithm in a CiM scheme while maintaining high hardware utilization remains an unsolved problem. Thus, we propose a CiM-based SR accelerator with three key contributions: first, a texture-aware workflow and a dynamic grouping CiM engine concurrently support TS coupled with AS; second, a macro-level pipeline scheme, together with two custom-sized CiM macros and a high reuse-rate Hadamard transformation circuit, reaches 91% hardware utilization; third, a novel weight update strategy reduces the performance loss induced by weight updating. The accelerator prototype is fabricated in a 28-nm CMOS process. It achieves a peak energy efficiency of 22.8-44.3 TOPS/W at a supply voltage of 0.54-1.1 V and an operating frequency of 50-200 MHz, 1.8-6.8x higher than state-of-the-art CiM processors.
AB - Super-resolution (SR) using convolutional neural networks is crucial for improving image and video quality. The introduction of the residual block (RB) increases the depth of the algorithm for better reconstruction, but processing the RB reduces hardware utilization and causes frequent off-chip communication, making such algorithms hard to deploy on performance-limited edge devices. Computing-in-memory (CiM) is a promising method to reduce the high power caused by massive data movement in multiply-accumulate computation. Algebraic sparsity (AS) is a structured-sparsity (SS) optimization for image computing. However, simultaneously realizing the texture sparsity (TS) of the image and the SS of the algorithm in a CiM scheme while maintaining high hardware utilization remains an unsolved problem. Thus, we propose a CiM-based SR accelerator with three key contributions: first, a texture-aware workflow and a dynamic grouping CiM engine concurrently support TS coupled with AS; second, a macro-level pipeline scheme, together with two custom-sized CiM macros and a high reuse-rate Hadamard transformation circuit, reaches 91% hardware utilization; third, a novel weight update strategy reduces the performance loss induced by weight updating. The accelerator prototype is fabricated in a 28-nm CMOS process. It achieves a peak energy efficiency of 22.8-44.3 TOPS/W at a supply voltage of 0.54-1.1 V and an operating frequency of 50-200 MHz, 1.8-6.8x higher than state-of-the-art CiM processors.
KW - CMOS
KW - algebraic sparsity (AS)
KW - computing-in-memory (CiM)
KW - multiply-accumulation (MAC)
KW - structured sparsity (SS)
KW - super-resolution (SR)
KW - texture sparsity (TS)
UR - http://www.scopus.com/inward/record.url?scp=85177082564&partnerID=8YFLogxK
U2 - 10.1109/TCSI.2023.3325850
DO - 10.1109/TCSI.2023.3325850
M3 - Article
AN - SCOPUS:85177082564
SN - 1549-8328
VL - 71
SP - 689
EP - 702
JO - IEEE Transactions on Circuits and Systems I: Regular Papers
JF - IEEE Transactions on Circuits and Systems I: Regular Papers
IS - 2
ER -