TY - JOUR
T1 - A 28-nm 19.9-to-258.5-TOPS/W 8b Digital Computing-in-Memory Processor With Two-Cycle Macro Featuring Winograd-Domain Convolution and Macro-Level Parallel Dual-Side Sparsity
AU - Wu, Hao
AU - Chen, Yong
AU - Yuan, Yiyang
AU - Yue, Jinshan
AU - Wang, Xinghua
AU - Li, Xiaoran
AU - Zhang, Feng
N1 - Publisher Copyright: © 1966-2012 IEEE.
PY - 2025
Y1 - 2025
AB - Recently, computing in memory (CiM) has been proven to be an energy-efficient and promising architecture for artificial intelligence (AI) algorithms. However, current CiM schemes generally suffer from limited throughput compared to their digital counterparts, mainly because the CiM macro calculation must iterate through multiple cycles. Thus, reducing the calculation cycles of the macro while maintaining high energy efficiency, and developing acceleration methods for a universal CiM-based processor, have become major issues for current CiM architectures. To address these problems, we propose a processor based on a two-cycle CiM macro. Our work makes three main contributions: 1) we present a Radix16-based digital-CiM macro with look-up table (LUT) optimization to reduce dynamic power consumption; 2) we devise a hybrid Winograd microarchitecture and dataflow that supports (2, 3) and (4, 3) Winograd convolution, allowing a trade-off between algorithm accuracy and workload reduction; and 3) we propose a macro-level parallel dual-side sparse CiM core that uses a horizontal-direction compression method to reduce the input cycles of activation data and improve the mapping efficiency of the weight data in the macros. A prototype of the processor is fabricated in 28-nm CMOS and achieves a peak system energy efficiency of 19.9-258.5 TOPS/W at a supply voltage of 0.6-1.1 V and an operating frequency of 78-287 MHz, which is 2.55-7.08× higher than other state-of-the-art CiM processors.
KW - Artificial intelligence (AI)
KW - CMOS
KW - Radix16
KW - Winograd convolution
KW - computing-in-memory (CiM)
KW - energy efficiency
KW - look-up table (LUT)
KW - multiply-accumulation (MAC)
KW - neural network (NN)
KW - unstructured sparsity
UR - http://www.scopus.com/inward/record.url?scp=85196516021&partnerID=8YFLogxK
U2 - 10.1109/JSSC.2024.3409356
DO - 10.1109/JSSC.2024.3409356
M3 - Article
AN - SCOPUS:85196516021
SN - 0018-9200
VL - 60
SP - 347
EP - 361
JO - IEEE Journal of Solid-State Circuits
JF - IEEE Journal of Solid-State Circuits
IS - 1
ER -