Abstract
Recently, computing in memory (CiM) has been proven to be an energy-efficient and promising architecture for artificial intelligence (AI) algorithms. Yet current CiM schemes generally suffer from limited throughput compared with their digital counterparts, chiefly because a CiM macro calculation must iterate through multiple cycles. Reducing the calculation cycles of the macro while maintaining high energy efficiency, and developing acceleration methods for a universal CiM-based processor, have therefore become major challenges for current CiM architectures. To address these problems, we propose a processor based on a two-cycle CiM macro. Our work makes three main contributions: 1) we present a Radix16-based digital-CiM macro with look-up table (LUT) optimization to reduce dynamic power consumption; 2) we devise a hybrid Winograd microarchitecture and dataflow that supports (2, 3) and (4, 3) Winograd convolution, striking a good compromise between algorithm accuracy and workload reduction; and 3) we propose a macrolevel parallel dual-side sparse CiM core that uses a horizontal-direction compression method to reduce the input cycles of activation data and improve the mapping efficiency of the weight data in the macros. A prototype of the processor is fabricated in a 28-nm CMOS process. It achieves a peak system energy efficiency of 19.9–258.5 TOPS/W over a supply voltage of 0.6–1.1 V and an operating frequency of 78–287 MHz, which is 2.55–7.08× higher than other state-of-the-art CiM processors.
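The workload reduction the abstract attributes to Winograd convolution can be illustrated with the small F(2, 3) case: two outputs of a 3-tap convolution are produced with 4 element-wise multiplications instead of the 6 a direct computation needs. The sketch below uses the standard F(2, 3) transform matrices and is purely illustrative; it does not reproduce the paper's hybrid microarchitecture or its (4, 3) variant.

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices (illustrative only,
# not the paper's hardware dataflow).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)   # input transform
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])               # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)    # output transform

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 outputs."""
    m = (G @ g) * (BT @ d)   # only 4 element-wise multiplications
    return AT @ m

def direct_conv(d, g):
    """Direct 3-tap sliding-window computation (6 multiplications)."""
    return np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -1.0])
assert np.allclose(winograd_f23(d, g), direct_conv(d, g))
```

The larger F(4, 3) tile trades further multiplication savings for wider-dynamic-range transform constants, which is the accuracy-versus-workload compromise the abstract refers to.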
| Original language | English |
| --- | --- |
| Pages (from-to) | 1-15 |
| Number of pages | 15 |
| Journal | IEEE Journal of Solid-State Circuits |
| DOIs | |
| Publication status | Accepted/In press - 2024 |
Keywords
- Accuracy
- Artificial intelligence (AI)
- Circuits
- CMOS
- computing-in-memory (CiM)
- energy efficiency
- look-up table (LUT)
- multiply-accumulation (MAC)
- neural network (NN)
- Power demand
- Radix16
- Table lookup
- Throughput
- unstructured sparsity
- Winograd convolution