A Multicore Programmable Variable-Precision Near-Memory Accelerator for CNN and Transformer Models

Yiming Yang, Yiyang Yuan, Xinghua Wang, Xiaoran Li*, Hao Wu, Qihao Liu, Weiye Tang, Xiangqu Fu, Feng Zhang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Convolutional neural networks (CNNs) and transformers are the most popular neural network models in computer vision (CV) and natural language processing (NLP). It is common to use both models together in multimodal scenarios such as text-to-image generation. However, the two models have very different memory mappings, dataflows, and mathematical operators, making it difficult to accelerate both simultaneously. To address these challenges, we propose a multicore programmable near-memory accelerator and introduce an arbitration-free multi-port static random-access memory (SRAM) array that improves storage utilization while maintaining flexibility. To achieve performance comparable to computing-in-memory (CIM) designs, we use near-memory variable-precision multiplier-accumulators (NVMACs) that perform multiply-accumulate (MAC) operations close to memory, maximizing memory access throughput and supporting mixed-precision neural network inference. A fine-grained instruction set architecture (ISA) supports software sparsity and reduces the overhead caused by coarse-grained non-MAC operations with low utilization. A chip fabricated in a 28 nm process achieves 6.3-to-101.4 TOPS/W energy efficiency for transformer models and 7.3-to-194.6 TOPS/W for CNN models, 1.2× to 4.2× higher than other state-of-the-art designs, while efficiently supporting both CNN and transformer workloads.
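
For readers unfamiliar with variable-precision MAC hardware, the sketch below is a minimal, hypothetical Python model of the general technique, not the paper's NVMAC circuit: it composes an 8-bit multiply from 4-bit sub-multiplies, so one physical multiplier array can serve either full-precision INT8 products or higher-throughput INT4 products, which is the usual motivation for variable-precision MACs in mixed-precision inference.

```python
# Illustrative sketch only: a generic variable-precision MAC model,
# NOT the paper's NVMAC design. It shows the common idea of building
# an 8-bit multiply from four 4-bit sub-multiplies, so the same
# hardware can serve an INT8 mode or a higher-throughput INT4 mode.

def mul4(a: int, b: int) -> int:
    """4-bit unsigned multiply primitive (the reusable hardware unit)."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b

def mul8_from_mul4(a: int, b: int) -> int:
    """Compose an 8-bit unsigned multiply from four 4-bit multiplies:
    a*b = (aH*bH << 8) + ((aH*bL + aL*bH) << 4) + aL*bL."""
    aH, aL = a >> 4, a & 0xF
    bH, bL = b >> 4, b & 0xF
    return (mul4(aH, bH) << 8) + ((mul4(aH, bL) + mul4(aL, bH)) << 4) + mul4(aL, bL)

def mac(acc: int, weights, activations, precision: int) -> int:
    """Multiply-accumulate over a vector at the selected precision."""
    mul = mul8_from_mul4 if precision == 8 else mul4
    for w, x in zip(weights, activations):
        acc += mul(w, x)
    return acc

# INT8 mode: each product occupies all four 4-bit sub-multipliers.
print(mac(0, [200, 17], [3, 255], precision=8))  # 200*3 + 17*255 = 4935
# INT4 mode: the same sub-multipliers instead produce independent
# 4-bit products, raising throughput for low-precision layers.
print(mac(0, [5, 9], [7, 12], precision=4))      # 5*7 + 9*12 = 143
```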

Original language: English
Journal: IEEE Journal of Solid-State Circuits
DOIs
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • Arbitration-free multi-port static random-access memory (SRAM) array
  • compute-in-memory
  • process-near-memory (PNM)
  • convolutional neural network (CNN)
  • instruction set architecture (ISA)
  • near-memory variable-precision multiplier-accumulator (NVMAC)
  • transformer
