A 28-nm Digital Transpose SRAM Compute-in-Memory Macro With Accurate/Approximate Dual Mode for Floating-Point Edge Training and Inference

  • Yiyang Yuan
  • Bingxin Zhang
  • Yiming Yang
  • Yishan Luo
  • Qirui Chen
  • Haitao Wang
  • Qihao Liu
  • Zhiming Chen
  • Hao Wu
  • Jinshan Yue
  • Shidong Lv
  • Xinghua Wang*
  • Pui In Mak
  • Xiaoran Li*
  • Feng Zhang*

*Corresponding author for this work

  • CAS - Institute of Microelectronics
  • Beijing Institute of Technology
  • Shandong SinoChip Semiconductors Company Ltd.
  • University of Macau

Research output: Contribution to journal › Article › peer-review

Abstract

Static random-access memory (SRAM)-based computing-in-memory (CIM) macros have been widely studied to improve the energy efficiency of edge artificial intelligence (AI) inference tasks. However, less attention has been given to AI training, which requires CIM macros not only to perform matrix multiply-accumulate (MAC) operations but also to support matrix transposition. To address the limitations of previous analog transpose and digital non-transpose SRAM CIM macros, this work features: 1) a cyclic-weight-mapping SRAM array that enables matrix transposition and reuse of MAC circuits during both the feed-forward (FF) and back-propagation (BP) phases; 2) a digital CIM architecture employing signed fixed-point mantissa encoding and a vector-wise pre-alignment (VWPA) scheme, supporting multiple data formats including INT4/8, FP8, and BF16; and 3) an accurate/approximate dual-mode bit-parallel MAC circuit (DMBP-MAC) designed to provide a tradeoff between computational accuracy and energy efficiency. A fabricated 28-nm 32-kB transpose SRAM CIM macro achieved an average energy efficiency of 70.2–285.4 TOPS/W in INT4, 17.5–71.4 TOPS/W in INT8, 51.1–192.3 TFLOPS/W in FP8, and 12.8–48 TFLOPS/W in BF16.
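
The abstract does not spell out how VWPA is implemented. As a rough illustration of the general idea behind pre-aligning a vector to a shared exponent, so that the MAC itself can run on signed fixed-point mantissas, a minimal Python sketch might look like the following. The function names `prealign` and `aligned_dot`, the `mant_bits` parameter, and the truncating rounding are assumptions made for this example and are not taken from the paper.

```python
import math

def prealign(values, mant_bits=8):
    """Align every element of a vector to the vector's largest exponent.

    Hypothetical helper: returns (signed integer mantissas, shared exponent)
    such that values[i] ~= mantissas[i] * 2**(shared_exp - mant_bits).
    """
    # math.frexp gives v = m * 2**e with 0.5 <= |m| < 1 for nonzero v.
    exps = [math.frexp(v)[1] for v in values if v != 0.0]
    if not exps:
        return [0] * len(values), 0
    shared_exp = max(exps)
    scale = 2.0 ** (mant_bits - shared_exp)
    # Assumed rounding: truncate toward zero. Elements much smaller than
    # the largest one lose low-order bits in this step.
    mantissas = [int(v * scale) for v in values]
    return mantissas, shared_exp

def aligned_dot(xs, ws, mant_bits=8):
    """Dot product computed with an integer-only multiply-accumulate."""
    mx, ex = prealign(xs, mant_bits)
    mw, ew = prealign(ws, mant_bits)
    acc = sum(a * b for a, b in zip(mx, mw))  # integer MAC, no FP hardware
    # One rescale per vector pair restores the floating-point magnitude.
    return math.ldexp(acc, ex + ew - 2 * mant_bits)

result = aligned_dot([1.5, -0.25, 3.0], [2.0, 4.0, 0.5])
```

In this sketch the exponent handling is paid once per vector rather than once per product, which is the usual motivation for alignment ahead of a digital MAC array. Widening `mant_bits` reduces the truncation error at the cost of wider integer arithmetic.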

Original language: English
Journal: IEEE Journal of Solid-State Circuits
Publication status: Accepted/In press - 2026

Keywords

  • Approximate computing
  • artificial intelligence (AI)
  • computing-in-memory (CIM)
  • floating point (FP)
  • static random-access memory (SRAM)
