Abstract
Static random-access memory (SRAM)-based computing-in-memory (CIM) macros have been widely studied to improve the energy efficiency of edge artificial intelligence (AI) inference tasks. However, less attention has been given to AI training, which requires CIM macros not only to perform matrix multiply-accumulate (MAC) operations but also to support matrix transposition. To address the limitations of previous analog transpose and digital non-transpose SRAM CIM macros, this work features: 1) a cyclic-weight-mapping SRAM array that enables matrix transposition and reuse of MAC circuits during both feed-forward (FF) and back-propagation (BP) phases; 2) a digital CIM architecture employing signed fixed-point mantissa encoding and a vector-wise pre-alignment (VWPA) scheme, supporting multiple data formats including INT4/8, FP8, and BF16; and 3) an accurate/approximate dual-mode bit-parallel MAC circuit (DMBP-MAC) designed to provide a tradeoff between computational accuracy and energy efficiency. A fabricated 28-nm 32-kB transpose SRAM CIM macro achieved an average energy efficiency of 70.2–285.4 TOPS/W in INT4, 17.5–71.4 TOPS/W in INT8, 51.1–192.3 TFLOPS/W in FP8, and 12.8–48 TFLOPS/W in BF16.
| Original language | English |
|---|---|
| Journal | IEEE Journal of Solid-State Circuits |
| Publication status | Accepted/In press - 2026 |
Keywords
- Approximate computing
- artificial intelligence (AI)
- computing-in-memory (CIM)
- floating point (FP)
- static random-access memory (SRAM)
Fingerprint
Dive into the research topics of 'A 28-nm Digital Transpose SRAM Compute-in-Memory Macro With Accurate/Approximate Dual Mode for Floating-Point Edge Training and Inference'. Together they form a unique fingerprint.