{"title":"A 65 nm General-Purpose Compute-in-Memory Processor Supporting Both General Programming and Deep Learning Tasks","authors":"Yuhao Ju;Yijie Wei;Jie Gu","doi":"10.1109/JSSC.2024.3453114","DOIUrl":null,"url":null,"abstract":"This work presents a special unified compute-in-memory (CIM) processor supporting both general-purpose computing and deep neural network (DNN) operations, referred to as the general-purpose CIM (GPCIM) processor. By implementing a unique CIM macro with two different bitcell arrays and a central compute unit (CCU), GPCIM can be reconfigured to a CIM DNN accelerator or a CIM vector central processing unit (CPU). By using special reconfigurability, dataflow, and support of a customized vector instruction set, GPCIM achieves SOTA performance for end-to-end deep learning tasks with enhanced CPU efficiency and data locality. A 65 nm test chip was fabricated demonstrating a 28.3 TOPS/W DNN macro efficiency and a best-in-class peak CPU efficiency of 802 GOPS/W. Benefit from a data locality flow, 37%–55% end-to-end latency reduction on artificial intelligence (AI)-related applications is achieved by eliminating inter-core data transfer in traditional heterogeneous system-on-chip (SoC). An averaged <inline-formula> <tex-math>$17.8{\\times }$ </tex-math></inline-formula> CPU energy efficiency improvement is achieved compared with vector RISC-V CPUs in the existing machine learning (ML) SoCs.","PeriodicalId":13129,"journal":{"name":"IEEE Journal of Solid-state Circuits","volume":"60 4","pages":"1500-1511"},"PeriodicalIF":5.6000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal of Solid-state Circuits","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10679157/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract
This work presents a unified compute-in-memory (CIM) processor supporting both general-purpose computing and deep neural network (DNN) operations, referred to as the general-purpose CIM (GPCIM) processor. By implementing a unique CIM macro with two different bitcell arrays and a central compute unit (CCU), GPCIM can be reconfigured into either a CIM DNN accelerator or a CIM vector central processing unit (CPU). Through this reconfigurability, a dedicated dataflow, and support for a customized vector instruction set, GPCIM achieves state-of-the-art (SOTA) performance on end-to-end deep learning tasks with enhanced CPU efficiency and data locality. A 65 nm test chip was fabricated, demonstrating a 28.3 TOPS/W DNN macro efficiency and a best-in-class peak CPU efficiency of 802 GOPS/W. Benefiting from the data locality flow, a 37%-55% end-to-end latency reduction on artificial intelligence (AI)-related applications is achieved by eliminating the inter-core data transfer of traditional heterogeneous systems-on-chip (SoCs). An average $17.8\times$ CPU energy efficiency improvement is achieved compared with the vector RISC-V CPUs in existing machine learning (ML) SoCs.
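As a back-of-the-envelope reading of the headline numbers (illustrative arithmetic, not taken from the paper): the 28.3 TOPS/W macro efficiency corresponds to roughly 35 fJ per operation, and dividing the 802 GOPS/W peak CPU efficiency by the averaged $17.8\times$ improvement factor places the compared vector RISC-V CPUs near 45 GOPS/W. The second figure is only approximate, since it divides a peak value by an averaged ratio:

\[
\frac{1}{28.3\ \text{TOPS/W}} \approx 35.3\ \text{fJ/op},
\qquad
\frac{802\ \text{GOPS/W}}{17.8} \approx 45\ \text{GOPS/W}.
\]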
Journal Description:
The IEEE Journal of Solid-State Circuits publishes papers each month in the broad area of solid-state circuits, with particular emphasis on transistor-level design of integrated circuits. It also covers topics such as circuit modeling, technology, systems design, layout, and testing that relate directly to IC design. Integrated circuits and VLSI are of principal interest; material related to discrete circuit design is seldom published. Experimental verification is strongly encouraged.