{"title":"CRPIM:基于 ReRAM 的 \"内存处理 DNN \"加速器的高效计算重复使用方案","authors":"Shihao Hong , Yeh-Ching Chung","doi":"10.1016/j.sysarc.2024.103192","DOIUrl":null,"url":null,"abstract":"<div><p>Resistive random access memory (ReRAM) is a promising technology for AI Processing-in-Memory (PIM) hardware because of its compatibility with CMOS, small footprint, and ability to complete matrix–vector multiplication workloads inside the memory device itself. However, redundant computations are brought on by duplicate weights and inputs when an MVM has to be split into smaller-granularity sequential sub-works in the real world. Recent studies have proposed repetition-pruning to address this issue, but the buffer allocation strategy for enhancing buffer device utilization remains understudied. In preliminary experiments observing input patterns of neural layers with different datasets, the similarity of repetition allows us to transfer the buffer allocation strategy obtained from a small dataset to the computation with a large dataset. Hence, this paper proposes a practical compute-reuse mechanism for ReRAM-based PIM, called CRPIM, which replaces repetitive computations with buffering and reading. Moreover, the subsequent buffer allocation problem is resolved at both inter-layer and intra-layer levels. Our experimental results demonstrate that CRPIM significantly reduces ReRAM cells and execution time while maintaining adequate buffer and energy overhead.</p></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"153 ","pages":"Article 103192"},"PeriodicalIF":3.7000,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CRPIM: An efficient compute-reuse scheme for ReRAM-based Processing-in-Memory DNN accelerators\",\"authors\":\"Shihao Hong , Yeh-Ching Chung\",\"doi\":\"10.1016/j.sysarc.2024.103192\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Resistive random access memory (ReRAM) is a promising technology for AI Processing-in-Memory (PIM) hardware because of its compatibility with CMOS, small footprint, and ability to complete matrix–vector multiplication workloads inside the memory device itself. However, redundant computations are brought on by duplicate weights and inputs when an MVM has to be split into smaller-granularity sequential sub-works in the real world. Recent studies have proposed repetition-pruning to address this issue, but the buffer allocation strategy for enhancing buffer device utilization remains understudied. In preliminary experiments observing input patterns of neural layers with different datasets, the similarity of repetition allows us to transfer the buffer allocation strategy obtained from a small dataset to the computation with a large dataset. Hence, this paper proposes a practical compute-reuse mechanism for ReRAM-based PIM, called CRPIM, which replaces repetitive computations with buffering and reading. Moreover, the subsequent buffer allocation problem is resolved at both inter-layer and intra-layer levels. Our experimental results demonstrate that CRPIM significantly reduces ReRAM cells and execution time while maintaining adequate buffer and energy overhead.</p></div>\",\"PeriodicalId\":50027,\"journal\":{\"name\":\"Journal of Systems Architecture\",\"volume\":\"153 \",\"pages\":\"Article 103192\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2024-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Systems Architecture\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1383762124001292\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems Architecture","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1383762124001292","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
CRPIM: An efficient compute-reuse scheme for ReRAM-based Processing-in-Memory DNN accelerators
Resistive random access memory (ReRAM) is a promising technology for AI Processing-in-Memory (PIM) hardware because of its compatibility with CMOS, small footprint, and ability to complete matrix–vector multiplication workloads inside the memory device itself. However, redundant computations are brought on by duplicate weights and inputs when an MVM has to be split into smaller-granularity sequential sub-works in the real world. Recent studies have proposed repetition-pruning to address this issue, but the buffer allocation strategy for enhancing buffer device utilization remains understudied. In preliminary experiments observing input patterns of neural layers with different datasets, the similarity of repetition allows us to transfer the buffer allocation strategy obtained from a small dataset to the computation with a large dataset. Hence, this paper proposes a practical compute-reuse mechanism for ReRAM-based PIM, called CRPIM, which replaces repetitive computations with buffering and reading. Moreover, the subsequent buffer allocation problem is resolved at both inter-layer and intra-layer levels. Our experimental results demonstrate that CRPIM significantly reduces ReRAM cells and execution time while maintaining adequate buffer and energy overhead.
期刊介绍:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems including methodologies, techniques and tools for their design as well as novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central in this journal. While hardware is not a part of this journal hardware/software co-design methods that consider interplay between software and hardware components with and emphasis on software are also relevant here.