ArXrCiM: Architectural Exploration of Application-Specific Resonant SRAM Compute-in-Memory

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (IF 2.8; Region 2, Engineering & Technology; JCR Q2, Computer Science, Hardware & Architecture). Pub Date: 2024-11-25. DOI: 10.1109/TVLSI.2024.3502359

Dhandeep Challagundla;Ignatius Bezzam;Riadul Islam

Vol. 33, No. 1, pp. 179-192. Full text: https://ieeexplore.ieee.org/document/10767429/

Citations: 0

Abstract



General-purpose computing follows the von Neumann architecture, in which data movement between memory and processing elements dictates processor performance. The evolving compute-in-memory (CiM) paradigm tackles this bottleneck by performing processing and storage together within static random-access memory (SRAM) elements. Numerous design decisions taken at different levels of the hierarchy affect the figures of merit (FoMs) of an SRAM, such as power, performance, area, and yield. Without a rapid way to assess how changes at different hierarchy levels affect the global FoMs, innovative SRAM designs are difficult to evaluate accurately. This article presents an automation tool that optimizes the energy and latency of SRAM designs, covering diverse implementation strategies for executing logic operations within the SRAM. The tool's structure allows easy comparison across array topologies and design strategies in order to arrive at energy-efficient implementations. The study compares more than 6900 distinct design implementation strategies for the École Polytechnique Fédérale de Lausanne (EPFL) combinational benchmark circuits on the energy-recycling resonant CiM (rCiM) architecture, designed in Taiwan Semiconductor Manufacturing Company (TSMC) 28-nm technology. Given a combinational circuit, the tool aims to generate an energy-efficient implementation strategy tailored to the specified input memory and latency constraints. By exploiting the parallel processing capability of rCiM cache sizes ranging from 4 to 192 kB, the six-topology implementation reduces energy consumption by 80.9% on average across all benchmarks compared with the baseline single-macro topology.
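The selection step described in the abstract (choose an energy-efficient strategy subject to memory and latency constraints) can be illustrated with a minimal sketch. This is not the authors' tool: the `Candidate` type, the function names, the strategy labels, and every number below are hypothetical placeholders chosen only to show the shape of a constrained minimum-energy search.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One hypothetical implementation strategy (all values are made up)."""
    name: str
    memory_kb: int      # cache capacity the strategy would need
    latency_ns: float   # illustrative latency, not a measured value
    energy_pj: float    # illustrative energy, not a measured value


def pick_lowest_energy(
    candidates: Iterable[Candidate],
    max_memory_kb: int,
    max_latency_ns: float,
) -> Optional[Candidate]:
    """Return the feasible candidate with the lowest energy, or None."""
    feasible = [
        c for c in candidates
        if c.memory_kb <= max_memory_kb and c.latency_ns <= max_latency_ns
    ]
    return min(feasible, key=lambda c: c.energy_pj, default=None)


def percent_reduction(baseline: float, improved: float) -> float:
    """Energy saved relative to the baseline, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical search space: a baseline plus two wider strategies.
candidates = [
    Candidate("baseline", memory_kb=4, latency_ns=120.0, energy_pj=100.0),
    Candidate("strategy-B", memory_kb=8, latency_ns=70.0, energy_pj=60.0),
    Candidate("strategy-C", memory_kb=24, latency_ns=30.0, energy_pj=25.0),
]

best = pick_lowest_energy(candidates, max_memory_kb=16, max_latency_ns=100.0)
```

With a 16 kB memory budget and a 100 ns latency bound, only `strategy-B` is feasible in this toy data, so it is the one returned; loosening both constraints would let the lower-energy `strategy-C` win. A real exploration would derive the candidate list from circuit synthesis and array-level energy models rather than from hand-written constants.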


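Source journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 6.40

Self-citation rate: 7.10%

Articles published: 187

Review time: 3.6 months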

About the journal: The IEEE Transactions on VLSI Systems is published monthly under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels. The IEEE Transactions on VLSI Systems was founded to address this critical area through a common forum. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and systems-level qualification. The coverage of these Transactions therefore focuses on VLSI/ULSI microelectronic systems integration.
