优化 NVM 内存计算架构中的解码方案和应用映射的软硬件协同设计

IF 2.9 3区计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI:10.1109/TCAD.2024.3447216

Shanmukha Mangadahalli Siddaramu;Ali Nezhadi;Mahta Mayahinia;Seyedehmaryam Ghasemi;Mehdi B. Tahoori

{"title":"优化 NVM 内存计算架构中的解码方案和应用映射的软硬件协同设计","authors":"Shanmukha Mangadahalli Siddaramu;Ali Nezhadi;Mahta Mayahinia;Seyedehmaryam Ghasemi;Mehdi B. Tahoori","doi":"10.1109/TCAD.2024.3447216","DOIUrl":null,"url":null,"abstract":"The computation-in nonvolatile memory (NVM-CiM) approach addresses the growing computational demands and the memory-wall problem faced by traditional processor-centric architectures. Computation-in-memory (CiM) capitalizes on the parallel nature of memory arrays enabling effective computation through multirow memristor reading and sensing. In this context, the conventional design of memory decoders needs to be accordingly modified for efficient multirow activation and parallel data processing. This article presents the design and optimization of address decoders for NVM-CiM system architectures, employing a cross-layer co-optimization approach that integrates circuit and architecture design with application requirements. Our methodology starts at the circuit level, examining various decoder designs, including cascaded, hierarchical, latched, and hybrid models. An in-depth application-level characterization follows, utilizing an extended NVM-CiM-capable gem5 simulator to assess the impact of these decoders on the mapping of CiM-friendly applications and the resulting system performance, particularly in facilitating rapid and efficient activation of multirow memory configurations. This holistic analysis allows us to identify the bottlenecks and requirements from the application side and adjust the design of the decoder accordingly. Our analysis reveals that Hybrid Decoders significantly decrease latency and power consumption compared to other decoder designs within NVM-CiM systems. This highlights the crucial role of the decoder’s row selection flexibility, reducing additional system-level data movement even at the expense of its performance, can substantially improve the overall efficiency of NVM-CiM systems.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3744-3755"},"PeriodicalIF":2.9000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hardware and Software Co-Design for Optimized Decoding Schemes and Application Mapping in NVM Compute-in-Memory Architectures\",\"authors\":\"Shanmukha Mangadahalli Siddaramu;Ali Nezhadi;Mahta Mayahinia;Seyedehmaryam Ghasemi;Mehdi B. Tahoori\",\"doi\":\"10.1109/TCAD.2024.3447216\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The computation-in nonvolatile memory (NVM-CiM) approach addresses the growing computational demands and the memory-wall problem faced by traditional processor-centric architectures. Computation-in-memory (CiM) capitalizes on the parallel nature of memory arrays enabling effective computation through multirow memristor reading and sensing. In this context, the conventional design of memory decoders needs to be accordingly modified for efficient multirow activation and parallel data processing. This article presents the design and optimization of address decoders for NVM-CiM system architectures, employing a cross-layer co-optimization approach that integrates circuit and architecture design with application requirements. Our methodology starts at the circuit level, examining various decoder designs, including cascaded, hierarchical, latched, and hybrid models. An in-depth application-level characterization follows, utilizing an extended NVM-CiM-capable gem5 simulator to assess the impact of these decoders on the mapping of CiM-friendly applications and the resulting system performance, particularly in facilitating rapid and efficient activation of multirow memory configurations. This holistic analysis allows us to identify the bottlenecks and requirements from the application side and adjust the design of the decoder accordingly. Our analysis reveals that Hybrid Decoders significantly decrease latency and power consumption compared to other decoder designs within NVM-CiM systems. This highlights the crucial role of the decoder’s row selection flexibility, reducing additional system-level data movement even at the expense of its performance, can substantially improve the overall efficiency of NVM-CiM systems.\",\"PeriodicalId\":13251,\"journal\":{\"name\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"volume\":\"43 11\",\"pages\":\"3744-3755\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2024-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10745789/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10745789/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

非易失性存储器中的计算（NVM-CiM）方法解决了日益增长的计算需求以及传统的以处理器为中心的架构所面临的内存墙问题。内存中计算（CiM）利用了内存阵列的并行特性，通过多行忆阻器读取和感应实现有效计算。在这种情况下，需要对传统的内存解码器设计进行相应的修改，以实现高效的多行激活和并行数据处理。本文介绍了针对 NVM-CiM 系统架构的地址解码器的设计和优化，采用了一种跨层协同优化方法，将电路和架构设计与应用需求相结合。我们的方法从电路层面入手，研究各种解码器设计，包括级联、分层、锁存和混合模型。随后是深入的应用级特性分析，利用支持 NVM-CiM 的扩展 gem5 仿真器来评估这些解码器对 CiM 友好型应用映射的影响以及由此产生的系统性能，特别是在促进多行存储器配置的快速高效激活方面。这种整体分析使我们能够从应用方面找出瓶颈和要求，并相应地调整解码器的设计。我们的分析表明，与 NVM-CiM 系统中的其他解码器设计相比，混合解码器能显著降低延迟和功耗。这凸显了解码器行选择灵活性的关键作用，减少额外的系统级数据移动，即使牺牲其性能，也能大幅提高 NVM-CiM 系统的整体效率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Hardware and Software Co-Design for Optimized Decoding Schemes and Application Mapping in NVM Compute-in-Memory Architectures

The computation-in nonvolatile memory (NVM-CiM) approach addresses the growing computational demands and the memory-wall problem faced by traditional processor-centric architectures. Computation-in-memory (CiM) capitalizes on the parallel nature of memory arrays enabling effective computation through multirow memristor reading and sensing. In this context, the conventional design of memory decoders needs to be accordingly modified for efficient multirow activation and parallel data processing. This article presents the design and optimization of address decoders for NVM-CiM system architectures, employing a cross-layer co-optimization approach that integrates circuit and architecture design with application requirements. Our methodology starts at the circuit level, examining various decoder designs, including cascaded, hierarchical, latched, and hybrid models. An in-depth application-level characterization follows, utilizing an extended NVM-CiM-capable gem5 simulator to assess the impact of these decoders on the mapping of CiM-friendly applications and the resulting system performance, particularly in facilitating rapid and efficient activation of multirow memory configurations. This holistic analysis allows us to identify the bottlenecks and requirements from the application side and adjust the design of the decoder accordingly. Our analysis reveals that Hybrid Decoders significantly decrease latency and power consumption compared to other decoder designs within NVM-CiM systems. This highlights the crucial role of the decoder’s row selection flexibility, reducing additional system-level data movement even at the expense of its performance, can substantially improve the overall efficiency of NVM-CiM systems.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 工程技术-工程：电子与电气

CiteScore

5.60

自引率

13.80%

发文量

500

审稿时长

7 months

期刊介绍： The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.