Scaling on-chip memory capacity is one of the primary approaches to mitigate memory wall bottlenecks. Various 2.5-D/3-D integration schemes, leveraging novel partitioning, are being actively explored to improve system performance. However, fine-grained functional partitioning of memory macros is not widely reported. As static RAM (SRAM) scaling stagnates with emerging CMOS logic roadmap, we propose a partitioning of low-level (faster access) caches in 3-D using an array under CMOS (AuC) technology paradigm. Our study focuses on partitioning and optimization of SRAM bit-cells and peripheral circuits, enabling heterogeneous integration, achieving up to 12% higher operating frequency with 50% leakage power reduction in the memory macros. Applied on a 64-bit mobile system-on-chip (SoC) CPU core, we achieve up to 60% higher energy efficiency compared with 2-D baseline and 14% increase in operating frequency compared with standard memory-on-logic 3-D partitioning scheme.
{"title":"Toward Fine-Grained Partitioning of Low-Level SRAM Caches for Emerging 3D-IC Designs","authors":"Sudipta Das;Bhawana Kumari;Siva Satyendra Sahoo;Yukai Chen;James Myers;Dragomir Milojevic;Dwaipayan Biswas;Julien Ryckaert","doi":"10.1109/JXCDC.2024.3468386","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3468386","url":null,"abstract":"Scaling on-chip memory capacity is one of the primary approaches to mitigate memory wall bottlenecks. Various 2.5-D/3-D integration schemes, leveraging novel partitioning, are being actively explored to improve system performance. However, fine-grained functional partitioning of memory macros is not widely reported. As static RAM (SRAM) scaling stagnates with emerging CMOS logic roadmap, we propose a partitioning of low-level (faster access) caches in 3-D using an array under CMOS (AuC) technology paradigm. Our study focuses on partitioning and optimization of SRAM bit-cells and peripheral circuits, enabling heterogeneous integration, achieving up to 12% higher operating frequency with 50% leakage power reduction in the memory macros. Applied on a 64-bit mobile system-on-chip (SoC) CPU core, we achieve up to 60% higher energy efficiency compared with 2-D baseline and 14% increase in operating frequency compared with standard memory-on-logic 3-D partitioning scheme.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"67-74"},"PeriodicalIF":2.0,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10695147","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142440855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-16DOI: 10.1109/JXCDC.2024.3461471
Harrison Liew;Farhana Sheikh;Jong-Ru Guo;Zuoguo Wu;Borivoje Nikolić
A 3-D heterogeneous integration (3-D-HI) is poised to enable a new era of high-performance integrated circuits via a multitude of benefits, including a reduction in I/O power consumption and ability to tightly couple disparate technologies. However, a significant hurdle toward enabling a chiplet ecosystem is the standardization of 3-D die-to-die (D2D) interconnects that facilitate rapid integration. Technology-driven constraints highlighted in published works demonstrate that a unique approach to 3-D D2D interconnect design and implementation is required, while preserving the ability to customize the interconnect to accommodate future technology concerns and applications with minimal overhead. This article presents a framework to generate customized 3-D D2D interconnect physical layers (PHYs) that are simultaneously standard-compliant, physical-aware, and can be automatically integrated into all stacked chiplets. The generator framework leverages the Chisel hardware description language to allow designers to do the following: 1) compile a port list directly into a PHY; 2) automate design and physical design (PD); and 3) perform design space exploration of interconnect features (e.g., bump map pitch, clocking architecture, and others). The 3-D PHY generator framework and features detailed in this work can be used to produce a reference implementation for a standard like UCIe-3-D, representing a significant paradigm shift from current specification and design methodologies for 2.5-D D2D interconnect (e.g., UCIe) implementations. This work concludes with the results of a redundancy design space exploration tradeoff study, showing the benefits of a proposed spatial coding redundancy scheme in an example PHY using emulated 9- $mu $