Pub Date : 2025-02-05DOI: 10.1109/JXCDC.2025.3539470
Jianzi Jin;Shifan Gao;Cimang Lu;Xiang Qiu;Yi Zhao
A voltage sensing compute-in-memory (CIM) architecture has been designed to improve the analog computing accuracy, and a chip on 90-nm flash platform has been successfully fabricated, with the bidirectional operation enabled by the symmetric bitcell structure. By padding the weight sum to a global value for all bit lines (BLs), the costly multiplication postprocessing can be efficiently performed with the analog operation inside the array. The BL-differential voltage output scheme has two unique invariances. First, the so-called scaling invariance allows the weight matrix to be scaled to the full range for every BL. Second, the shifting invariance allows the weight to be tuned to a larger conductance with a better I–V linearity. Combined with the distributed padding, input voltage loss can also be reduced by suppressing the IR drop. The above schemes can significantly improve the linearity and reduce the relative weight error by >50%, as confirmed in applications from MNIST to face recognition, making it a promising solution for advanced artificial intelligence (AI) and memory computing applications.
{"title":"Device Nonideality-Aware Compute-in-Memory Array Architecting: Direct Voltage Sensing, I–V Symmetric Bitcell, and Padding Array","authors":"Jianzi Jin;Shifan Gao;Cimang Lu;Xiang Qiu;Yi Zhao","doi":"10.1109/JXCDC.2025.3539470","DOIUrl":"https://doi.org/10.1109/JXCDC.2025.3539470","url":null,"abstract":"A voltage sensing compute-in-memory (CIM) architecture has been designed to improve the analog computing accuracy, and a chip on 90-nm flash platform has been successfully fabricated, with the bidirectional operation enabled by the symmetric bitcell structure. By padding the weight sum to a global value for all bit lines (BLs), the costly multiplication postprocessing can be efficiently performed with the analog operation inside the array. The BL-differential voltage output scheme has two unique invariances. First, the so-called scaling invariance allows the weight matrix to be scaled to the full range for every BL. Second, the shifting invariance allows the weight to be tuned to a larger conductance with a better I–V linearity. Combined with the distributed padding, input voltage loss can also be reduced by suppressing the IR drop. The above schemes can significantly improve the linearity and reduce the relative weight error by >50%, as confirmed in applications from MNIST to face recognition, making it a promising solution for advanced artificial intelligence (AI) and memory computing applications.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"11 ","pages":"19-24"},"PeriodicalIF":2.0,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10876176","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-27DOI: 10.1109/JXCDC.2025.3534560
Madison Manley;James Read;Ankit Kaul;Shimeng Yu;Muhannad Bakir
Three-dimensional heterogeneous integration (3D-HI) offers promising solutions for incorporating substantial embedded memory into cutting-edge analog compute-in-memory (CIM) AI accelerators, addressing the need for on-chip acceleration of large AI models. However, this approach faces challenges with power supply noise (PSN) margins due to $V_{text {DD}}$ scaling and increased power delivery network (PDN) impedance. This study demonstrates the necessity and benefits of 3D-HI for large-scale CIM accelerators, where 2-D implementations would exceed manufacturing reticle limits. Our 3-D designs achieve 39% higher energy efficiency, $8times $ higher operation density, and improved throughput through shorter vertical interconnects. We quantify steady-state IR-drop impacts in 3D-HI CIM architectures using a framework that combines PDN modeling, 3D-HI power, performance, area estimation, and behavioral modeling. We demonstrate that a drop in supply voltage to CIM arrays increases sensitivity to process, voltage, and temperature (PVT) noise. Using our framework, we model IR-drop and simulate its impact on the accuracy of ResNet-50 and ResNet-152 when classifying images from the ImageNet 1k dataset in the presence of injected PVT noise. We analyze the impact of through-silicon via (TSV) design and placement to optimize the IR-drop and classification accuracy. For ResNet architectures in 3-D integration, we demonstrate that peripheral TSV placement provides an optimal balance between interconnect complexity and performance, achieving IR-drop below 10% of $V_{text {DD}}$ while maintaining high classification accuracy.
{"title":"Co-Optimization of Power Delivery Network Design for 3-D Heterogeneous Integration of RRAM-Based Compute In-Memory Accelerators","authors":"Madison Manley;James Read;Ankit Kaul;Shimeng Yu;Muhannad Bakir","doi":"10.1109/JXCDC.2025.3534560","DOIUrl":"https://doi.org/10.1109/JXCDC.2025.3534560","url":null,"abstract":"Three-dimensional heterogeneous integration (3D-HI) offers promising solutions for incorporating substantial embedded memory into cutting-edge analog compute-in-memory (CIM) AI accelerators, addressing the need for on-chip acceleration of large AI models. However, this approach faces challenges with power supply noise (PSN) margins due to <inline-formula> <tex-math>$V_{text {DD}}$ </tex-math></inline-formula> scaling and increased power delivery network (PDN) impedance. This study demonstrates the necessity and benefits of 3D-HI for large-scale CIM accelerators, where 2-D implementations would exceed manufacturing reticle limits. Our 3-D designs achieve 39% higher energy efficiency, <inline-formula> <tex-math>$8times $ </tex-math></inline-formula> higher operation density, and improved throughput through shorter vertical interconnects. We quantify steady-state IR-drop impacts in 3D-HI CIM architectures using a framework that combines PDN modeling, 3D-HI power, performance, area estimation, and behavioral modeling. We demonstrate that a drop in supply voltage to CIM arrays increases sensitivity to process, voltage, and temperature (PVT) noise. Using our framework, we model IR-drop and simulate its impact on the accuracy of ResNet-50 and ResNet-152 when classifying images from the ImageNet 1k dataset in the presence of injected PVT noise. We analyze the impact of through-silicon via (TSV) design and placement to optimize the IR-drop and classification accuracy. For ResNet architectures in 3-D integration, we demonstrate that peripheral TSV placement provides an optimal balance between interconnect complexity and performance, achieving IR-drop below 10% of <inline-formula> <tex-math>$V_{text {DD}}$ </tex-math></inline-formula> while maintaining high classification accuracy.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"11 ","pages":"10-18"},"PeriodicalIF":2.0,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854426","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-17DOI: 10.1109/JXCDC.2025.3531616
{"title":"2024 Index IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Vol. 10","authors":"","doi":"10.1109/JXCDC.2025.3531616","DOIUrl":"https://doi.org/10.1109/JXCDC.2025.3531616","url":null,"abstract":"","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"187-194"},"PeriodicalIF":2.0,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10845029","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142992841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-16DOI: 10.1109/JXCDC.2024.3499819
{"title":"INFORMATION FOR AUTHORS","authors":"","doi":"10.1109/JXCDC.2024.3499819","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3499819","url":null,"abstract":"","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"C3-C3"},"PeriodicalIF":2.0,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10844024","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142992840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-16DOI: 10.1109/JXCDC.2025.3530906
Tzuping Huang;Linran Zhao;Yiming Han;Hai Li;Ian A. Young;Yaoyao Jia
The magnetoelectric spin orbit (MESO), one of the emerging spin devices, represents a promising alternative to complementary metal-oxide–semiconductor (CMOS) technology. MESO provides dual functionality: each device can perform logic operations while acting as a nonvolatile memory device. MESO also offers advantages, such as an ultralow supply voltage of 100 mV and the potential to vertically integrate with CMOS, which promises significant energy and area efficiency. These features support MESO’s suitability for improving the energy efficiency and area efficiency of computing-in-memory (CIM) circuits. To harness the advantages of MESO in large-scale complex circuit systems, this article presents the development of a MESO-based standard cell library. This library is critical to realize automated design, as it allows the implementation of all the basic CMOS functions with MESO, thereby enabling MESO-CMOS hybrid design in large-scale complex circuits. This article also introduces a highly area-efficient time-multiplexing technique to optimize the complex function inside CIM. Specifically, the multiplier and multiply-and-accumulate (MAC) circuits using the MESO-CMOS hybrid time-multiplexing technique reduce the area by 85% and 81%, respectively, compared to CMOS implementations.
{"title":"MESO-CMOS Hybrid Circuits With Time-Multiplexing Technique for Energy and Area-Efficient Computing in Memory","authors":"Tzuping Huang;Linran Zhao;Yiming Han;Hai Li;Ian A. Young;Yaoyao Jia","doi":"10.1109/JXCDC.2025.3530906","DOIUrl":"https://doi.org/10.1109/JXCDC.2025.3530906","url":null,"abstract":"The magnetoelectric spin orbit (MESO), one of the emerging spin devices, represents a promising alternative to complementary metal-oxide–semiconductor (CMOS) technology. MESO provides dual functionality: each device can perform logic operations while acting as a nonvolatile memory device. MESO also offers advantages, such as an ultralow supply voltage of 100 mV and the potential to vertically integrate with CMOS, which promises significant energy and area efficiency. These features support MESO’s suitability for improving the energy efficiency and area efficiency of computing-in-memory (CIM) circuits. To harness the advantages of MESO in large-scale complex circuit systems, this article presents the development of a MESO-based standard cell library. This library is critical to realize automated design, as it allows the implementation of all the basic CMOS functions with MESO, thereby enabling MESO-CMOS hybrid design in large-scale complex circuits. This article also introduces a highly area-efficient time-multiplexing technique to optimize the complex function inside CIM. Specifically, the multiplier and multiply-and-accumulate (MAC) circuits using the MESO-CMOS hybrid time-multiplexing technique reduce the area by 85% and 81%, respectively, compared to CMOS implementations.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"11 ","pages":"1-9"},"PeriodicalIF":2.0,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10843777","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143379466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-07DOI: 10.1109/JXCDC.2024.3518312
editorial
{"title":"Special Topic on 3-D Logic and Memory for Energy Efficient Computing","authors":"editorial","doi":"10.1109/JXCDC.2024.3518312","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3518312","url":null,"abstract":"","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"iii-iv"},"PeriodicalIF":2.0,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10832462","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this article, we introduce a novel technique called E-multiplication and accumulation (MAC) (EMAC), aimed at enhancing energy efficiency, reducing latency, and improving the accuracy of analog-based in-static random access memory (SRAM) MAC accelerators. Our approach involves a digital-to-time word-line (WL) modulation technique that encodes the WL voltage while preserving the necessary linear voltage drop for precise computations. This eliminates the need for an additional digital-to-analog converter (DAC) in the design. Furthermore, the SRAM-based logical weight encoding scheme we present reduces the reliance on capacitance-based techniques, which typically introduce area overhead in the circuit. This approach ensures consistent voltage drops for all equivalent cases [i.e., $(a { times} b) = (b times a)$