Ternary weight neural networks (TWNs), with weights quantized to three states (−1, 0, and 1), have emerged as promising solutions for resource-constrained edge artificial intelligence (AI) platforms due to their high energy efficiency with acceptable inference accuracy. Further energy savings can be achieved with TWN accelerators utilizing techniques such as compute-in-memory (CiM) and scalable technologies such as ferroelectric transistors (FeFETs). Although the standard 1T-FeFET CiM design offers high density with its compactness and multilevel storage, its CiM performance in deeply scaled technology is prone to hardware nonidealities. This requires design modifications such as 2T-FeFET bitcells, offering high CiM robustness due to their differential nature at the cost of area. In this work, we conduct a design space exploration of FeFET-based TWN-CiM solutions. By utilizing FeFETs storing 1 bit (two levels) and 1.58 ($\log_{2} 3$) bits (three levels), we design three flavors of ternary CiM arrays: 1) 1T design based on 1.58-b FeFET (1T); 2) 2T differential (2T-diff) design; and 3) 2T pull-up/pull-down (2T-PUPD) design. Additionally, to increase the computational robustness of the 1T design, we propose static-weight transformation (WT) and static-weight input transformation (WIT). We then comparatively evaluate the inference accuracy and energy–area tradeoffs of these designs. For this, we use phase-field models to capture the multidomain physics and a rigorous inference simulator accounting for hardware nonidealities. Our analysis for ResNet18 trained on the CIFAR100 dataset shows that the 1.58-b 1T bitcell with the WT and WIT techniques yields a significant improvement in inference accuracy (73.61%) compared to the standard 1T design (i.e., without WIT). This accuracy is comparable to that of the 2T-diff design (76.4%), with $1.98\times$ and $1.91\times$ reduction in overall area and CiM energy, respectively.
"1.58-b FeFET-Based Ternary Neural Networks: Achieving Robust Compute-In-Memory With Weight-Input Transformations". Imtiaz Ahmed; Akul Malhotra; Revanth Koduru; Sumeet Kumar Gupta. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 11, pp. 157-165. Pub Date: 2025-10-14. DOI: 10.1109/JXCDC.2025.3621160. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11202915
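As a rough illustration of the ternary weights and the 1.58-bit figure quoted in the abstract above, the following Python sketch quantizes real-valued weights to {−1, 0, +1}, computes the information stored per three-level cell, and evaluates an ideal dot product. The function names and the `threshold` parameter are hypothetical; the abstract does not describe the quantizer used, and this sketch models neither the FeFET array nor the WT/WIT transformations.

```python
import math


def ternarize(weights, threshold):
    """Map each real weight to -1, 0, or +1 using a symmetric threshold.

    `threshold` is an illustrative assumption, not a value from the paper.
    """
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def bits_per_cell(levels):
    """Information capacity of one cell that can hold `levels` distinct states."""
    return math.log2(levels)


def ternary_dot(ternary_weights, inputs):
    """Ideal (nonideality-free) dot product between ternary weights and inputs."""
    return sum(w * x for w, x in zip(ternary_weights, inputs))
```

With three levels, `bits_per_cell(3)` evaluates to about 1.585, which is the "1.58-b" figure; a two-level cell gives exactly 1 bit.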
Ferroelectric random access memory (FeRAM) is a promising candidate for energy-efficient nonvolatile memory, particularly for logic-in-memory and compute-in-memory (CIM) applications. Among the available cell architectures, one-transistor–n-capacitor (1T-nC) and two-transistor–n-capacitor (2T-nC) FeRAMs each offer distinct trade-offs in density, scalability, and reliability. In this work, we present a comparative study of these two architectures under both dimensional scaling ($XY$/$Z$ shrinkage) and vertical integration (increasing stacked capacitors per cell). Using technology computer-aided design (TCAD) and circuit-level simulations, we analyze how scaling impacts ferroelectric capacitance, parasitic coupling, and floating-node (FN) dynamics, which together dictate sense margin (SM) and read stability. A key mitigation strategy, floating the unselected capacitors, is applied to both architectures, effectively decoupling the SM from the number of stacked capacitors and enabling tractable analysis across scaling regimes. Results show that 1T-nC suffers more from charge sharing with the bitline (BL), while 2T-nC benefits from transistor isolation and stronger low-voltage sensing at the cost of increased area. By systematically evaluating these behaviors across scaling directions, this work establishes the reliability trade-offs of 1T-nC and 2T-nC cells and provides design guidelines for high-density, vertically integrated FeRAM systems.
"Understanding Reliability Trade-Offs in 1T-nC and 2T-nC FeRAM Designs". Sadik Yasir Tauki; Rudra Biswas; Rakesh Acharya; Jiahui Duan; Rajiv Joshi; Kai Ni; Vijaykrishnan Narayanan. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 11, pp. 148-156. Pub Date: 2025-10-09. DOI: 10.1109/JXCDC.2025.3619908. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11197534
With an increasing number of transistors per circuit, the fabrication cost and energy consumption of each integrated circuit increase exponentially, which drives the need to reduce the number of transistors. In this study, we explore a novel design for a 16-bit digital counter that combines complementary metal–oxide–semiconductor (CMOS) circuits with memristors (ReRAM), thereby reducing the number of transistors, with applications in artificial intelligence (AI) circuits. Two versions of the 16-bit digital counter have been designed: the first uses a classically designed D flip-flop (DFF) with memristors as logic gates, and the second is an improved design that significantly reduces the number of components. Design and simulation results for the 16-bit counters are presented and show the expected counting function. The simulation is based on experimentally measured parameters of memristors and a functional model. Furthermore, in-depth analyses with respect to practical memristor results are discussed, including variations in set/reset potential, endurance and retention characteristics, post-layout effects on the proposed circuit, and the associated power consumption.
"Non-Volatile ReRAM-Based Compact Event-Triggered Counters". Moin Diwan; Shengchao Zhang; Zidu Li; Alex James; Bhaskar Choubey. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 11, pp. 131-138. Pub Date: 2025-10-08. DOI: 10.1109/JXCDC.2025.3619415. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11196921
Pub Date: 2025-10-07 DOI: 10.1109/JXCDC.2025.3618883
Mohammad Adnaan;Saeideh Alinezhad Chamazcoti;Emil Karimov;Marie Garcia Bardon;Francky Catthoor;Jan van Houdt;Azad Naeemi
We present a framework for design technology co-optimization (DTCO) of the main memory system with one transistor-one capacitor (1T1C) ferroelectric random access memory (FERAM) as an alternative to dynamic random access memory (DRAM). We start with the ferroelectric capacitor device model and perform array-level memory circuit simulation. Then, we map the circuit-level metrics to system-level simulators to analyze the performance enhancement of using FERAM as a main memory. We demonstrate the performance boost and power savings that can be achieved at the system level by improving individual device characteristics and modifying circuit architecture. We estimate that, on average, more than a 14% improvement in instructions per cycle and a 21% reduction in energy consumption can be achieved by substituting DRAM with FERAM equipped with a ferroelectric capacitor having an optimal polarization switching voltage of 1.5 V.
"Benchmarking of FERAM-Based Memory System by Optimizing Ferroelectric Device Model". Mohammad Adnaan; Saeideh Alinezhad Chamazcoti; Emil Karimov; Marie Garcia Bardon; Francky Catthoor; Jan van Houdt; Azad Naeemi. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 11, pp. 99-106. Pub Date: 2025-10-07. DOI: 10.1109/JXCDC.2025.3618883. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11195120
Pub Date: 2025-10-06 DOI: 10.1109/JXCDC.2025.3616007
Jianze Wang;Wei Zhang;Xuanyao Fong
The ferroelectric field-effect transistor (FeFET) is a promising memory device technology due to desirable attributes, such as fast access times, high memory cell density, good endurance, compatibility with CMOS process, and impressive scalability. While previous research has explored the impact of process variations at the device level, their effects on circuit behavior have not been comprehensively investigated due to a lack of a framework for analyzing FeFET bit-cell failures at the circuit level, which we present in this work. We studied the process parameters, including ferroelectric (FE) layer thickness, channel length, channel width, and effective oxide thickness of an FeFET bit cell. The correlations of each failure event and the write pulse voltage and write pulsewidth are studied. Our results show that the voltage applied on the FeFET bit cell dominates the performance of the bit cell for both write and read operations.
"A Bit-Cell Failure Analysis Framework for Ferroelectric Field-Effect Transistor-Based Memories". Jianze Wang; Wei Zhang; Xuanyao Fong. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 11, pp. 123-130. Pub Date: 2025-10-06. DOI: 10.1109/JXCDC.2025.3616007. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11185162
Pub Date: 2025-10-06 DOI: 10.1109/JXCDC.2025.3617784
Anup Ashok Kedilaya;Sirish Oruganti;Nishant Gupta;Xiuhao Zhang;Ilya Karpov;Mark A. Anders;Jaydeep P. Kulkarni
Advances in process technology enabling backside metals (BSMs) and contacts offer new design–technology co-optimization (DTCO) opportunities to further enhance power, performance, and area (PPA) gains in sub-3-nm nodes. This work exploits backside (BS) contact technology within standard cells to extend both signal and clock routing to BSM layers, enabling standard-cell height reduction options. We design electrically equivalent (EEQ) standard cells with multiple layout variants based on front versus BS pin access, achieving a 2-M0-track height reduction in 3-nm gate-all-around field-effect transistor (GAAFET) technology. Experimental evaluation across representative industrial benchmarks, including high-performance CPUs, GPUs, and general-purpose systems-on-chip (GP-SoCs), demonstrates significant benefits. Cell height reduction delivers up to 35% area savings and 10%–15% total power reduction for GPU and GP-SoC designs. For high-performance CPUs, maximum performance improves by 15% at iso-power compared to backside power with buried power rails (BSBPR). Incorporating BS signal routing with cell height reduction also reduces worst case IR drop by 32% relative to BSBPR. These results show that BS clock (BSCLK) and signal routing represent the next phase of technology innovation beyond BS power delivery, enabling continued standard-cell scaling, improved intracell and intercell routability, and generational PPA gains while maintaining similar core transistor geometries in sub-3-nm technologies.
"Beyond Backside Power: Backside Signal Routing as Technology Booster for Standard-Cell Scaling". Anup Ashok Kedilaya; Sirish Oruganti; Nishant Gupta; Xiuhao Zhang; Ilya Karpov; Mark A. Anders; Jaydeep P. Kulkarni. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 11, pp. 107-115. Pub Date: 2025-10-06. DOI: 10.1109/JXCDC.2025.3617784. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11192533
Advanced packaging is becoming essential for designing hardware accelerators for large language models (LLMs). Different architectures, such as 2.5-D integration of memory with logic, have been proposed; however, the bandwidth limits the throughput of the complete system. Recent works have proposed memory-on-logic systems, where high bandwidth memory (HBM) can be 3-D stacked on top of logic to improve the throughput by $64\times$ and energy efficiency by $3\times$. However, the high power consumption of logic dies and the high thermal resistance of HBM can result in thermal and power delivery challenges in such heterogeneously integrated stacks. In this work, we explore various design configurations, such as logic-on-memory and memory-on-logic, and consider some hybrid configurations. Furthermore, accurate modeling of DRAM dies is performed, and mitigation strategies are proposed to further improve the throughput by 16% for memory-on-logic, reduce the high resistive (IR) drop for the logic-on-memory system by 640 mV, and achieve $4\times$ higher throughput for a hybrid system compared to the 2.5-D integrated system.
"3-D Stacked HBM and Compute Accelerators for LLM: Optimizing Thermal Management and Power Delivery Efficiency". Janak Sharda; Madison Manley; Jungyoun Kwak; Chinsung Park; Muhannad Bakir; Shimeng Yu. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 11, pp. 116-122. Pub Date: 2025-10-03. DOI: 10.1109/JXCDC.2025.3617298. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11192509
Pub Date: 2025-09-18 DOI: 10.1109/JXCDC.2025.3611365
Masanori Natsui;Tomoo Yoshida;Takahiro Hanyu
This article proposes a circuit configuration for an area- and energy-efficient nonvolatile register using magnetic tunnel junction (MTJ) devices, suitable for persistent computation in intermittent computing environments. The proposed configuration, named the reference-load sharing scheme (RLSS), stores 1 bit of information using the resistance of a dedicated MTJ device and a composite resistance formed by multiple MTJ devices, which serves as a shared reference resistance across all bits. This configuration reduces both the total number of MTJ devices and the energy consumption required for data retention while also decreasing the circuit area through simplifying the write current control circuitry. Functional simulations using a 55-nm CMOS/MTJ-hybrid process technology confirm the advantage of the RLSS across 4-, 8-, 16-, and 32-bit registers. Furthermore, post-layout simulations quantitatively demonstrate that the proposed configuration reduces the backup energy by up to 47.8% and circuit area by up to 38.1% compared to conventional designs.
"Reference-Load Sharing Scheme: An Area- and Energy-Efficient Nonvolatile Register Design Using MTJ Devices". Masanori Natsui; Tomoo Yoshida; Takahiro Hanyu. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 11, pp. 90-98. Pub Date: 2025-09-18. DOI: 10.1109/JXCDC.2025.3611365. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11172316
Pub Date: 2025-08-29 DOI: 10.1109/JXCDC.2025.3603942
Mohammad Khairul Bashar;T. H. Pantha;Z. Li;M. Farasat;S. Datta;V. Narayanan;S. Dutta;N. Shukla
In-memory compute kernels present a promising approach for addressing data-centric workloads. However, their scalability remains a significant challenge, particularly for computationally intensive tasks solving combinatorial optimization problems such as Boolean satisfiability (SAT), which are inherently difficult to decompose. In this work, we propose a ferroelectric nonvolatile memory (NVM)-based compute-in-memory annealer for solving the Boolean MaxSAT problem. We experimentally demonstrate the computational functionality of the NVM array using a compact $20 \times 10$ HZO-/IWO-based ferroelectric field-effect-transistor (FeFET) array. More importantly, through experimentally calibrated simulations, we demonstrate that our solution is compatible with a modular memory architecture, allowing the problem sizes to exceed the capacity of a single memory array. Our approach not only addresses the size limitations imposed by the read margin (RM) of individual arrays but also opens new avenues for integrating such accelerators as back-end solutions in advanced computing platforms.
"FIMA: A Scalable Ferroelectric Compute-in-Memory Annealer for Accelerating Boolean Satisfiability". Mohammad Khairul Bashar; T. H. Pantha; Z. Li; M. Farasat; S. Datta; V. Narayanan; S. Dutta; N. Shukla. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 11, pp. 81-89. Pub Date: 2025-08-29. DOI: 10.1109/JXCDC.2025.3603942. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11143213
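To make the MaxSAT objective mentioned in the FIMA abstract above concrete, here is a minimal Python sketch that counts satisfied clauses and finds the best assignment by exhaustive search. This is not the paper's annealer or its in-memory mapping, which the abstract does not detail; the signed-integer clause encoding and the function names are assumptions chosen for illustration, and exhaustive search is only practical for a handful of variables.

```python
from itertools import product


def satisfied_clauses(clauses, assignment):
    """Count clauses that contain at least one true literal.

    A literal is a nonzero int: k means variable k is True,
    -k means variable k is False (an illustrative encoding).
    """
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def brute_force_maxsat(clauses, num_vars):
    """Try every assignment and keep the first one with the highest score."""
    best_score, best_assignment = -1, None
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: bit for i, bit in enumerate(bits)}
        score = satisfied_clauses(clauses, assignment)
        if score > best_score:
            best_score, best_assignment = score, assignment
    return best_score, best_assignment
```

For the clause set `[[1], [-1], [1, 2], [-2]]`, the first two clauses conflict, so at most three of the four can be satisfied; an annealer would search for such an assignment heuristically rather than by enumerating all of them.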
Pub Date: 2025-07-07 DOI: 10.1109/JXCDC.2025.3586589
Wei Zhang;Jianze Wang;Xuanyao Fong
We utilized phase-field simulations to investigate the effects of polar-axis (PA) orientation fluctuations on the extrinsic properties of single ferroelectric (FE) grains, focusing on the coercive electrical field (EC) and the remnant polarization (Pr). The underlying mechanisms through which PA orientation fluctuations influence polarization behavior are studied to gain insights into variations in FE device performance and reliability. In addition, we used the Voronoi algorithm to simulate multigrain (MG) FE capacitors and assess the impact of PA orientation fluctuations on the device variability of polycrystalline FE capacitors. Our analysis shows that the PA orientation, which is a significant intrinsic factor, collectively contributes to device variability. We conclude that engineering the PA orientation helps to optimize FE device performance and reliability, which is crucial for the development of high-performance FE memory technologies.
"Polar-Axis Orientation Fluctuations and the Impact on the Intrinsic Variability in Ferroelectric Capacitors". Wei Zhang; Jianze Wang; Xuanyao Fong. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 11, pp. 74-80. Pub Date: 2025-07-07. DOI: 10.1109/JXCDC.2025.3586589. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11072438