This paper presents a pioneering self-SHE-assisted Spin-Orbit Torque Magnetic Tunnel Junction (SOT-MTJ) design, crafted to enhance logic-in-memory (LiM) applications. A novel Non-Volatile (NV) latch based on SOT-MTJ technology is proposed, demonstrating superior energy efficiency and compactness. The proposed NV latch achieves a power dissipation of 18.87 μW, an energy consumption of 75.4 fJ, and a delay of 0.2 ns, setting a new benchmark in NV latch performance. When incorporated into a 1-bit NV full adder, the design achieves a power consumption of 6.55 μW, a delay of 86.57 ps, and requires only 59 MOS transistors + 1 MTJ, showcasing its compactness and efficiency compared to conventional designs. These advancements position the proposed SOT-MTJ-based NV latch and full adder as pivotal components for energy-efficient, high-performance non-volatile logic circuits, paving the way for future innovations in LiM architectures.
{"title":"Energy-efficient non-volatile latch using SOT-MTJ for enhanced logic and memory applications","authors":"Nikhil M.L. , T.Y. Satheesha , Shashidhara M. , Abhishek Acharya","doi":"10.1016/j.memori.2025.100137","DOIUrl":"10.1016/j.memori.2025.100137","url":null,"abstract":"<div><div>This paper presents a pioneering self-SHE assisted Spin-Orbit Torque Magnetic Tunnel Junction (SOT-MTJ) design, meticulously crafted for enhancing logic-in-memory applications. A novel Non-Volatile (NV) latch based on SOT-MTJ technology is proposed, demonstrating superior energy efficiency and compactness. The proposed NV latch achieves a power dissipation of 18.87 <span><math><mi>μ</mi></math></span>W, energy consumption of 75.4 fJ, and a delay of 0.2 ns, setting a new benchmark in NV latch performance. When incorporated into a 1-bit NV full adder, the design achieves a power consumption of 6.55 <span><math><mi>μ</mi></math></span>W, a delay of 86.57 ps, and requires only 59 MOS + 1 MTJ, showcasing its compactness and efficiency compared to conventional designs. These advancements underline the proposed SOT-MTJ-based NV latch and full adder as pivotal components for energy-efficient, high-performance non-volatile logic circuits, paving the way for future innovations in LiM architectures.</div></div>","PeriodicalId":100915,"journal":{"name":"Memories - Materials, Devices, Circuits and Systems","volume":"12 ","pages":"Article 100137"},"PeriodicalIF":0.0,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145765939","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-11 | DOI: 10.1016/j.memori.2025.100136
Rinku Rani Das, Devenderpal Singh, Alex James
The ultimate advancement beyond FinFET technology is the Gate-All-Around (GAA) Multi-Bridge-Channel FET (MBCFET). GAA MBCFETs feature multiple vertically stacked channels, a departure from single-channel designs, thereby enhancing overall device efficiency. This study explores four GAA MBCFET configurations (C1, C2, C3, and C4) with different channel arrangements. The impact of these arrangements on DC and RF/analog performance is analyzed, revealing that the device with four thin channels (C4) exhibits robust immunity to short-channel effects (SCEs), as reflected in parameters such as threshold voltage variation, Subthreshold Swing (SS), and Drain-Induced Barrier Lowering (DIBL). Moreover, the GAA MBCFET demonstrates superior RF and analog performance attributes, offering promising prospects for RFIC design. A 1T1R memory cell implemented in GAA MBCFET technology is also analyzed through DC and transient simulations. This research suggests that future integration of GAA MBCFETs holds the potential for significant enhancements in device performance, encompassing improved power efficiency, higher speeds, and overall superior capabilities.
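For reference, the two short-channel metrics named above have standard definitions; the expressions below are the conventional textbook forms, not bias conditions or values taken from this paper:

\[ \mathrm{SS} = \left( \frac{\partial \log_{10} I_D}{\partial V_{GS}} \right)^{-1} \quad \text{[mV/decade]}, \qquad \mathrm{DIBL} = \frac{V_{th}(V_{DS}^{\mathrm{lin}}) - V_{th}(V_{DS}^{\mathrm{sat}})}{V_{DS}^{\mathrm{sat}} - V_{DS}^{\mathrm{lin}}} \quad \text{[mV/V]} \]

A lower SS (approaching the roughly 60 mV/decade room-temperature limit) and a lower DIBL indicate stronger gate electrostatic control, which is the sense in which the four-channel C4 configuration is reported as more robust.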
{"title":"Performance investigation of 1T1R memory cell using GAA MBC-FET technology","authors":"Rinku Rani Das , Devenderpal Singh , Alex James","doi":"10.1016/j.memori.2025.100136","DOIUrl":"10.1016/j.memori.2025.100136","url":null,"abstract":"<div><div>The ultimate advancement beyond FinFET technology is the Gate-All-Around (GAA) Multi-Bridge-Channel FET (MBCFET) technology. GAA MBCFET features vertically stacked multiple channels, a departure from single-channel designs, thereby enhancing overall device efficiency. This study explores four configurations (C1, C2, C3, and C4) of GAA MBCFET, where various channel arrangements are investigated. The impact of these channels on DC, RF/analog performance is analyzed, revealing that the GAA MBCFET device with four thin channels (C4) exhibits robust resistance to short channel effects (SCE) parameters, such as threshold voltage variation, Subthreshold Swing (SS), and Drain-Induced Barrier Lowering (DIBL). Moreover, the GAA MBCFET demonstrates superior RF and analog performance attributes, offering promising prospects for the design of RFIC circuits. The 1T1R memory cell implementation using GAA MBC-FET technology has been analyzed to observe the DC, transient analysis. This research suggests that the future integration of GAA MBCFETs holds the potential for significant enhancements in device performance, encompassing improved power efficiency, higher speeds, and overall superior capabilities.</div></div>","PeriodicalId":100915,"journal":{"name":"Memories - Materials, Devices, Circuits and Systems","volume":"11 ","pages":"Article 100136"},"PeriodicalIF":0.0,"publicationDate":"2025-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145519643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-10 | DOI: 10.1016/j.memori.2025.100134
Srinivas Rahul Sapireddy, Kazi Asifuzzaman, Rahman Mostafizur
A key step in neural networks is activation. Among the different activation functions, sigmoid, tanh, and related functions require evaluating exponentials. From a hardware perspective, implementing exponentials relies on Taylor-series expansions or iterative methods involving many addition, multiplication, and division steps, which makes them power-hungry and costly in clock cycles. We implement a piecewise linear approximation of the sigmoid function as a replacement for standard sigmoid activation libraries. This approach provides a practical alternative by leveraging piecewise segmentation, which simplifies hardware implementation and improves computational efficiency. In this paper, we detail piecewise functions that can be implemented using linear approximations and their implications for overall model accuracy and performance gain.
Our results show that for the DenseNet, ResNet, and GoogLeNet architectures, the piecewise linear approximation of the sigmoid function provides faster execution times than the standard TensorFlow sigmoid implementation while maintaining comparable accuracy. Specifically, for MNIST with DenseNet, accuracy reaches 99.91% (Piecewise) vs. 99.97% (Base) with up to a 1.31× speedup in execution time. For CIFAR-10 with DenseNet, accuracy is 98.97% (Piecewise) vs. 99.40% (Base) while achieving 1.24× faster execution. Similarly, for CIFAR-100 with DenseNet, accuracy is 97.93% (Piecewise) vs. 98.39% (Base), with a 1.18× reduction in execution time. These results confirm the proposed method's capability to efficiently process large-scale datasets and computationally demanding tasks, offering a practical means to accelerate deep learning models, including LSTMs, without compromising accuracy.
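As an illustration of the general idea (not the authors' segmentation, whose breakpoints and slopes are not given in the abstract), a minimal piecewise linear sigmoid sketch in Python could look like this; the segment boundaries and coefficients below are hand-fit placeholders:

import numpy as np

def sigmoid(x):
    # Reference sigmoid: 1 / (1 + exp(-x)); uses the exponential the paper avoids.
    return 1.0 / (1.0 + np.exp(-x))

def piecewise_sigmoid(x):
    # Five linear segments: saturate below -4 and above +4, with two shallow
    # outer pieces and one steep central piece. All coefficients are
    # illustrative, chosen only to stay close to the true curve.
    x = np.asarray(x, dtype=np.float64)
    y = np.empty_like(x)
    y[x <= -4.0] = 0.0
    y[x >= 4.0] = 1.0
    m = (x > -4.0) & (x < -1.0)
    y[m] = 0.084 * x[m] + 0.352
    m = (x >= -1.0) & (x <= 1.0)
    y[m] = 0.231 * x[m] + 0.5          # slope near sigmoid'(0) = 0.25
    m = (x > 1.0) & (x < 4.0)
    y[m] = 0.084 * x[m] + 0.648
    return y

if __name__ == "__main__":
    xs = np.linspace(-6.0, 6.0, 121)
    err = np.max(np.abs(piecewise_sigmoid(xs) - sigmoid(xs)))
    print(f"max absolute error over [-6, 6]: {err:.3f}")

Because each segment reduces to one multiply and one add, such an approximation maps directly to a comparator plus a small table of slope/intercept pairs in hardware.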
{"title":"Simplifying activations with linear approximations in neural networks","authors":"Srinivas Rahul Sapireddy , Kazi Asifuzzaman , Rahman Mostafizur","doi":"10.1016/j.memori.2025.100134","DOIUrl":"10.1016/j.memori.2025.100134","url":null,"abstract":"<div><div>A key step in Neural Networks is activation. Among the different types of activation functions, sigmoid, tanh, and others involve the usage of exponents for calculation. From a hardware perspective, exponential implementation implies the usage of Taylor series or repeated methods involving many addition, multiplication, and division steps, and as a result are power-hungry and consume many clock cycles. We implement a piecewise linear approximation of the sigmoid function as a replacement for standard sigmoid activation libraries. This approach provides a practical alternative by leveraging piecewise segmentation, which simplifies hardware implementation and improves computational efficiency. In this paper, we detail piecewise functions that can be implemented using linear approximations and their implications for overall model accuracy and performance gain.</div><div>Our results show that for the DenseNet, ResNet, and GoogLeNet architectures, the piecewise linear approximation of the sigmoid function provides faster execution times compared to the standard TensorFlow sigmoid implementation while maintaining comparable accuracy. Specifically, for MNIST with DenseNet, accuracy reaches 99.91% (Piecewise) vs. 99.97% (Base) with up to 1.31<span><math><mo>×</mo></math></span> speedup in execution time. For CIFAR-10 with DenseNet, accuracy improves to 98.97% (Piecewise) vs. 99.40% (Base) while achieving 1.24<span><math><mo>×</mo></math></span> faster execution. Similarly, for CIFAR-100 with DenseNet, the accuracy is 97.93% (Piecewise) vs. 98.39% (Base), with a 1.18<span><math><mo>×</mo></math></span> execution time reduction. These results confirm the proposed method’s capability to efficiently process large-scale datasets and computationally demanding tasks, offering a practical means to accelerate deep learning models, including LSTMs, without compromising accuracy.</div></div>","PeriodicalId":100915,"journal":{"name":"Memories - Materials, Devices, Circuits and Systems","volume":"11 ","pages":"Article 100134"},"PeriodicalIF":0.0,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145319905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-06 | DOI: 10.1016/j.memori.2025.100135
Marco Grossi, Martin Omaña
Wireless sensor networks based on the Internet of Things (IoT) paradigm are of paramount importance for collecting and sharing large amounts of data across different fields of application. At the same time, cyberattacks represent a serious threat to the security of IoT systems, and countermeasures have been proposed to mitigate these risks.
Physical Unclonable Functions (PUFs) are devices that exploit random variations in device parameters introduced during the manufacturing process to generate a secret key that can be considered virtually unclonable. PUF devices can be used, for instance, to generate a secure signature for device authentication or for cryptographic algorithms.
In this paper, we present a PUF device based on the uncertainties in transistor manufacturing parameters present in a single-stage voltage amplifier. We present two different PUF implementations, one using bipolar junction transistors (BJTs) and the other using metal-oxide-semiconductor (MOS) transistors, and compare their performance by means of experimental measurements. The experimental results show that the best performance is achieved by the BJT-based PUF, which features acceptable values of uniqueness (44.98 %) and uniformity (52.40 %), with very high steadiness and reliability against temperature and power-supply fluctuations (all above 99.40 %). In contrast, the MOS-based PUF presents lower steadiness and reliability than the BJT-based one, but it can generate responses with a higher number of bits, thus increasing security.
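For context, the uniqueness and uniformity figures quoted above are conventionally computed from the response bits as sketched below (standard PUF metrics; the response length and device count used here are placeholders, not the experimental setup of the paper):

import numpy as np

def uniformity(response):
    # Percentage of '1' bits in a single device's response (ideal: 50 %).
    return 100.0 * np.mean(response)

def uniqueness(responses):
    # Average pairwise inter-device Hamming distance, expressed as a
    # percentage of the response length (ideal: 50 %).
    n_dev, n_bits = responses.shape
    dists = [np.sum(responses[i] != responses[j]) / n_bits
             for i in range(n_dev) for j in range(i + 1, n_dev)]
    return 100.0 * np.mean(dists)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    responses = rng.integers(0, 2, size=(10, 64))   # placeholder: 10 devices, 64-bit responses
    print(f"uniformity(device 0) = {uniformity(responses[0]):.2f} %")
    print(f"uniqueness           = {uniqueness(responses):.2f} %")

Steadiness and reliability are computed analogously, but from the intra-device Hamming distance between repeated readouts of the same device under varying temperature and supply voltage.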
{"title":"Physical Unclonable Function (PUF) device based on single stage voltage amplifiers for secure signature generation in the Internet of Things","authors":"Marco Grossi, Martin Omaña","doi":"10.1016/j.memori.2025.100135","DOIUrl":"10.1016/j.memori.2025.100135","url":null,"abstract":"<div><div>Wireless sensor networks based on the Internet of Things (IoT) paradigm are of paramount importance to collect and share large amount of data in different fields of application. At the same time, cyberattacks represent a serious threat for the security of IoT systems and countermeasures have been proposed to mitigate the risks of cyberattacks in IoT systems.</div><div>Physical Unclonable Functions (PUF) are devices that exploit the random variations of the device parameters introduced during the manufacturing process to generate a secret key that can be considered virtually unclonable. PUF devices can be used, for instance, to generate a secure signature for device authentication or cryptographic algorithms.</div><div>In this paper, we present a PUF device that is based on the uncertainties due to transistors’ manufacturing parameters present in a single stage voltage amplifier. We present two different PUF implementations, one implemented by using bipolar junction transistors (BJTs) and the other implemented by using metal oxide semiconductor (MOS) transistors. We compare their performance by means of experimental measurements. The experimental results have shown that the best performance is achieved by the PUF based on BJT transistors, which features acceptable values of uniqueness (44.98 %), and uniformity (52.40 %), with very high values of steadiness and reliability to temperature and power supply fluctuations (all above 99.40 %). Instead, the PUF based on MOS transistors presents a lower steadiness and reliability than the PUF based on BJTs, but it can generate responses with higher number of bits, thus increasing security.</div></div>","PeriodicalId":100915,"journal":{"name":"Memories - Materials, Devices, Circuits and Systems","volume":"11 ","pages":"Article 100135"},"PeriodicalIF":0.0,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145265207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-19 | DOI: 10.1016/j.memori.2025.100132
Shubhang Pandey, T.G. Venkatesh
Recent advances in 3D fabrication allow the memory bottlenecks of modern data-intensive applications to be addressed by bringing computation closer to the memory, enabling Near Memory Processing (NMP). Memory Centric Networks (MCN) are advanced memory architectures built on NMP, in which multiple stacks of 3D memory units are equipped with simple processing cores, allowing numerous threads to execute concurrently. NMP performance depends crucially on efficient task offloading and task-to-NMP allocation. Our work presents a multi-armed bandit (MAB) based approach to formulating an efficient resource allocation strategy for MCN. Most existing literature concentrates on a single application domain and optimizes only one metric, i.e., either execution time or power. Our solution is more generic and can be applied to diverse application domains. In our approach, we deploy the Upper Confidence Bound (UCB) policy to collect rewards and use them for regret optimization. We study the following metrics: instructions per cycle, execution times, NMP core cache misses, packet latencies, and power consumption. Our study covers various applications from the PARSEC and SPLASH2 benchmark suites. The evaluation shows that the system's performance improves by ∼11% on average, with an average reduction in total power consumption of ∼12%.
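A minimal sketch of the UCB selection step for task-to-NMP allocation is shown below; the mapping of bandit arms to NMP stacks and the reward definition (a normalized mix of the metrics listed above) are illustrative assumptions, since the abstract does not spell them out:

import math
import random

class UCBAllocator:
    # UCB1 policy over a set of NMP stacks ("arms"): each offloaded task goes
    # to the stack with the highest upper confidence bound on its observed
    # average reward.
    def __init__(self, n_arms):
        self.counts = [0] * n_arms     # times each stack has been chosen
        self.values = [0.0] * n_arms   # running mean reward per stack
        self.t = 0                     # total allocations so far

    def select(self):
        self.t += 1
        for arm, c in enumerate(self.counts):
            if c == 0:                 # try every stack once first
                return arm
        ucb = [self.values[a] + math.sqrt(2.0 * math.log(self.t) / self.counts[a])
               for a in range(len(self.counts))]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

if __name__ == "__main__":
    alloc = UCBAllocator(n_arms=4)          # e.g., 4 NMP stacks (placeholder)
    true_means = [0.3, 0.5, 0.7, 0.6]       # hypothetical per-stack reward means
    for _ in range(1000):
        arm = alloc.select()
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        alloc.update(arm, reward)
    print(alloc.counts)                     # the best stack should accumulate the most tasks

The regret minimized by UCB1 is the cumulative gap between the reward of the best stack and the reward of the stacks actually chosen.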
{"title":"Multi armed bandit based resource allocation in Near Memory Processing architectures","authors":"Shubhang Pandey, T.G. Venkatesh","doi":"10.1016/j.memori.2025.100132","DOIUrl":"10.1016/j.memori.2025.100132","url":null,"abstract":"<div><div>Recent advances in 3D fabrication have allowed handling the memory bottlenecks for modern data-intensive applications by bringing the computation closer to the memory, enabling Near Memory Processing (NMP). Memory Centric Networks (MCN) are advanced memory architectures that use NMP architectures, where multiple stacks of the 3D memory units are equipped with simple processing cores, allowing numerous threads to execute concurrently. The performance of the NMP is crucially dependent upon the efficient task offloading and task-to-NMP allocation. Our work presents a multi-armed bandit (MAB) based approach in formulating an efficient resource allocation strategy for MCN. Most existing literature concentrates only on one application domain and optimizing only one metric, i.e., either execution time or power. However, our solution is more generic and can be applied to diverse application domains. In our approach, we deploy Upper Confidence Bound (UCB) policy to collect rewards and eventually use it for regret optimization. We study the following metrics-instructions per cycle, execution times, NMP core cache misses, packet latencies, and power consumption. Our study covers various applications from PARSEC and SPLASH2 benchmarks suite. The evaluation shows that the system’s performance improves by <span><math><mrow><mo>∼</mo><mn>11</mn><mtext>%</mtext></mrow></math></span> on average and an average reduction in total power consumption by <span><math><mrow><mo>∼</mo><mn>12</mn><mtext>%</mtext></mrow></math></span>.</div></div>","PeriodicalId":100915,"journal":{"name":"Memories - Materials, Devices, Circuits and Systems","volume":"11 ","pages":"Article 100132"},"PeriodicalIF":0.0,"publicationDate":"2025-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144678812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-06-20 | DOI: 10.1016/j.memori.2025.100133
Armin Gooran-Shoorakchaly, Sarah Sharif, Yaser M. Banad
This study extends state-of-the-art TaOx-based memristor modeling by explicitly coupling electrode-dependent thermal conductivity to the electrical-thermal solver and by treating drift, diffusion, and Soret flux on an equal footing. By examining titanium (Ti), palladium (Pd), and tungsten (W) electrodes, conductive filament (CF) dynamics are studied, particularly the role of thermal and electrical properties in governing oxygen-vacancy migration. The enriched model reveals that Ti's low thermal conductivity (21.9 W/m·K) lowers the forming voltage to −1.72 V and boosts the peak diffusion flux to 5.4 A/cm², whereas W's high thermal conductivity (174 W/m·K) suppresses filament growth, requiring −2.01 V. This is the first quantitative decomposition of the three vacancy-transport mechanisms under realistic Joule-heating conditions, enabling direct correlation between electrode choice and device variability. Our systematic analysis of the drift, diffusion, and Soret flux mechanisms provides deeper insight into CF formation, stability, and device reliability. This insight translates into markedly tighter resistance distributions for Ti devices (σ/μ = 0.011 in the LRS) and promising 10,000-s retention at 150 °C, pointing toward electrode-engineered RRAM for reliable neuromorphic computing. These findings underscore how careful electrode material selection can significantly enhance RRAM performance, reliability, and scalability, thereby presenting a promising device platform for neuromorphic and in-memory computing applications.
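As context for the three transport terms treated on an equal footing, a commonly used drift-diffusion-thermophoresis decomposition of the oxygen-vacancy flux is shown below; this is the generic textbook form, and the paper's exact coefficients and hopping-transport corrections may differ:

\[ \mathbf{J}_V = \underbrace{\mu\, n_V\, \mathbf{E}}_{\text{drift}} \;-\; \underbrace{D\, \nabla n_V}_{\text{diffusion}} \;-\; \underbrace{D\, S_T\, n_V\, \nabla T}_{\text{Soret}} \]

where \(n_V\) is the vacancy concentration, \(\mu\) and \(D\) are the vacancy mobility and diffusivity (linked by the Einstein relation \(D = \mu k_B T / q\)), and \(S_T\) is the Soret coefficient. The electrode's thermal conductivity enters through the temperature field \(T\), which governs both the thermally activated diffusivity and the \(\nabla T\) term driving the Soret flux; this is how electrode choice couples to filament dynamics in such a model.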
{"title":"Role of electrode materials in resistive switching mechanisms of oxide-based memristors for enhanced neuromorphic computing: A comprehensive study","authors":"Armin Gooran-Shoorakchaly, Sarah Sharif, Yaser M. Banad","doi":"10.1016/j.memori.2025.100133","DOIUrl":"10.1016/j.memori.2025.100133","url":null,"abstract":"<div><div>This study extends the state-of-the-art TaOx-based memristors by explicitly coupling electrode-dependent thermal conductivity to the electrical-thermal solver and by treating drift, diffusion, and Soret flux on equal footing. By examining titanium (Ti), palladium (Pd), and tungsten (W) electrodes, conductive filament (CF) dynamics is studied, particularly the role of thermal and electrical properties in governing oxygen vacancy migration. The enriched model reveals that Ti's low thermal conductivity (21.9 W/m·K) lowers the forming voltage to −1.72 V and boosts the peak diffusion flux to 5.4 A/cm<sup>2</sup>, whereas W's high thermal conductivity (174 W/m·K) suppresses filament growth, requiring −2.01 V. This is the first quantitative decomposition of the three vacancy-transport mechanisms under realistic Joule-heating conditions, enabling direct correlation between electrode choice and device variability. Our systematic analysis of drift, diffusion, and Soret flux mechanisms provides deeper insight into CF formation, stability, and device reliability. The insight translates into markedly tighter resistance distributions for Ti devices (σ/μ = 0.011 in LRS) and promising 10,000-s retention at 150 °C, pointing toward electrode-engineered RRAM for reliable neuromorphic computing. These findings underscore how careful electrode material selection can significantly enhance RRAM performance, reliability, and scalability, thereby presenting a promising device platform for neuromorphic and in-memory computing applications.</div></div>","PeriodicalId":100915,"journal":{"name":"Memories - Materials, Devices, Circuits and Systems","volume":"11 ","pages":"Article 100133"},"PeriodicalIF":0.0,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144489678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-04-01 | DOI: 10.1016/j.memori.2025.100131
Md Rownak Hossain Chowdhury, Mostafizur Rahman
Memory organization is essential for any AI (Artificial Intelligence) processor, as memory-mapped I/O dictates the system's overall throughput. Regardless of how fast or how many parallel processing units are integrated into the processor, performance will ultimately suffer when data transfer rates fail to match processing capabilities. Therefore, the efficacy of data orchestration within the memory hierarchy is a foundational aspect of benchmarking the performance of any AI accelerator. In this work, we investigate memory organization for a messaging-based vector processing unit (MAVeC), where data is routed across computation units to enable adaptive programmability at runtime. MAVeC features a hierarchical on-chip memory structure of less than 100 MB to minimize data movement, enhance locality, and maximize parallelism. Complementing this, we develop an end-to-end data orchestration methodology to manage data flow within the memory hierarchy. To evaluate the overall performance incorporating memory, we detail extensive benchmarking results across diverse parameters, including PCIe (Peripheral Component Interconnect Express) configurations, available hardware resources, operating frequencies, and off-chip memory bandwidth. MAVeC achieves a notable throughput of 95.39K inferences per second for AlexNet, operating at a 1 GHz frequency with 64 tiles and 32-bit precision, using PCIe 6.0 ×16 and HBM4 off-chip memory. In the TSMC 28 nm technology node, the estimated area of the MAVeC core is approximately 346 mm². These results underscore the potential of the proposed memory hierarchy for the MAVeC accelerator, positioning it as a promising solution for future AI applications.
{"title":"Implications of memory embedding and hierarchy on the performance of MAVeC AI accelerators","authors":"Md Rownak Hossain Chowdhury, Mostafizur Rahman","doi":"10.1016/j.memori.2025.100131","DOIUrl":"10.1016/j.memori.2025.100131","url":null,"abstract":"<div><div>Memory organization is essential for any AI (Artificial Intelligence) processor, as memory mapped I/O dictates the system's overall throughput. Regardless of how fast or how many parallel processing units are integrated into the processor, the performance will ultimately suffer when data transfer rates fail to match processing capabilities. Therefore, the efficacy of data orchestration within the memory hierarchy is a foundational aspect in benchmarking the performance of any AI accelerator. In this work, we investigate memory organization for a messaging-based vector processing Unit (MAVeC), where data routes across computation units to enable adaptive programmability at runtime. MAVeC features a hierarchical on-chip memory structure of less than 100 MB to minimize data movement, enhance locality, and maximize parallelism. Complementing this, we develop an end-to-end data orchestration methodology to manage data flow within the memory hierarchy. To evaluate the overall performance incorporating memory, we detail our extensive benchmarking results across diverse parameters, including PCIe (Peripheral Component Interconnect Express) configurations, available hardware resources, operating frequencies, and off-chip memory bandwidth. The MAVeC achieves a notable throughput of 95.39K inferences per second for Alex Net, operating at a 1 GHz frequency with 64 tiles and 32-bit precision, using PCIe 6.0 × 16 and HBM4 off-chip memory. In TSMC 28 nm technology node the estimated area for the MAVeC core is approximately 346 mm<sup>2</sup>. These results underscore the potential of the proposed memory hierarchy for the MAVeC accelerator, positioning it as a promising solution for future AI applications.</div></div>","PeriodicalId":100915,"journal":{"name":"Memories - Materials, Devices, Circuits and Systems","volume":"10 ","pages":"Article 100131"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144261298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-04-01 | DOI: 10.1016/j.memori.2025.100130
Mandeep Singh, Nakkina Sai Teja, Tarun Chaudhary, Balwinder Raj
The robust application of ferroelectric materials across various disciplines has resulted in significantly more accurate and capable FeFETs, which have the potential to deliver more promising non-volatile memory and synaptic devices than traditional ones. The present study illustrates the fundamental concepts, operation, and construction of FeFETs and presents a methodology to determine suitable ferroelectric materials, the composition of gate stacks, and the properties necessary for an efficient, commercially viable FeFET. Among the various ferroelectric-based FETs, the HfO₂-based FeFET has exhibited particularly strong potential and substantial advantages, such as thin profiles, high polarisation, data retention, and endurance, which are thoroughly explored in the present study. This paper discusses the contemporary challenges in device design by focusing primarily on performance parameters such as the CMOS compatibility of ferroelectric materials, gate leakage current, depolarisation fields, and a few other factors. These factors ultimately determine the critical concerns associated with device design and its practical limitations.
{"title":"Recent advancements and progress in development of ferroelectric field effect transistor: A review","authors":"Mandeep Singh, Nakkina Sai Teja, Tarun Chaudhary, Balwinder Raj","doi":"10.1016/j.memori.2025.100130","DOIUrl":"10.1016/j.memori.2025.100130","url":null,"abstract":"<div><div>The robust application of ferroelectric materials in various disciplines has resulted in the development of significantly more accurate and potent FeFETs, which have the potential to deliver more promising non-volatile memory and synaptic devices than traditional ones. The present study illustrates the fundamental concepts, operation, and construction of FeFETs and presents a methodology to determine suitable ferroelectric materials, the make-up of gate stacks, and the advantages that are necessary for an efficient and commercial FeFET. Among various ferroelectric-based FETs, the HfO<sub>2</sub>-based FeEFT has exhibited much more potential and huge advantages such as thin profiles, high polarisation, data retention, and endurance, which have been thoroughly explored in the present study. This paper discusses the contemporary challenges in device design by focusing primarily on the performance parameters such as CMOS compatibility of ferroelectric materials, gate leakage current, depolarisation fields, and a few other factors. Considering these factors will ultimately influence the critical concerns associated with devising design and practical limitations.</div></div>","PeriodicalId":100915,"journal":{"name":"Memories - Materials, Devices, Circuits and Systems","volume":"10 ","pages":"Article 100130"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143918199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-18 | DOI: 10.1016/j.memori.2025.100128
Mohd K. Zulkalnain, Adel Barakat, Naqeeb Ullah, Haruichi Kanaya, Ramesh K. Pokharel
In this paper, a CMOS QCWM demodulator is designed to achieve a wide carrier-frequency range and thereby cater for a variety of applications. Previous designs utilize a pulse-to-sawtooth-peak (PW2SP) converter and a comparator that necessitates a reference voltage; the current-starved nature of the PW2SP circuit limits the achievable frequency range. To address this issue, a modified PW2SP employing a programmable current mirror with a 3-bit counter is proposed to provide current programmability and eliminate the need for a voltage reference. The proposed QCWM demodulator was designed and fabricated in 180 nm CMOS technology. The current programmability allows the demodulator to reach data rates of 400 kb/s to 8 Mb/s as the carrier frequency is varied from 1 MHz to 20 MHz. The design consumes 209 μW at a 20 MHz carrier frequency from a 1.4 V supply voltage, with an energy consumption of 26.13 pJ/bit.
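The quoted energy per bit is consistent with the reported power and peak data rate (assuming the 26.13 pJ/bit figure is taken at the maximum 8 Mb/s rate):

\[ E_{\mathrm{bit}} = \frac{P}{R_{\mathrm{data}}} = \frac{209\ \mu\mathrm{W}}{8\ \mathrm{Mb/s}} \approx 26.1\ \mathrm{pJ/bit} \]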
{"title":"Counter-based CMOS QCWM demodulator for wide frequency range WPT biohealth applications","authors":"Mohd K. Zulkalnain, Adel Barakat, Naqeeb Ullah, Haruichi Kanaya, Ramesh K. Pokharel","doi":"10.1016/j.memori.2025.100128","DOIUrl":"10.1016/j.memori.2025.100128","url":null,"abstract":"<div><div>In this paper, a CMOS QCWM demodulator was designed to achieve a wide carrier frequency range to cater for a variety of applications. Previous designs utilize a pulse to sawtooth peak (PW2SP) converter and a comparator that necessitates a reference voltage, causing the frequency range to be limited, due to the current starved nature of the PW2SP circuit. To address this issue, a modified PW2SP employing a programmable current mirror with a 3-bit counter was proposed to provide current programmability and eliminate the use of a voltage reference. The proposed QCWM demodulator was designed and fabricated on 180 nm CMOS technology. The current programmability allows the QCWM demodulator to reach data rate of 400Kb/s to 8Mb/s, when the carrier frequency is varied from 1 MHz to 20 MHz. The design consumes 209 <span><math><mi>μ</mi></math></span>W at 20 MHz carrier frequency from a 1.4 V supply voltage with an energy consumption of 26.13 pJ/bit.</div></div>","PeriodicalId":100915,"journal":{"name":"Memories - Materials, Devices, Circuits and Systems","volume":"10 ","pages":"Article 100128"},"PeriodicalIF":0.0,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143682017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Every digital computer system utilizes binary adders. However, researchers have focused on ternary logic to reduce power consumption in digital systems. To implement ternary logic circuits, Carbon Nanotube Field Effect Transistors (CNTFETs) have been employed, as the threshold voltage (Vth) of a CNTFET can be tuned by adjusting the nanotube diameter, making multiple logic levels practical. Fundamentally, carry look-ahead adders follow a parallel-prefix carry propagation scheme, in which the carry/sum bits are propagated through a prefix network. Traditional Carry Propagate Adders (CPAs) generate carry bits and then propagate them; this carry propagation takes time and requires extra carry-generation circuitry, which occupies more chip area than Sum Propagation Adders (SPAs). Specifically, this work explored parallel-prefix ternary sum/carry propagation adders with a proposed carry propagator block, a form of multi-valued logic (MVL). The circuits were built using 32 nm CNTFETs. To evaluate performance, simulations were conducted in Cadence Virtuoso for both the Ternary Carry Propagate Adder (TCPA) and the Ternary Sum Propagate Adder (TSPA). The results demonstrated that the 8-bit Kogge-Stone TSPA exhibited a remarkable 37.3 % reduction in power consumption compared to the TCPA, along with a notable 45 % reduction in delay.
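To illustrate the parallel-prefix carry computation that both adder families build on, a minimal binary Kogge-Stone sketch is given below. It is binary rather than ternary and purely software, shown only to make the log-depth generate/propagate prefix structure concrete; it is not the paper's CNTFET implementation:

def kogge_stone_add(a, b, width=8):
    # Binary Kogge-Stone adder: per-bit generate/propagate signals are merged
    # in log2(width) parallel-prefix stages, then the sum bits are formed.
    ai = [(a >> i) & 1 for i in range(width)]
    bi = [(b >> i) & 1 for i in range(width)]
    g = [x & y for x, y in zip(ai, bi)]        # generate:  g_i = a_i AND b_i
    p = [x ^ y for x, y in zip(ai, bi)]        # propagate: p_i = a_i XOR b_i

    G, P = g[:], p[:]
    dist = 1
    while dist < width:                        # log2(width) prefix stages
        G_new, P_new = G[:], P[:]
        for i in range(dist, width):
            # prefix "dot" operator: (G, P)_i o (G, P)_{i - dist}
            G_new[i] = G[i] | (P[i] & G[i - dist])
            P_new[i] = P[i] & P[i - dist]
        G, P = G_new, P_new
        dist *= 2

    carries = [0] + G[:-1]                     # carry into bit i is the group generate of bits 0..i-1
    s = [p[i] ^ carries[i] for i in range(width)]
    total = sum(bit << i for i, bit in enumerate(s))
    return total, G[width - 1]                 # (sum modulo 2**width, carry-out)

if __name__ == "__main__":
    for a, b in [(23, 45), (200, 100), (255, 1)]:
        s, cout = kogge_stone_add(a, b)
        assert s == (a + b) % 256 and cout == (a + b) // 256   # 256 == 2**width for width=8
    print("kogge-stone prefix adder: ok")

A ternary version replaces these binary generate/propagate cells with multi-valued equivalents.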
{"title":"Design an energy efficient ternary parallel prefix carry/sum propagate adders using 32-nm CNTFET","authors":"Sudha Vani Yamani , H.K. Raghu Vamsi Kudulla , B.V.R.S. Ganesh , D. Sushma , Ch Manasa , Satti Harichandra Prasad","doi":"10.1016/j.memori.2025.100129","DOIUrl":"10.1016/j.memori.2025.100129","url":null,"abstract":"<div><div>Every digital computer system utilizes binary adders. However, researchers have focused on ternary logic to reduce power consumption in digital systems. To implement a ternary logic circuit, Carbon Nano Tube Field Effect Transistors (CNTFETs) have been employed, as the threshold voltage (V<sub>th</sub>) of CNTFETs. Fundamentally, the carry look-ahead adders follow the parallel prefix carry propagation. In the parallel prefix adders, this propagates the carry/sum bits. The traditional Carry Propagate Adders (CPA) generate carry bits and propagate them. Their results show carry bit propagation needs time and extra circuits for carry generation, which occupies more chip area than Sum Propagation Adders (SPA). Specifically, this work explored the use of parallel prefix ternary sum/carry propagation adders with a proposed carry propagator block, which is a kind of multi-valued logic (MVL). This work utilized 32 nm CNTFETs to build the circuits. To evaluate the performance, simulations were conducted using Cadence Virtuoso Software for both the Ternary Carry Propagate Adder (TCPA) and the Ternary Sum Propagate Adder (TSPA). The results demonstrated that the 8-bit Kogge Stone TSPA exhibited a remarkable 37.3 % reduction in power consumption compared to the TCPA. Additionally, the 8-bit Kogge Stone TSPA also demonstrated a notable 45 % reduction in delay compared to the TCPA.</div></div>","PeriodicalId":100915,"journal":{"name":"Memories - Materials, Devices, Circuits and Systems","volume":"10 ","pages":"Article 100129"},"PeriodicalIF":0.0,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143644920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}