Pub Date: 2025-11-20 | DOI: 10.1016/j.vlsi.2025.102604
Dekai Sun, Zhang Zhang, Wenyan Liu, Hongbin Yang, Yi Lu, Yonghong Zeng, Biao Zhang, Lianjie Lu
Implementing artificial intelligence algorithms requires extensive computation, and that computation involves heavy data movement, which consumes considerable energy and time. In-memory computing is a promising paradigm for easing this limitation. XNOR-Network is an effective acceleration technique and has been widely applied in in-memory computing SRAM macros. However, current in-memory computing SRAM macros for XNOR-Network face challenges in flexibility and reliability. To overcome these challenges, this paper proposes a differential in-memory computing 12T SRAM macro for XNOR-Network. The proposed macro eliminates the flipping of stored data that can occur during XNOR-and-accumulate operations. Moreover, it supports XNOR-and-accumulate operations of varying sizes. Additionally, the XNOR-and-accumulate result can either be read out quickly as its sign by the sense amplifier or be read out as a multi-bit quantized value by the flash ADC. The proposed architecture achieves an energy efficiency of 98.6 TOPS/W and a recognition rate of 97.06% on the MNIST dataset.
Title: A differential in-memory computing 12T SRAM macro with enhanced flexibility and reliability for XNOR-network | Integration, the VLSI Journal, vol. 107, Article 102604
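The sketch below illustrates the XNOR-and-accumulate (XAC) operation and the two readout modes the abstract describes. It assumes the usual XNOR-Net encoding, in which bit 1 stands for +1 and bit 0 for −1. The macro's actual bit mapping is not given here. The helper names and the 3-bit uniform quantizer standing in for the flash ADC are hypothetical.

```python
def xnor_accumulate(inputs, weights):
    """Binary dot product of two equal-length 0/1 vectors.

    Bit 1 encodes +1 and bit 0 encodes -1, so XNOR is 1 exactly where the
    signed product is +1. The vector length (number of active rows) may vary.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    matches = sum(1 for a, w in zip(inputs, weights) if a == w)  # popcount(XNOR)
    return 2 * matches - len(inputs)  # (#agreements) - (#disagreements)


def sign_readout(xac):
    """1-bit decision, analogous to a sense amplifier comparing against zero."""
    return 1 if xac >= 0 else 0


def flash_adc_readout(xac, n, bits=3):
    """Hypothetical uniform quantizer over the n + 1 possible XAC values."""
    matches = (xac + n) // 2                  # recover the popcount in [0, n]
    return matches * (2 ** bits) // (n + 1)   # output code in [0, 2**bits - 1]


a = [1, 0, 1, 1, 0, 1, 0, 0]
w = [1, 1, 1, 0, 0, 1, 0, 1]
xac = xnor_accumulate(a, w)
print(xac, sign_readout(xac), flash_adc_readout(xac, len(a)))  # 2 1 4
```

Passing a shorter or longer pair of vectors changes the accumulation size. This mirrors, only loosely, the macro's support for XAC operations of varying sizes.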
Arithmetic circuits are fundamental building blocks in applications including digital signal processing, cryptography processors, and multimedia. Integer multipliers with wide operands account for a large share of circuit area in new-generation technologies. Various multiplication algorithms are available for generating multiplier circuits that trade off area, delay, and power. Custom optimization is applied to reduce circuit size, which increases the probability of logic bugs in the design. Over the past thirty years, prominent formal verification techniques such as Satisfiability (SAT) checking, Binary Decision Diagrams (BDDs), and Symbolic Computer Algebra (SCA) have made substantial progress in verifying circuit correctness. In this paper, we study the best state-of-the-art academic techniques from each method and perform a comparative analysis on verifying integer multipliers of different architectures after logic optimization. BDD size always grows exponentially with the input size of the circuit, so BDDs can be constructed only up to 18 bits. Even so, the method is robust across a variety of multiplier structures. Algebraic backward rewriting based on SCA enables formal verification of high-bit-width multipliers. Conventional approaches that exploit hierarchical structural information are limited to algebraic-friendly multipliers, in which adder sub-circuits are preserved in their canonical form. Logic synthesis and optimization often invalidate that assumption. In contrast, advanced algebraic techniques that operate directly on flattened netlists scale well and verify large multiplier designs robustly.
Formal analysis with straightforward SAT techniques does not work well for comparing two structurally dissimilar circuits, which is often the situation after logic optimization. If the degree of similarity is not too low, SAT sweeping can effectively reduce structural dissimilarity, and SAT techniques can verify multipliers of up to 512 bits. However, verifying complex circuits that are not algebraic-friendly, have near-zero similarity to the reference circuits, and have larger input sizes remains an open challenge.
Title: A comparative study on formal verification techniques to verify large integer multiplier circuits | Authors: Jitendra Kumar, Asutosh Srivastava, Masahiro Fujita | Pub Date: 2025-11-20 | DOI: 10.1016/j.vlsi.2025.102606 | Integration, the VLSI Journal, vol. 107, Article 102606
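Algebraic backward rewriting can be illustrated on a tiny example. The sketch below uses a hypothetical 2×2-bit array multiplier netlist and proceeds in three steps:

- It writes the specification polynomial, which is the output word minus the product of the input words.
- It substitutes each gate's polynomial, working from the outputs back to the primary inputs.
- It checks that the remainder is zero.

Polynomials over 0/1 variables are implemented from scratch so that x·x = x. Real SCA verifiers add many optimizations that are omitted here.

```python
from collections import defaultdict

# Polynomial over 0/1 variables as {frozenset(variables): integer coefficient}.
# Since x * x == x for Boolean x, every monomial is simply a set of variables.

def var(name, coeff=1):
    return {frozenset([name]): coeff}

def add(p, q, scale=1):
    """Return p + scale * q, dropping zero coefficients."""
    r = defaultdict(int, p)
    for mono, c in q.items():
        r[mono] += scale * c
    return {m: c for m, c in r.items() if c != 0}

def mul(p, q):
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[m1 | m2] += c1 * c2  # set union implements x * x = x
    return {m: c for m, c in r.items() if c != 0}

# Gate polynomials for Boolean inputs a, b.
def and_gate(a, b): return mul(a, b)
def xor_gate(a, b): return add(add(a, b), mul(a, b), scale=-2)
def or_gate(a, b): return add(add(a, b), mul(a, b), scale=-1)

def substitute(p, v, g):
    """One backward-rewriting step: replace variable v by gate polynomial g."""
    r = {}
    for mono, c in p.items():
        term = mul({mono - {v}: c}, g) if v in mono else {mono: c}
        r = add(r, term)
    return r

# Hypothetical 2x2 array multiplier netlist in topological order.
NETLIST = [
    ("z0", and_gate, "a0", "b0"),
    ("p1", and_gate, "a1", "b0"),
    ("p2", and_gate, "a0", "b1"),
    ("p3", and_gate, "a1", "b1"),
    ("z1", xor_gate, "p1", "p2"),
    ("c1", and_gate, "p1", "p2"),
    ("z2", xor_gate, "p3", "c1"),
    ("z3", and_gate, "p3", "c1"),
]

def verify(netlist):
    # Specification: 8*z3 + 4*z2 + 2*z1 + z0 - (2*a1 + a0) * (2*b1 + b0)
    out = add(add(var("z3", 8), var("z2", 4)), add(var("z1", 2), var("z0")))
    a = add(var("a1", 2), var("a0"))
    b = add(var("b1", 2), var("b0"))
    poly = add(out, mul(a, b), scale=-1)
    # Rewrite from the outputs back towards the primary inputs.
    for name, gate, x, y in reversed(netlist):
        poly = substitute(poly, name, gate(var(x), var(y)))
    return poly  # {} (zero remainder) means the netlist meets the spec

print(verify(NETLIST))  # {}
```

Swapping the XOR that drives z2 for an OR leaves a nonzero remainder, 4·a0·a1·b0·b1. That remainder pinpoints the failing input a = b = 3, for which the faulty circuit outputs 13 instead of 9.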
This work examines the challenges of global interconnect in Network-on-Chip (NoC)-based Systems-on-Chip (SoCs) and presents solutions for congestion-free communication between the quantum accelerators of quantum computing systems. To address these problems, we propose two novel two-dimensional (2-D) topologies and four three-dimensional (3-D) topologies. They are built on two architectural connection methods: the hybrid ring-of-mesh with partial diagonal links (HMRPD) and the hybrid ring-of-torus with partial diagonal links (HTRPD), each in 2-D and 3-D. A parametric analysis of both 2-D topologies shows that the interconnect has a smaller diameter and average distance, which reduces latency. It requires a small node degree, which simplifies network design, and it has a high bisection bandwidth, which helps achieve low communication cost and high throughput. Its scalability is also higher than that of other existing interconnects. We further compare the throughput, packet latency, and energy consumption of the topologies under synthetic traffic patterns. The proposed topologies improve performance and reduce both communication cost and energy consumption. Next, 2-D HMRPD and 2-D HTRPD are extended to symmetric 3-D network architectures in three steps: two ports (up and down) are added to the 2-D router, these ports are connected with Through-Silicon Vias (TSVs), and packets are routed with a quasi-minimal routing technique. The results show that 3-D HMRPD and 3-D HTRPD outperform 2-D HMRPD, 2-D HTRPD, and existing topologies.
However, these 3-D topologies consume extra energy. To address this, a heterogeneous layout that integrates both 2-D and 3-D routers is applied to the 3-D topologies to reduce the number of TSVs. We present two TSV-optimized 3-D HTRPD topologies and compare them with a fully TSV-connected 3-D HTRPD. Of these, the 1P-3DR-HTRPD topology has the lowest gate count, area, and dynamic and static power consumption. The topologies were modeled in a modified network system simulator and implemented on the Xc7z020clg484-1 ZYNQ FPGA device for validation. Compared with other diagonal-link topologies, the 2-D topologies are more area-efficient, require a crossbar of at most 6×6, and reach high frequencies of 2.29 GHz (2-D HMRPD) and 2.22 GHz (2-D HTRPD).
Title: Adaptive congestion-aware high performance scalable 2-D and 3-D topologies for network-on-chip based interconnect for quantum computing | Authors: Jayshree, Gopalakrishnan Seetharaman, Jitendra Kumar | Pub Date: 2025-11-20 | DOI: 10.1016/j.vlsi.2025.102597 | Integration, the VLSI Journal, vol. 107, Article 102597
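The parametric metrics mentioned above (diameter, average distance, and node degree) can be computed for any topology graph with breadth-first search. The sketch below compares a plain 4×4 mesh with the same mesh plus four hypothetical diagonal links. It is not the HMRPD or HTRPD topology, whose exact link pattern is not given in the abstract. It only shows how shortcut links lower the distance metrics at the cost of a higher node degree.

```python
from collections import deque
from itertools import product

def mesh(rows, cols):
    """Undirected 2-D mesh as an adjacency dict {(r, c): set of neighbours}."""
    adj = {node: set() for node in product(range(rows), range(cols))}
    for r, c in adj:
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in adj:
                adj[(r, c)].add(nb)
                adj[nb].add((r, c))
    return adj

def add_links(adj, links):
    """Add undirected links (e.g. hypothetical diagonal shortcuts) in place."""
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def hop_distances(adj, src):
    """Breadth-first search: minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def metrics(adj):
    """Return (diameter, average distance over distinct pairs, max node degree)."""
    diameter, total, pairs = 0, 0, 0
    for s in adj:
        d = hop_distances(adj, s)
        diameter = max(diameter, max(d.values()))
        total += sum(d.values())
        pairs += len(d) - 1
    return diameter, total / pairs, max(len(n) for n in adj.values())

base = mesh(4, 4)
print(metrics(base))  # diameter 6, average distance 8/3, max degree 4
diag = add_links(mesh(4, 4), [((0, 0), (1, 1)), ((0, 3), (1, 2)),
                              ((3, 0), (2, 1)), ((3, 3), (2, 2))])
print(metrics(diag))
```

The same `metrics` function can be applied to any adjacency dict, so ring or torus wrap-around links can be explored by passing them to `add_links`.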
Pub Date: 2025-11-20 | DOI: 10.1016/j.vlsi.2025.102602
Katherine Shu-Min Li, Fang-Chi Wu, Ching-Han Lai, Sying-Jyan Wang
With the rapid advancement of artificial intelligence (AI) technologies and the proliferation of electronic devices, demand for high-performance and secure printed circuit boards (PCBs) has grown substantially. In particular, high-frequency operation, high-speed signal integrity, and enhanced security have become increasingly critical requirements in modern PCB design. This study presents an integrated framework that incorporates test point insertion directly into the PCB routing process, addressing testability and security together at the design stage. For routing, we propose a method that prioritizes nets by assigning routing sequences before trace generation. The A* search algorithm then performs multilayer routing, using a customized heuristic function that minimizes overall trace length given the known number of board layers. To determine optimal test point placement, we adopt a reinforcement learning approach. In it, an agent learns to select insertion actions guided by a carefully designed reward function. Experimental results show that the proposed approach achieves 100% routing success and full test point coverage across all evaluated PCB designs. The resulting designs improve accessibility for electrical testing and lay the groundwork for subsequent security assessment.
Title: Security-oriented printed-circuit-board routing with deep reinforcement learning | Integration, the VLSI Journal, vol. 107, Article 102602
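The sketch below shows net-by-net multilayer A* routing on a grid of (layer, x, y) cells. Its assumptions are labeled here and in the code comments:

- The routing grid and `VIA_COST` are hypothetical.
- The heuristic is plain Manhattan distance plus the minimum via cost, not the paper's customized function.
- The shortest-first net order is a placeholder for the paper's own priority rule.
- Pins of all nets are reserved so that an earlier trace cannot cover a later net's pin.

```python
import heapq

VIA_COST = 2  # hypothetical cost of changing layers, relative to one grid step

def heuristic(a, b):
    """Manhattan distance plus minimum via cost; never overestimates (admissible)."""
    (la, xa, ya), (lb, xb, yb) = a, b
    return abs(xa - xb) + abs(ya - yb) + VIA_COST * abs(la - lb)

def neighbors(node, layers, width, height):
    l, x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < width and 0 <= y + dy < height:
            yield (l, x + dx, y + dy), 1
    for dl in (1, -1):
        if 0 <= l + dl < layers:
            yield (l + dl, x, y), VIA_COST

def astar_route(src, dst, blocked, layers, width, height):
    """Return (path, cost) from src to dst avoiding blocked cells, or (None, None)."""
    heap = [(heuristic(src, dst), 0, src)]
    best = {src: 0}
    parent = {src: None}
    while heap:
        _, cost, node = heapq.heappop(heap)
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], cost
        if cost > best[node]:
            continue  # stale heap entry
        for nb, step in neighbors(node, layers, width, height):
            new_cost = cost + step
            if nb not in blocked and new_cost < best.get(nb, float("inf")):
                best[nb] = new_cost
                parent[nb] = node
                heapq.heappush(heap, (new_cost + heuristic(nb, dst), new_cost, nb))
    return None, None

def route_all(nets, layers, width, height):
    """Route nets one at a time; routed cells become obstacles for later nets.

    Placeholder priority rule (shortest estimated length first); the paper's
    own net-ordering criterion is not specified here.
    """
    pins = {p for pair in nets.values() for p in pair}
    blocked, routes = set(pins), {}
    for name in sorted(nets, key=lambda n: heuristic(*nets[n])):
        src, dst = nets[name]
        path, _ = astar_route(src, dst, blocked - {src, dst}, layers, width, height)
        if path is None:
            raise RuntimeError(f"net {name!r} could not be routed")
        routes[name] = path
        blocked.update(path)
    return routes

wall = {(0, 2, y) for y in range(5)}  # obstacle spanning layer 0 at x = 2
print(astar_route((0, 0, 0), (0, 4, 0), wall, 2, 5, 5)[1])  # 8: two vias + four steps
```

Raising `VIA_COST` steers the search toward single-layer detours. The heuristic stays admissible as long as it charges no more than the true cost of each via and grid step.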
Pub Date: 2025-11-19 | DOI: 10.1016/j.vlsi.2025.102598
Mauricio Velázquez Díaz, Victor R. Gonzalez-Diaz, Gisela De La Fuente-Cortes, Guillermo Espinosa Flores-Verdad, Roberto S. Murphy-Arteaga
This article presents the design and implementation of a fully differential Successive Approximation Register (SAR) analog-to-digital converter (ADC) in 65 nm UMC technology. It targets biomedical applications in which area efficiency is a critical requirement. The design prioritizes clean and precise first-order Noise Shaping (NS) by combining a switched-capacitor integrator with our proposed C-2C ladder DAC topology, which substantially reduces area. Noise performance is optimized by carefully correlating the capacitances of the integrator and DAC, ensuring precision and stability. For robust operation, all system blocks follow a process, voltage, and temperature (PVT)-resilient design methodology. This provides consistent performance and reliability under challenging operating conditions and fabrication variations. The implemented prototype occupies an area of 0.058 mm² and achieves 10.37 ENOB over a 20 kHz bandwidth. It operates at a 1 MHz sampling rate with a power consumption of 448 μW.
Title: An area-efficient 1st order noise shaping SAR using C-2C ladder DAC for biomedical applications | Integration, the VLSI Journal, vol. 107, Article 102598
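First-order noise shaping pushes quantization error out of the signal band by carrying each conversion's residue into the next one. The sketch below uses the simplest error-feedback model: an ideal quantizer stands in for the SAR, and the previous residue is subtracted from each new sample. The paper instead realizes the loop with a switched-capacitor integrator, so this is only a behavioral illustration under that simplifying assumption. The function names are hypothetical.

```python
def sar_quantize(v, lsb):
    """Ideal uniform quantizer standing in for the SAR conversion (nearest code)."""
    return round(v / lsb) * lsb

def ns_sar(samples, lsb):
    """First-order error-feedback noise shaping: y[n] = x[n] + e[n] - e[n-1].

    The conversion residue e[n] = y[n] - u[n] is fed back and subtracted from the
    next sample, so the quantization error is shaped by NTF(z) = 1 - z^-1.
    """
    out, prev_residue = [], 0.0
    for x in samples:
        u = x - prev_residue      # subtract the previous residue
        y = sar_quantize(u, lsb)
        prev_residue = y - u      # residue left after this conversion
        out.append(y)
    return out

n = 1000
x = [0.3] * n                     # DC input of 0.3 LSB
plain = [sar_quantize(v, 1.0) for v in x]
shaped = ns_sar(x, 1.0)
print(sum(plain) / n, sum(shaped) / n)  # plain stays at 0.0; shaped is close to 0.3
```

Because the output error telescopes (e[n] − e[n−1]), its running sum stays within one residue. A low-pass (decimation) filter after the converter therefore recovers sub-LSB detail that a plain quantizer discards.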
Pub Date: 2025-11-15 | DOI: 10.1016/j.vlsi.2025.102601
Zhikui Duan, Dayi Yang, Shaobo He, Xinmei Yu, Zhuorui Tang, Qingyu Wu
This paper introduces a chaotic circuit that integrates a Clapp oscillator with dual ultra-high-frequency (UHF) memristors. The architecture combines the conventional Clapp oscillator topology with two UHF memristors that operate at frequencies up to 1.5 GHz. It induces chaotic behavior by exploiting the inherent nonlinearity of the memristors. The memristive circuit is realized in SMIC 0.18 μm CMOS technology. Simulation results show that, with a 3.3 V supply, the circuit produces both single-vortex and double-vortex-like chaotic attractors as the capacitance and inductance parameters are adjusted. The chaotic behavior of the circuit is further confirmed through phase portraits, Lyapunov exponent analysis, the 0-1 test, bifurcation diagrams, and Poincaré sections. The circuit exhibits ultra-high-frequency operation, rich chaotic dynamics, and stable signal output. This study provides a new approach to generating high-frequency chaotic sequences, with potential applications in information security.
Title: A CMOS circuit for ultra high frequency chaos generation utilizing a Clapp oscillator with dual memristors | Integration, the VLSI Journal, vol. 106, Article 102601
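The abstract lists Lyapunov exponent analysis among its chaos checks. As a minimal illustration of what that analysis measures, the sketch below estimates the largest Lyapunov exponent of the logistic map. The logistic map is a textbook one-dimensional system used here purely as a stand-in. The memristive Clapp circuit is a continuous-time system whose exponents would be computed from its state equations, which the abstract does not give.

```python
import math

def logistic_lyapunov(r, x0=0.2, transient=1000, steps=20000):
    """Largest Lyapunov exponent of x -> r*x*(1-x), averaged along one orbit.

    lambda = mean of ln|f'(x_n)| with f'(x) = r*(1 - 2x); lambda > 0 signals chaos.
    """
    x = x0
    for _ in range(transient):    # discard the transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(steps):
        total += math.log(max(abs(r * (1 - 2 * x)), 1e-300))  # guard against log(0)
        x = r * x * (1 - x)
    return total / steps

print(logistic_lyapunov(4.0))  # close to ln 2 (about 0.693): chaotic
print(logistic_lyapunov(3.2))  # close to ln 0.4 (about -0.916): period-2 orbit
```

A positive exponent means nearby trajectories separate exponentially, which is the property the circuit's Lyapunov analysis is meant to establish.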
Pub Date: 2025-11-14 | DOI: 10.1016/j.vlsi.2025.102600
T. Delphine Sheeba, G. Athisha
Compressed sensing applications frequently use sparse signal recovery techniques, such as Orthogonal Matching Pursuit (OMP), to reconstruct signals. However, traditional OMP selects only a single atom per iteration. This can lead to inaccurate reconstructions and increased computational complexity. To overcome these issues, this study proposes a novel Adaptable Threshold and Projection-Aware Orthogonal Matching Pursuit (ATPAwOMP) algorithm. By combining adaptive thresholding with projection-based atom selection, ATPAwOMP improves on conventional OMP by iteratively refining reconstruction accuracy. In its backtracking phase, the method removes unnecessary atoms from the support set. This reduces redundant computation and increases the significance of the retained atoms. To further optimize the method for hardware deployment, a lightweight VLSI design is presented with a parallel multiply-and-accumulate (MAC) unit, a sorting unit, and a matrix inversion unit. A Newton-Raphson-based reciprocal operator reduces the resource requirements of matrix inversion. A Reconfigurable Adder/Subtractor Module (RASM) and a low-complexity LUT-based multiplier minimize hardware overhead in the MAC unit. The proposed work is implemented on a Xilinx platform and evaluated with the MIT-BIH arrhythmia database. Both FPGA implementation metrics and error metrics are evaluated: signal-to-noise ratio (SNR), root mean square error (RMSE), percentage root mean square difference (PRD), and normalized PRD (PRDN). Its adaptive thresholding and projection-aware approach notably improve reconstruction accuracy and processing efficiency. ATPAwOMP is therefore well suited to real-time, resource-constrained applications such as wearable ECG monitoring devices.
Title: VLSI implementation of adaptable threshold and projection aware OMP with reconfigurable LUT-based MAC unit for ECG signal reconstruction | Integration, the VLSI Journal, vol. 107, Article 102600
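A Newton-Raphson reciprocal operator of the kind mentioned above replaces division with multiplications and subtractions, which suits hardware. The sketch below makes several assumptions:

- It uses floating-point arithmetic. A hardware version would use fixed-point arithmetic with a fixed iteration count.
- It follows the common approach of normalizing the divisor into [0.5, 1) and starting from the linear estimate 48/17 − (32/17)·m.
- The paper's exact operator is not described in the abstract.

```python
import math

def nr_reciprocal(d, iterations=3):
    """Approximate 1/d for d > 0 using Newton-Raphson: x <- x * (2 - m * x)."""
    if d <= 0:
        raise ValueError("this sketch handles positive divisors only")
    m, e = math.frexp(d)          # d = m * 2**e with 0.5 <= m < 1
    x = 48 / 17 - 32 / 17 * m     # linear initial estimate, relative error <= 1/17
    for _ in range(iterations):
        x = x * (2 - m * x)       # each step squares the relative error
    return math.ldexp(x, -e)      # undo the scaling: 1/d = (1/m) * 2**-e

print(nr_reciprocal(3.0))         # approximately 0.3333333333
```

Because each iteration squares the relative error, three iterations bring it below about 1.5e-10. A fourth iteration reaches double precision, which is why a small fixed iteration count is typical in hardware.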
Pub Date: 2025-11-13 | DOI: 10.1016/j.vlsi.2025.102594
Parthiv Bhau, Vijay Savani
This work introduces five novel 1-bit Transmission Gate Diffusion Input (TGDI)-based Hybrid Full Adders (HFAs). They are optimized for low power consumption and high speed and outperform recent architectures. In addition, a 16-bit Ripple Carry Adder (RCA) is developed that uses a Doublet Transmission Gate Adder-based Compressor (DTGAC) structure. The DTGAC addresses driving-strength challenges while achieving a low power-delay product and an improved Figure of Merit (FoM). The proposed architectures are simulated in Cadence Virtuoso using 18 nm Fin Field-Effect Transistor (FinFET) technology at a nominal supply voltage of 0.8 V (±10%) and 27 °C. Post-layout simulations validate the electrical behavior of the proposed circuits with layout effects included, and process corner and Monte Carlo analyses confirm the robustness of the designs. The results show an FoM improvement of 45.16%–59.3% for the proposed 1-bit TGDI-based HFAs over the conventional 1-bit mirror adder. Furthermore, the DTGAC-based 16-bit RCA using the proposed adders achieves an FoM improvement of 19.94%–28.88% over the DTGAC-based 16-bit RCA with a mirror adder. These results establish the proposed architectures as efficient and robust solutions for low-power, high-performance digital arithmetic circuits.
Title: A hybrid 16-bit Ripple Carry Adder with Doublet Transmission Gate-based Compressor for performance boost. Integration, the VLSI Journal, vol. 107, Article 102594.
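The abstract does not describe the transistor-level structure of the TGDI-based hybrid full adders or of the DTGAC compressor, so the following is only a behavioural Python sketch of a generic 16-bit ripple-carry adder built from textbook 1-bit full adders. It shows how the carry of each stage feeds the next, which is why the carry chain dominates RCA delay; the function names are hypothetical, and the model captures logic only, not power, delay, or FoM.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Textbook 1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin                    # sum bit = a XOR b XOR cin
    cout = (a & b) | (cin & (a ^ b))   # carry = generate OR (propagate AND cin)
    return s, cout


def ripple_carry_add(x: int, y: int, width: int = 16, cin: int = 0) -> tuple[int, int, int]:
    """Add two unsigned `width`-bit integers by chaining `width` 1-bit full adders.

    Returns (sum, carry_out, longest_chain), where longest_chain is the longest
    run of consecutive stages that each produced a carry-out. A long run means
    the carry had to ripple through many stages, which is what sets the
    worst-case delay of an RCA.
    """
    limit = 1 << width
    if not (0 <= x < limit and 0 <= y < limit and cin in (0, 1)):
        raise ValueError("operands must be unsigned and fit in `width` bits")
    total, carry = 0, cin
    chain = longest = 0
    for i in range(width):                  # stage i handles bit i, LSB first
        a, b = (x >> i) & 1, (y >> i) & 1
        s, carry = full_adder(a, b, carry)  # carry-out feeds the next stage
        total |= s << i
        chain = chain + 1 if carry else 0
        longest = max(longest, chain)
    return total, carry, longest
```

For example, `ripple_carry_add(0xFFFF, 0x0001)` returns `(0, 1, 16)`: the carry generated in bit 0 ripples through all 16 stages, which is the worst case a faster full-adder cell is meant to shorten.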
Pub Date: 2025-11-12 | DOI: 10.1016/j.vlsi.2025.102599
Ana Mitrovic, Eby G. Friedman
Despite offering significant performance and energy-efficiency advantages over CMOS, superconductive digital circuits face several challenges, including scaling limitations. Traditional single flux quantum (SFQ) circuits require synchronous clock signals, leading to complex clock distribution networks. The integration density of SFQ circuits is also hindered by the need for large storage inductors. To overcome these challenges, an inductorless dynamic logic based on ferromagnetic bistable 2ϕ-Josephson junctions (JJs) is proposed. This logic family offers a scalable solution for asynchronous superconductive logic circuits. The behavior of 2ϕ-Josephson junctions is reviewed, and all-JJ dynamic circuits that enable clockless operation are introduced. Inductorless dynamic AND and OR gates are evaluated in a half adder. The characteristics, margins, and effects of parasitic inductances on circuit operation are discussed. Compared to RSFQ gates in the same technology (1 kA/cm²), these logic gates exhibit 59% less delay (9 ps). 2ϕ-JJs require less energy to switch between equilibrium states; as a result, the bias current is reduced by 65% compared to standard dynamic SFQ circuits. Owing to this reduction in bias current, half-flux-quantum operation requires 6.7× less energy per transition. Using standard Josephson junctions instead of inductors saves 42 μm² and 53 μm² of loop-inductance area in a dynamic AND gate and a dynamic OR gate, respectively, for the 10 kA/cm² MIT LL SFQ5ee technology.
Title: Inductorless dynamic logic based on 2ϕ-Josephson junctions. Integration, the VLSI Journal, vol. 106, Article 102599.
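As a quick check on the reported figures, the delay and bias-current reductions can be restated as ratios. This is a worked sketch, assuming both percentages are measured relative to the baselines named in the abstract (the RSFQ gate delay in the 1 kA/cm² process, and the bias current of standard dynamic SFQ circuits). The symbols t and I_b are notation introduced here, and the roughly 22 ps baseline is inferred from the stated numbers, not reported by the authors.

```latex
\begin{aligned}
t_{2\phi} &= (1 - 0.59)\, t_{\mathrm{RSFQ}} = 9~\mathrm{ps}
  \quad\Rightarrow\quad
  t_{\mathrm{RSFQ}} \approx \frac{9~\mathrm{ps}}{0.41} \approx 22~\mathrm{ps}, \\
I_{b,2\phi} &= (1 - 0.65)\, I_{b,\mathrm{SFQ}} = 0.35\, I_{b,\mathrm{SFQ}}.
\end{aligned}
```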
Pub Date: 2025-11-08 | DOI: 10.1016/j.vlsi.2025.102596
Yingchun Lu, Xinkai Wu, Jinlin Chen, Huaguo Liang, Zhengfeng Huang, Xiumin Xu, Bo Liu
Physical Unclonable Functions (PUFs), a new hardware security primitive, provide a unique root of trust for a system by extracting process variations in a circuit. However, existing PUFs struggle to achieve high reliability under temperature and voltage variations. To address the low reliability of the Transient Effect Ring Oscillator (TERO) PUF, this paper proposes a feedback TERO (FT) PUF based on a Muller gate, which uses the accumulation of RO loop delays to isolate the final PUF response and stabilizes the PUF quickly by introducing the delay of the feedback loop as a threshold. HSPICE simulation results show that the FT PUF reduces the worst-case bit error rate (BER) to 10.28 % over a temperature range of −20 °C to 80 °C and a voltage range of 0.8–1.2 V, with uniqueness and uniformity of 51.38 % and 49.87 %, respectively. When implemented on several Xilinx 7-series devices, it achieved an 8.07 % reduction in unstable bit rate compared to conventional TERO PUFs under standard conditions (25 °C, 1.0 V) and a worst-case unstable-bit-rate improvement of 3.15 % over the manufacturer's recommended voltage range.
Title: FTPUF: Feedback structure of TERO PUF for high reliability. Integration, the VLSI Journal, vol. 106, Article 102596.
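The uniqueness, uniformity, and BER figures quoted above are normally computed from binary PUF responses using Hamming distances. The Python sketch below uses the common textbook definitions. It is an assumption that the authors use exactly these forms; they might, for instance, average uniformity over all devices or take the nominal-condition response as the BER reference. All function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence

Response = Sequence[int]  # one PUF response as a sequence of 0/1 bits


def hamming_distance(r1: Response, r2: Response) -> int:
    """Number of bit positions where two equal-length responses differ."""
    if len(r1) != len(r2):
        raise ValueError("responses must have equal length")
    return sum(b1 != b2 for b1, b2 in zip(r1, r2))


def uniformity(response: Response) -> float:
    """Percentage of 1 bits in a single response (ideal: 50 %)."""
    return 100.0 * sum(response) / len(response)


def uniqueness(responses: Sequence[Response]) -> float:
    """Mean pairwise inter-device Hamming distance, in % of response length (ideal: 50 %)."""
    if len(responses) < 2:
        raise ValueError("need responses from at least two devices")
    n_bits = len(responses[0])
    pairs = list(combinations(responses, 2))  # every unordered device pair
    return 100.0 * sum(hamming_distance(a, b) for a, b in pairs) / (len(pairs) * n_bits)


def bit_error_rate(reference: Response, samples: Sequence[Response]) -> float:
    """Mean intra-device Hamming distance from a reference response, in % (ideal: 0 %).

    `samples` are re-measurements of the same device, e.g. at other
    temperatures or supply voltages.
    """
    n_bits = len(reference)
    return 100.0 * sum(hamming_distance(reference, s) for s in samples) / (len(samples) * n_bits)
```

For example, with three made-up 4-bit responses `[0, 1, 1, 0]`, `[1, 1, 0, 0]` and `[0, 0, 1, 1]` (toy values, not data from the paper), `uniqueness` gives about 66.7 %, and `bit_error_rate([0, 1, 1, 0], [[0, 1, 1, 0], [0, 1, 0, 0]])` gives 12.5 %. A well-behaved PUF should instead approach 50 % uniqueness and 0 % BER.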