Pub Date : 2026-01-12 DOI: 10.1016/j.vlsi.2026.102663
Yuanfa Ji, Haihui Zhang, Xiyan Sun, Furong Jiang, Qiang Fu
With the continuous increase in chip integration density and reliability requirements, test data volume has grown significantly. At the same time, limitations of automatic test equipment in terms of physical I/O channel count, memory capacity, and data transmission bandwidth have further raised test costs. To address these challenges, this paper proposes a test data compression method based on sliding-window encoding. This approach identifies repeated sequences in the data to be encoded and replaces them with shorter codewords, thereby achieving effective compression. Furthermore, a match length reuse mechanism is introduced, which considerably enhances both codeword utilization efficiency and compression performance. Additionally, this paper systematically analyzes the impact of encoding parameters on the compression ratio, optimizes the encoding scheme considering hardware overhead, and designs a corresponding decompression architecture. Experimental results show that the proposed method achieves an average compression ratio of 66.86% on ISCAS’89 benchmark circuits. This provides an innovative and practical solution for test data compression.
{"title":"A test data compression method based on sliding-window encoding and matching length reuse","authors":"Yuanfa Ji , Haihui Zhang , Xiyan Sun , Furong Jiang , Qiang Fu","doi":"10.1016/j.vlsi.2026.102663","DOIUrl":"10.1016/j.vlsi.2026.102663","url":null,"abstract":"<div><div>With the continuous increase in chip integration density and reliability requirements, test data volume has grown significantly. At the same time, limitations of automatic test equipment in terms of physical I/O channel count, memory capacity, and data transmission bandwidth have further raised test costs. To address these challenges, this paper proposes a test data compression method based on sliding-window encoding. This approach identifies repeated sequences in the data to be encoded and replaces them with shorter codewords, thereby achieving effective compression. Furthermore, a match length reuse mechanism is introduced, which considerably enhances both codeword utilization efficiency and compression performance. Additionally, this paper systematically analyzes the impact of encoding parameters on the compression ratio, optimizes the encoding scheme considering hardware overhead, and designs a corresponding decompression architecture. Experimental results show that the proposed method achieves an average compression ratio of 66.86% on ISCAS’89 benchmark circuits. 
This provides an innovative and practical solution for test data compression.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102663"},"PeriodicalIF":2.5,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
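To make the sliding-window idea in the abstract above concrete, the following is a minimal sketch of a generic LZ77-style encoder and decoder working on a bit string. It is not the paper's codeword format and does not model the match length reuse mechanism; the function names, the token layout, and the `window` and `min_match` parameters are all assumptions chosen for illustration.

```python
def lz77_encode(data: str, window: int = 16, min_match: int = 3):
    """Encode a string into literal and (offset, length) match tokens."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Only positions inside the sliding window are candidate match starts.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlapping copy).
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append(("match", best_off, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decode(tokens):
    """Rebuild the original string from the token list."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, off, length = tok
            for _ in range(length):
                out.append(out[-off])  # copy one symbol at a time
    return "".join(out)


if __name__ == "__main__":
    pattern = "01" * 8
    encoded = lz77_encode(pattern)
    print(encoded)
    print(lz77_decode(encoded) == pattern)
```

In a hardware decompressor the copy loop in `lz77_decode` would correspond to reading back from a small history buffer, which is why the window size is a natural parameter to trade against hardware overhead.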
Pub Date : 2026-01-11 DOI: 10.1016/j.vlsi.2026.102655
Yingchun Lu, Hongliang Lu, Yujie Liu, Huaguo Liang, Zhengfeng Huang, Jinlin Chen, Xiumin Xu, Liang Yao
Strong Physical Unclonable Functions (PUFs) are vulnerable to modeling attacks using Machine Learning (ML), and PUF-based authentication protocols also face security risks. To address these issues, this paper proposes a PUF structure with resistance to modeling attacks based on Dynamic Obfuscation (DO), composed of Linear Feedback Shift Registers (LFSRs), PUFs, and several logic gates. The characteristics of DO are as follows: (1) the initial state of the LFSR is determined by the PUF's response, making it uncontrollable; (2) the updated state of the LFSR determines the obfuscated bit of each input challenge, achieving a dynamic mapping between challenges and responses. An Arbiter PUF (APUF) based on DO is implemented on Xilinx Artix-7 FPGA, and experimental results show that the structure can effectively resist modeling attacks from various ML algorithms, with prediction accuracy close to 50 %. In addition, this paper proposes a mutual authentication protocol based on PUF, suitable for Internet of Things (IoT) systems.
{"title":"Design of a dynamic obfuscation-based strong PUF resistant to modeling attacks and mutual authentication protocol","authors":"Yingchun Lu , Hongliang Lu , Yujie Liu , Huaguo Liang , Zhengfeng Huang , Jinlin Chen , Xiumin Xu , Liang Yao","doi":"10.1016/j.vlsi.2026.102655","DOIUrl":"10.1016/j.vlsi.2026.102655","url":null,"abstract":"<div><div>Strong Physical Unclonable Functions (PUFs) are vulnerable to modeling attacks using Machine Learning (ML), and PUF-based authentication protocols also face security risks. To address these issues, this paper proposes a PUF structure with resistance to modeling attacks based on Dynamic Obfuscation (DO), composed of Linear Feedback Shift Registers (LFSRs), PUFs, and several logic gates. The characteristics of DO are as follows: (1) the initial state of the LFSR is determined by the PUF's response, making it uncontrollable; (2) the updated state of the LFSR determines the obfuscated bit of each input challenge, achieving a dynamic mapping between challenges and responses. An Arbiter PUF (APUF) based on DO is implemented on Xilinx Artix-7 FPGA, and experimental results show that the structure can effectively resist modeling attacks from various ML algorithms, with prediction accuracy close to 50 %. 
In addition, this paper proposes a mutual authentication protocol based on PUF, suitable for Internet of Things (IoT) systems.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102655"},"PeriodicalIF":2.5,"publicationDate":"2026-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
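The two properties of dynamic obfuscation listed in the abstract above (an LFSR seeded from a PUF response, and an LFSR state that selects the obfuscated challenge bit) can be illustrated with a small software model. This is only a sketch under stated assumptions: the PUF itself is not modeled, the seed is passed in as a plain integer, and the tap positions, the register width, and the rule "flip the challenge bit indexed by the LFSR state" are hypothetical choices, not the circuit from the paper.

```python
def lfsr_step(state: int, taps=(16, 14, 13, 11), width: int = 16) -> int:
    """Advance a Fibonacci LFSR by one step (tap positions are illustrative)."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> (t - 1)) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)


def obfuscate(challenge: int, state: int, n_bits: int = 16):
    """Flip one challenge bit chosen by the updated LFSR state.

    Returns the obfuscated challenge and the new LFSR state, so that the
    mapping changes from one challenge to the next.
    """
    state = lfsr_step(state)
    index = state % n_bits          # hypothetical selection rule
    return challenge ^ (1 << index), state


if __name__ == "__main__":
    state = 0x1                     # stands in for a PUF-derived seed
    for challenge in (0xABCD, 0xABCD, 0xABCD):
        obfuscated, state = obfuscate(challenge, state)
        print(f"{challenge:#06x} -> {obfuscated:#06x}")
```

Because the state advances on every call, the same challenge submitted three times is mapped to three different obfuscated challenges, which is the dynamic behavior the abstract describes.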
Pub Date : 2026-01-09 DOI: 10.1016/j.vlsi.2026.102660
Prateek Goyal, Sujit Kumar Sahoo
This work introduces a novel Error-Optimized Hardware-Efficient Approximate Adder (EOHEAA) tailored for error-resilient computing tasks, where precision can be traded for improvements in energy, delay, and resource efficiency. The EOHEAA adopts a strategic method of controlled error propagation, enabling significant enhancement in accuracy metrics such as Mean Error Distance (MED), Mean Relative Error Distance (MRED), and Normalized MED (NMED), while maintaining minimal hardware overhead. Synthesized on the Artix-7 FPGA (XC7A35T-1CPG236C) using Verilog HDL, EOHEAA achieves up to a 38.6% reduction in power consumption, a 34% improvement in critical path delay, and notable savings in logic resources compared to conventional and state-of-the-art approximate adder designs. Comprehensive analysis across 8-, 16-, and 32-bit configurations further confirms its scalability and robustness, with PDP improvements reaching 71.5% in wider designs. Notably, EOHEAA outperforms several existing designs by achieving the lowest RMSE (32.21), the minimum EDmax (71), and the highest accuracy-to-efficiency balance. ASIC-oriented design flow evaluation is further performed using Cadence Genus with predictive standard-cell libraries to analyze area, power, and timing behavior under advanced technology assumptions. To validate its real-world applicability, EOHEAA has been employed in edge detection and color quantization using K-means clustering, both of which demonstrate high-quality outputs under relaxed accuracy constraints. Furthermore, a lightweight CNN-based validation framework is employed to examine the impact of approximate arithmetic on learning-based workloads, demonstrating that EOHEAA preserves inference accuracy while offering tangible energy and performance benefits. These results collectively position EOHEAA as a strong candidate for next-generation approximate arithmetic units in energy-aware image processing and machine-learning accelerators.
{"title":"EOHEAA: Error-Optimized Hardware-Efficient Approximate Adder for energy-aware error-resilient applications","authors":"Prateek Goyal, Sujit Kumar Sahoo","doi":"10.1016/j.vlsi.2026.102660","DOIUrl":"10.1016/j.vlsi.2026.102660","url":null,"abstract":"<div><div>This work introduces a novel Error-Optimized Hardware-Efficient Approximate Adder (EOHEAA) tailored for error-resilient computing tasks, where precision can be traded for improvements in energy, delay, and resource efficiency. The EOHEAA adopts a strategic method of controlled error propagation, enabling significant enhancement in accuracy metrics such as Mean Error Distance (MED), Mean Relative Error Distance (MRED), and <em>Normalized MED (NMED)</em>, while maintaining minimal hardware overhead. Synthesized on the Artix-7 FPGA <span><math><mrow><mo>(</mo><mi>X</mi><mi>C</mi><mn>7</mn><mi>A</mi><mn>35</mn><mi>T</mi><mo>−</mo><mn>1</mn><mi>C</mi><mi>P</mi><mi>G</mi><mn>236</mn><mi>C</mi><mo>)</mo></mrow></math></span> using Verilog HDL, EOHEAA achieves up to 38.6% reduction in power consumption, an 34% improvement in critical path delay, and notable savings in logic resources compared to conventional and state-of-the-art approximate adder designs. Comprehensive analysis across 8, 16, and 32-bit configurations further confirms its scalability and robustness, with PDP improvements reaching 71.5% in wider designs. Notably, EOHEAA outperforms several existing designs by achieving the lowest RMSE <span><math><mrow><mo>(</mo><mn>32</mn><mo>.</mo><mn>21</mn><mo>)</mo></mrow></math></span>, minimum ED<sub>max</sub> <span><math><mrow><mo>(</mo><mn>71</mn><mo>)</mo></mrow></math></span>, and the highest accuracy-to-efficiency balance. ASIC-oriented design flow evaluation is further performed using Cadence Genus with predictive standard-cell libraries to analyze area, power, and timing behavior under advanced technology assumptions. 
To validate its real-world applicability, EOHEAA has been employed in Edge Detection and Color quantization using K-means clustering, both of which demonstrate high-quality outputs under relaxed accuracy constraints. Furthermore, a lightweight CNN-based validation framework is employed to examine the impact of approximate arithmetic on learning-based workloads, demonstrating that EOHEAA preserves inference accuracy while offering tangible energy and performance benefits. These results collectively position EOHEAA as a strong candidate for next-generation approximate arithmetic units in energy-aware image processing and machine-learning accelerators.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102660"},"PeriodicalIF":2.5,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
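The accuracy metrics named in the abstract above (MED, MRED, NMED, maximum error distance) can be computed exhaustively for small adders. The sketch below uses a lower-part OR adder as a stand-in approximate design, since the EOHEAA logic is not given in the abstract; `loa_add`, its parameter `k`, and the convention of averaging relative error only over non-zero exact sums are assumptions made for illustration.

```python
def loa_add(a: int, b: int, k: int = 4) -> int:
    """Lower-part OR adder stand-in: OR the low k bits, add the rest exactly."""
    mask = (1 << k) - 1
    low = (a | b) & mask
    # Carry into the exact upper part is taken from the top bit of the low part.
    carry = ((a >> (k - 1)) & (b >> (k - 1))) & 1
    high = ((a >> k) + (b >> k) + carry) << k
    return high | low


def error_metrics(approx_add, n_bits: int = 8) -> dict:
    """Exhaustively evaluate an n-bit approximate adder against exact addition."""
    n = 1 << n_bits
    total_ed, total_red, nonzero, max_ed = 0, 0.0, 0, 0
    for a in range(n):
        for b in range(n):
            exact = a + b
            ed = abs(approx_add(a, b) - exact)   # error distance
            total_ed += ed
            max_ed = max(max_ed, ed)
            if exact:                            # skip 0 + 0 to avoid division by zero
                total_red += ed / exact
                nonzero += 1
    med = total_ed / (n * n)
    return {
        "MED": med,
        "MRED": total_red / nonzero,
        "NMED": med / (2 * (n - 1)),             # normalized by the largest exact sum
        "ED_max": max_ed,
    }


if __name__ == "__main__":
    print(error_metrics(loa_add))
```

Swapping `loa_add` for any other function with the same signature gives a like-for-like comparison between approximate adders.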
Pub Date : 2026-01-09 DOI: 10.1016/j.vlsi.2026.102661
Aydin Tarik Zengin
This paper introduces a high-precision, FPGA-based analog signal generator that fundamentally departs from conventional, template-based approaches by mathematically synthesizing each analog pulse in real time. Unlike systems relying on pre-recorded or pre-defined waveform memories, the proposed architecture dynamically computes every sample of double-exponential pulses on-the-fly within reconfigurable FPGA logic. Leveraging the AMD Xilinx ZYNQ 7010 SoC, the system ensures that every pulse is uniquely tailored on demand, with full control over rise time, decay time, amplitude, and pile-up effects. This real-time, parameter-driven signal generation enables the accurate emulation of complex detector signals, including overlapping events and user-defined spectral distributions, while guaranteeing deterministic timing and minimal processor overhead.
Experimental results demonstrate that the platform can precisely reproduce the analog characteristics and statistical features of diverse scintillation detector responses, outperforming commercial solutions limited to simple exponential or static waveform outputs. The modular, runtime-reconfigurable design supports dual-channel, high-fidelity operation and can be extended to broader application domains, including medical signal emulation and telecommunication waveform synthesis. By eliminating dependence on static pulse templates, this work establishes a new standard for flexibility, realism, and accuracy in embedded hardware testing and detector development.
{"title":"A method for mathematically synthesizing double-exponential signal generation on-the-fly on FPGA and its evaluation","authors":"Aydin Tarik Zengin","doi":"10.1016/j.vlsi.2026.102661","DOIUrl":"10.1016/j.vlsi.2026.102661","url":null,"abstract":"<div><div>This paper introduces a high-precision, FPGA-based analog signal generator that fundamentally departs from conventional, template-based approaches by mathematically synthesizing each analog pulse in real time. Unlike systems relying on pre-recorded or pre-defined waveform memories, the proposed architecture dynamically computes every sample of double-exponential pulses on-the-fly within reconfigurable FPGA logic. Leveraging the AMD Xilinx ZYNQ 7010 SoC, the system ensures that every pulse is uniquely tailored on demand, with full control over rise time, decay time, amplitude, and pile-up effects. This real-time, parameter-driven signal generation enables the accurate emulation of complex detector signals, including overlapping events and user-defined spectral distributions, while guaranteeing deterministic timing and minimal processor overhead.</div><div>Experimental results demonstrate that the platform can precisely reproduce the analog characteristics and statistical features of diverse scintillation detector responses, outperforming commercial solutions limited to simple exponential or static waveform outputs. The modular, runtime-reconfigurable design supports dual-channel, high-fidelity operation and can be extended to broader application domains, including medical signal emulation and telecommunication waveform synthesis. 
By eliminating dependence on static pulse templates, this work establishes a new standard for flexibility, realism, and accuracy in embedded hardware testing and detector development.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102661"},"PeriodicalIF":2.5,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
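As an illustration of computing a double-exponential pulse sample by sample rather than reading it from a stored template, the sketch below updates two decaying exponentials with one multiplication each per sample and sums overlapping pulses to emulate pile-up. The pulse shape A·(exp(-t/τ_decay) - exp(-t/τ_rise)), the recursive update, and all function and parameter names are assumptions; the abstract does not state how the FPGA logic evaluates the exponentials.

```python
import math


def double_exp_pulse(amplitude, tau_rise, tau_decay, dt, n_samples):
    """Generate A * (exp(-t/tau_decay) - exp(-t/tau_rise)) one sample at a time."""
    k_rise = math.exp(-dt / tau_rise)     # per-sample decay factors
    k_decay = math.exp(-dt / tau_decay)
    e_rise = e_decay = 1.0
    samples = []
    for _ in range(n_samples):
        samples.append(amplitude * (e_decay - e_rise))
        e_rise *= k_rise                  # one multiply per exponential per sample
        e_decay *= k_decay
    return samples


def pile_up(pulses, length):
    """Sum pulses given as (start_index, samples) pairs into one waveform."""
    out = [0.0] * length
    for start, pulse in pulses:
        for i, value in enumerate(pulse):
            if 0 <= start + i < length:
                out[start + i] += value
    return out


if __name__ == "__main__":
    pulse = double_exp_pulse(1.0, tau_rise=2.0, tau_decay=20.0, dt=1.0, n_samples=50)
    waveform = pile_up([(0, pulse), (5, pulse)], length=60)
    print(max(waveform))
```

Changing `tau_rise`, `tau_decay`, or `amplitude` between calls yields a differently shaped pulse without any stored waveform, which is the property the abstract emphasizes.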
Pub Date : 2026-01-07 DOI: 10.1016/j.vlsi.2026.102656
Zongtai Li, Liang Yang, Hao Li, Mian Lou, Zeyu Yang, Weidong Xu
Although the use of third-party netlist IP can enhance the quality of integrated circuit products and reduce development cycles, it also introduces potential security vulnerabilities. Identifying state registers in sequential netlists is a commonly adopted technique to assist engineers in understanding the control logic of unknown gate-level netlists. Traditional graph theory-based detection methods, such as RELIC and FSMX-ultra, suffer from low accuracy and high computational complexity. Recent graph neural network-based detection methods, such as ReIGNN, also exhibit limited accuracy, with many data DFFs being misclassified as state DFFs. In this article, we propose a graph transformer-based method, FSMformer, which uses bidirectional message passing as the local module and direction-aware linear fast attention as the global module to extract structural and functional features from sequential netlists simultaneously, thereby achieving efficient and accurate detection of state DFFs in large-scale netlists. According to the experimental results, the proposed FSMformer outperforms not only the state-of-the-art graph theory-based method FSMX-ultra and the state-of-the-art GNN-based method ReIGNN, but also various advanced neural network baselines that we employed for state DFF detection.
{"title":"FSMformer: An efficient direction-aware graph transformer for state register detection of gate-level netlist","authors":"Zongtai Li, Liang Yang, Hao Li, Mian Lou, Zeyu Yang, Weidong Xu","doi":"10.1016/j.vlsi.2026.102656","DOIUrl":"10.1016/j.vlsi.2026.102656","url":null,"abstract":"<div><div>Although the use of third-party netlist IP can enhance the quality of integrated circuit products and reduce development cycles, it also introduces potential security vulnerabilities. Identifying state registers in sequential netlists is a commonly adopted technique to assist engineers in understanding the control logic of unknown gate-level netlists. Traditional graph theory-based detection methods, such as RELIC and FSMX-ultra, suffer from low accuracy and high computational complexity. Recent graph neural network-based detection methods, such as ReIGNN, also exhibit limited accuracy, with many data DFFs being misclassified as state DFFs. In this article, we propose a graph transformer-based method, FSMformer, which utilizes bidirectional message passing as the local module and direction-aware linear fast attention as the global module, to enable the simultaneous extraction of structural and functional features from sequential netlists, thereby achieving efficient and accurate detection of state DFFs in large-scale netlists. 
According to the experimental results, our proposed FSMformer outperforms not only the state-of-the-art graph theory-based method FSMX-ultra and the state-of-the-art GNN-based method ReIGNN, but also various advanced neural network baselines that we employed for state DFFs detection.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102656"},"PeriodicalIF":2.5,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-06 DOI: 10.1016/j.vlsi.2026.102654
George B. Nardes, Thiago H. Rausch, Bruna H. Pereira, Douglas R. Melo, Cesar A. Zeferino
Convolutional Neural Networks (CNNs) are widely used in Automatic License Plate Recognition (ALPR) systems for Optical Character Recognition (OCR). Still, their computational cost often restricts deployment on edge devices. This work presents an 8-bit quantized CNN with a hardware-oriented dataflow designed specifically for OCR of Mercosur and Brazilian license plates. The model was trained using quantization-aware techniques and implemented on two FPGA platforms from different vendors, Altera Cyclone V and AMD Zynq UltraScale+, using the same VHDL architecture. The Zynq UltraScale+ implementation achieves 97.1% OCR accuracy, 2.12 ms latency, and 922 FPS in pipelined mode, while the Cyclone V version delivers 458 FPS with reduced BRAM and DSP usage. Energy measurements show 1.62 mJ per inference on Zynq UltraScale+ and 3.32 mJ on Cyclone V, confirming suitability for low-power, real-time ALPR. The results demonstrate that a portable 8-bit design can maintain accuracy comparable to that of floating-point models while achieving substantial gains in throughput and energy efficiency across heterogeneous FPGA devices.
{"title":"A resource-constrained CNN accelerator for real-time license plate character recognition on FPGA platforms","authors":"George B. Nardes, Thiago H. Rausch, Bruna H. Pereira, Douglas R. Melo, Cesar A. Zeferino","doi":"10.1016/j.vlsi.2026.102654","DOIUrl":"10.1016/j.vlsi.2026.102654","url":null,"abstract":"<div><div>Convolutional Neural Networks (CNNs) are widely used in Automatic License Plate Recognition (ALPR) systems for Optical Character Recognition (OCR). Still, their computational cost often restricts deployment on edge devices. This work presents an 8-bit quantized CNN with a hardware-oriented dataflow designed specifically for OCR of Mercosur and Brazilian license plates. The model was trained using quantization-aware techniques and implemented on two FPGA platforms from different vendors, Altera Cyclone V and AMD Zynq UltraScale+, using the same VHDL architecture. The Zynq UltraScale+ implementation achieves 97.1% OCR accuracy, 2.12 ms latency, and 922 FPS in pipelined mode, while the Cyclone V version delivers 458 FPS with reduced BRAM and DSP usage. Energy measurements show 1.62 mJ per inference on Zynq UltraScale+ and 3.32 mJ on Cyclone V, confirming suitability for low-power, real-time ALPR. 
The results demonstrate that a portable 8-bit design can maintain accuracy comparable to that of floating-point models while achieving substantial gains in throughput and energy efficiency across heterogeneous FPGA devices.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102654"},"PeriodicalIF":2.5,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
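The 8-bit arithmetic that such an accelerator relies on can be sketched in a few lines: values are mapped to signed 8-bit integers with a scale factor, multiplied and accumulated as integers, and rescaled once at the end. This is a generic symmetric quantization example, not the network or the dataflow from the paper; the per-tensor scale `max|v| / 127` and the function names are assumptions.

```python
def choose_scale(values):
    """Symmetric per-tensor scale so that the largest magnitude maps to 127."""
    peak = max(abs(v) for v in values)
    return peak / 127 if peak else 1.0


def quantize(values, scale):
    """Round to the nearest integer and clamp to the signed 8-bit range."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_dot(xq, wq, x_scale, w_scale):
    """Integer multiply-accumulate, then a single rescale back to real units."""
    acc = sum(x * w for x, w in zip(xq, wq))   # fits a wide integer accumulator
    return acc * x_scale * w_scale


if __name__ == "__main__":
    x = [0.5, -1.0, 0.25]
    w = [1.0, 0.5, -2.0]
    xs, ws = choose_scale(x), choose_scale(w)
    approx = int8_dot(quantize(x, xs), quantize(w, ws), xs, ws)
    exact = sum(a * b for a, b in zip(x, w))
    print(exact, approx)
```

In hardware the integer accumulation maps onto DSP slices or LUT-based multipliers, and only the final rescale needs wider arithmetic.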
Pub Date : 2026-01-06 DOI: 10.1016/j.vlsi.2026.102653
Viet-Thanh Pham, Victor Kamdoum Tamba, Luigi Fortuna
Non-equilibrium oscillators are special because of their hidden attractors. This work introduces a non-equilibrium oscillator that is implemented using a reduced number of resistors. Its implementation requires a diode instead of analog multipliers. The oscillator is easily realized in the laboratory with common off-the-shelf components, making it suitable for educational purposes. The dynamics of the oscillator are investigated to present its special features. In addition, the use of the oscillator for generating random signals is presented, illustrating a possible application.
{"title":"Non-equilibrium oscillator with a diode: Dynamics and application","authors":"Viet-Thanh Pham , Victor Kamdoum Tamba , Luigi Fortuna","doi":"10.1016/j.vlsi.2026.102653","DOIUrl":"10.1016/j.vlsi.2026.102653","url":null,"abstract":"<div><div>Non-equilibrium oscillators are special because of their hidden attractors. This work introduces a non-equilibrium oscillator, which is implemented using a reduced number of resistors. Its implementation requires a diode instead of analog multipliers. Our oscillator is easily realized with common off-the-shelf components in the laboratory, making it suitable for educational purposes. Dynamics of the oscillator are investigated to present its special features. In addition, the usage of the oscillator for generating random signal is presented illustrating its possible application.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102653"},"PeriodicalIF":2.5,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-05 DOI: 10.1016/j.vlsi.2026.102646
S. Karipidis, A. Buzo, G. Pelz, T. Noulis
Sizing analog and mixed-signal circuits with a large number of parameters requires many simulations, especially for non-linear topologies where large-signal analysis is needed. Reducing the number of simulations, and more generally the total design cycle time, is the main objective when sizing complicated circuits. This work presents an automated circuit sizing methodology based on a hybrid of the Dual Annealing and Nelder–Mead algorithms, which significantly reduces the design cycle time and the required number of transient simulations. A customized hybrid algorithm environment using Dual Annealing and Nelder–Mead is developed, in which the optimization process is divided into separate optimization sub-steps. The proposed hybrid method converges rapidly to the required circuit performance specification. It uses combinations of direct search algorithms to separate metric evaluation, accelerating convergence to the performance specifications in a large parameter space. A complicated non-linear topology, a product-level low-dropout (LDO) regulator in a 180 nm process node with 30 parameters, is used as the test vehicle to verify the proposed methodology. The sizing process converged in fewer than 1700 simulations, taking only the circuit schematic as input, with no prior sizing knowledge. Sub-optimization focused on each analysis type (DC, AC, and transient) is also performed, with emphasis on reducing the number of transient simulations. The proposed combined-algorithm method achieved 31 % faster convergence than state-of-the-art methods and handles each simulation analysis efficiently.
{"title":"Hybrid algorithm based optimization strategies for analog circuit sizing in low dropout regulators","authors":"S. Karipidis , A. Buzo , G. Pelz , T. Noulis","doi":"10.1016/j.vlsi.2026.102646","DOIUrl":"10.1016/j.vlsi.2026.102646","url":null,"abstract":"<div><div>Analog and Mixed Signal circuit sizing with large-scale parameters requires a lot of simulations, especially in non-linear topology where large-signal analysis is a need. Reducing the number of simulations and in general the total design cycle time, is the main objective for optimal sizing of complicated circuits. In this work a circuit sizing automated design methodology is presented using the hybrid dual annealing and Nelder–Mead algorithm, significantly reducing the design cycle time and the required number of transient simulations. A customized hybrid algorithm environment using Dual Annealing and Nelder–Mead is developed where the optimization process is divided into different optimization sub-steps. The proposed hybrid algorithm based method achieves rapid convergence to the needed circuit performance specification. It uses combinations of direct search algorithms to separate metric evaluation accelerating the performance specifications convergence speed in a large parameter space. A complicated non-linear topology like a product level low-dropout (LDO) regulator, in 180 nm process node, with 30 parameters is used as the circuit vehicle to verify the proposed methodology. The sizing process converged with less than 1700 simulations having as input just the circuit schematic with no prior sizing knowledge. Sub optimization is also performed focused on each analysis type — DC, AC and transient, with a focus on reducing the number of transient simulations. 
The proposed combined algorithm method achieved 31 % faster convergence speed compared to the state-of-the-art methods and handles efficiently each simulation analysis.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102646"},"PeriodicalIF":2.5,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-03 DOI: 10.1016/j.vlsi.2025.102645
J. Banumathi, G. Karthy
Signal processing widely uses Finite Impulse Response (FIR) filters because of their stability and linear phase. However, traditional FIR filter designs are limited by multiplication operations that lead to high hardware utilization and delay. To address this, modified FIR filters with optimized multipliers and adders are being developed to improve hardware resource utilization and delay performance. This paper presents novel designs for 8-tap and 16-tap FIR filters, leveraging Brent-Kung Adders (BKA) and Pipelined (P) Urdhva Triyakbhyam Vedic multipliers (UTVM) to achieve minimal delay and enhanced performance. Three architectures, namely Pipelined FIR using UTVM with BKA (PFIR-UTVM-BKA), FIR using PUTVM with BKA (FIR-PUTVM-BKA), and Pipelined FIR using PUTVM with BKA (PFIR-PUTVM-BKA), were implemented with different device specifications on Kintex-7, Virtex-7, and Zynq 7000 platforms using Xilinx Vivado 2022.2 and in a 45 nm ASIC technology, and simulated in Verilog. On FPGA, the proposed multiplier reduces delay by 43 %, 61.55 %, 73.01 %, and 78.51 % across different bit widths; in ASIC, power, delay, and power-delay product (PDP) are reduced by 82.16 %, 94.34 %, and 98.99 %, respectively. Additionally, the proposed FIR filter architectures achieve significant improvements on FPGA, including 51.88 % and 27.13 % delay reduction, 75.40 % slice improvement, and 92.53 % and 97.03 % enhancement in slice registers for the 8-tap and 16-tap 8-bit designs; in ASIC, power, delay, and PDP are reduced by 87.97 %, 97.59 %, and 99.71 %, respectively. These advancements make the proposed FIR filters highly suitable for high-speed digital signal processing (DSP) applications, where efficient processing and minimized latency are crucial. Integrating the pipelined Vedic multiplier (PVM) and BKA plays a pivotal role in achieving these performance enhancements, positioning these filter designs as promising solutions for next-generation signal processing systems.
{"title":"High-performance FIR filter designs using Brent Kung Adder and pipelined Vedic multiplier","authors":"J. Banumathi, G. Karthy","doi":"10.1016/j.vlsi.2025.102645","DOIUrl":"10.1016/j.vlsi.2025.102645","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108","pages":"Article 102645"},"PeriodicalIF":2.5,"publicationDate":"2026-01-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145928873","RegionNum":3,"RegionCategory":"Engineering and Technology"}
Pub Date : 2026-01-03 DOI: 10.1016/j.vlsi.2025.102640
Jinlin Chen , Huaguo Liang , Yingchun Lu , Liang Yao
With the rise of the internet and electronic devices, the security of network information has attracted increasing attention, and True Random Number Generators (TRNGs) play an increasingly important role in information security. TRNG entropy sources based on ring oscillators (ROs) have attracted significant interest because of their simple circuit design and ease of implementation on FPGAs. However, most existing designs suffer from high hardware overhead. This work proposes a novel ultra-lightweight TRNG based on multi-mode switching of RO-like rings. It can be automatically placed and routed on the Xilinx Artix-7 FPGA and uses only 10 LUTs and 2 D flip-flops. The randomness of the entropy source is analyzed through a mathematical model, proving that the output sequence is an unordered random bit string under any circumstances. The output sequence of the TRNG passed various tests, including autocorrelation tests, NIST SP 800-22, NIST SP 800-90B, AIS-31, and TestU01, with favorable results.
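For readers unfamiliar with the statistical suites named above, the Python sketch below implements the frequency (monobit) test, the first and simplest test in NIST SP 800-22. It is an illustration only, not the authors' evaluation flow: the function name `monobit_p_value` is hypothetical, and the full suite contains many further tests that a TRNG must also pass.

```python
import math
from typing import Sequence


def monobit_p_value(bits: Sequence[int]) -> float:
    """P-value of the NIST SP 800-22 frequency (monobit) test.

    Maps each bit to +1 (one) or -1 (zero), sums them, normalises by
    sqrt(n), and converts the result to a p-value with the complementary
    error function. A small p-value means ones and zeros are unbalanced.
    """
    n = len(bits)
    if n == 0:
        raise ValueError("bit sequence must not be empty")
    s_n = sum(1 if b else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2.0))


# Ten-bit example sequence: six ones and four zeros.
example = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
p = monobit_p_value(example)
print(f"p-value = {p:.6f}")  # p-value = 0.527089
```

SP 800-22 treats a sequence as passing a test when the p-value is at least 0.01, and recommends far longer sequences than this ten-bit example for a meaningful result.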
{"title":"RO-like ring-based TRNG with adaptive mode switching for enhanced entropy Harvesting","authors":"Jinlin Chen , Huaguo Liang , Yingchun Lu , Liang Yao","doi":"10.1016/j.vlsi.2025.102640","DOIUrl":"10.1016/j.vlsi.2025.102640","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108","pages":"Article 102640"},"PeriodicalIF":2.5,"publicationDate":"2026-01-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"145980162","RegionNum":3,"RegionCategory":"Engineering and Technology"}