Pub Date: 2025-12-25 | DOI: 10.1016/j.vlsi.2025.102642
Mehmet Dogan, Erkan Yuce, Shahram Minaei
In this study, two first-order voltage-mode universal filters based on plus-type second-generation current conveyors (CCII+s) are proposed. Each filter employs two CCII+s, a grounded capacitor, and three resistors, and each is universal, realizing low-pass, high-pass, and all-pass filter (APF) responses. Additionally, the APF responses offer electronically tunable gain through grounded resistors, eliminating the need for extra amplifier stages. Total harmonic distortion variations of the APFs are low, and the proposed filters have wide dynamic ranges. However, they require a passive-element matching condition and include two floating resistors. As application examples, two quadrature oscillators are presented. Extensive SPICE simulations are conducted using 180 nm TSMC technology parameters, and experimental validation is carried out using commercially available AD844 active devices.
Title: First-order universal filters with two CCII+s and a grounded capacitor: Theory and experimental validation
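For reference, the textbook first-order responses that such a universal filter realizes have the forms below (symbols illustrative; these are not the authors' exact CCII+ transfer functions, and K stands for a resistor-ratio gain):

    T_{LP}(s) = \frac{K}{1 + sRC}, \qquad T_{HP}(s) = \frac{K\,sRC}{1 + sRC}, \qquad T_{AP}(s) = K\,\frac{1 - sRC}{1 + sRC}

with pole frequency \omega_0 = 1/(RC). The all-pass response has constant magnitude |K| and phase \varphi(\omega) = -2\arctan(\omega RC), so a resistor-set K directly scales the APF gain, which is the tunable-gain property claimed above.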
Pub Date: 2025-12-24 | DOI: 10.1016/j.vlsi.2025.102641
Arash Hosseini, Shahram Mohammadnejad, Mohammad Azim Karami
This article presents two modified low-power basic and feedback-bias common-gate (CG) transimpedance amplifier (TIA) topologies, which utilize a novel inductor-less current-reuse feedforward technique for 3 dB-bandwidth (BW) extension and for relaxing critical trade-offs in CG-based TIAs. The topologies incorporate custom-designed biasing circuits to reduce performance variations across process and temperature. To assess the proposed topologies, mathematical and simulation analyses of both real (with and without zero) and complex conjugate pole conditions, along with noise analysis, have been conducted for the modified and conventional structures. The proposed technique creates or adjusts a left-half-plane zero through the feedforward path. Then, by neutralizing the dominant pole effect and generating a peaking property, the circuit's bandwidth is enhanced without reducing gain or increasing power consumption. Under real pole conditions, the bandwidth increases by 1.5∼2 times, while the input-referred noise is reduced by more than ∼2 times. In the complex conjugate pole state (specifically in the feedback-bias CG), the proposed technique reduces the rate of bandwidth reduction caused by an increase in input capacitance (Cin), with a 40 % improvement when Cin changes from 1.5 pF to 2.1 pF. Furthermore, the power consumption decreases by ∼2.4 times compared with the conventional feedback-bias topology. The topologies are validated across process corners (TT, SS, FF) at different temperatures. In the worst cases, the BW variations of the modified basic and feedback-bias topologies decrease by 42 % and 16 %, respectively. Additionally, Monte Carlo and post-layout analyses of the proposed topologies are conducted in 0.18 μm standard CMOS technology.
Title: Low-power modified basic and feedback-bias common-gate transimpedance amplifiers with a novel bandwidth enhancement technique
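To make the bandwidth-extension mechanism concrete, a single-zero, two-pole model of the transimpedance gain can be written as (symbols illustrative, not the paper's exact expressions):

    Z_T(s) = R_T \, \frac{1 + s/\omega_z}{(1 + s/\omega_{p1})(1 + s/\omega_{p2})}

When the feedforward path places the left-half-plane zero \omega_z near the dominant pole \omega_{p1}, the pole-zero pair approximately cancels and the 3 dB bandwidth moves out toward the next pole \omega_{p2}; imperfect cancellation leaves the mild peaking mentioned above, trading passband flatness for additional bandwidth without extra gain or power.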
Pub Date: 2025-12-22 | DOI: 10.1016/j.vlsi.2025.102636
Ankita Sarkar, Mansi Jhamb
The rapid increase in health monitoring data in the digital era necessitates advanced security measures to protect sensitive medical information. While current encryption methods are effective, the looming threat of quantum computing, capable of breaking existing cryptographic protocols, demands a shift towards more resilient solutions. This paper introduces a novel security framework that integrates post-quantum cryptography with modern encryption to ensure robust protection. The proposed framework achieves a high information entropy of 7.99, indicating strong security through unpredictability. It is designed for efficiency, with an average execution time of just 0.2856 μs, ensuring quick data processing without compromising security. Additionally, the framework operates with minimal power consumption, drawing only 1.4 mA, making it suitable for IoT-based medical systems where resource efficiency is critical. This approach not only secures current health monitoring scenarios but also prepares them for future quantum threats, offering a comprehensive, efficient, and forward-looking solution for protecting sensitive medical data in an increasingly interconnected world.
Title: Low power post quantum cryptography on reconfigurable devices for cyber physical system in IoT
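The 7.99 entropy figure is best read as byte-level Shannon entropy, whose maximum is 8 bits per byte for a perfectly uniform ciphertext. A minimal sketch of how such a score is typically computed is shown below; the data here is random stand-in material, not the framework's actual output.

    import math
    import os
    from collections import Counter

    def byte_entropy(data: bytes) -> float:
        """Shannon entropy in bits per byte (8.0 = perfectly uniform distribution)."""
        counts = Counter(data)
        n = len(data)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Hypothetical check on random ciphertext-like data (stand-in for encrypted health records)
    ciphertext = os.urandom(1 << 20)
    print(f"entropy = {byte_entropy(ciphertext):.2f} bits/byte")   # typically ~7.99-8.00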
Pub Date: 2025-12-22 | DOI: 10.1016/j.vlsi.2025.102639
Madan Mohan Sharma, Ananya Kabba, Kulbhushan Sharma, Pankaj Kumar
Von Neumann architectures suffer from data transfer bottlenecks, which can be circumvented by performing computation directly inside memory arrays. This work presents a low-power 10T static random-access memory (SRAM) cell-based compute-in-memory (CiM) architecture designed with 18 nm FinFET technology in the Cadence Virtuoso tool, specifically implementing full adder (FA) and full subtractor (FS) operations. Compared to the 8T SRAM cell-based CiM architecture, the proposed architecture achieves 1.33x lower delay, 3.87x lower power, 5.82x better power-delay product (PDP), and 8.8x better energy-delay product (EDP) for FA operations. For FS operations, the proposed 10T SRAM cell-based CiM architecture achieves 2.1x lower delay, 3.86x lower power, 8.3x better PDP, and 17.8x better EDP. The transistor count is reduced by 2.56x (from 126T to 49T) for both FA and FS, minimizing area and design complexity. Monte Carlo simulations and process-temperature analyses further confirm that the proposed architecture demonstrates greater robustness and stability under variations. The proposed architecture shows strong potential for use in complex neural networks.
Title: Highly robust power efficient Full Adder and Full Subtractor CiM architecture using 10T SRAM cell
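The Boolean functions that the CiM array evaluates are the standard full-adder and full-subtractor forms. A minimal software reference model is sketched below for clarity; it describes the logic only, not the 10T SRAM-cell implementation.

    def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
        """Standard full-adder logic: returns (sum, carry_out)."""
        s = a ^ b ^ cin
        cout = (a & b) | (cin & (a ^ b))
        return s, cout

    def full_subtractor(a: int, b: int, bin_: int) -> tuple[int, int]:
        """Standard full-subtractor logic for a - b - bin_: returns (difference, borrow_out)."""
        d = a ^ b ^ bin_
        bout = ((~a) & b) | ((~(a ^ b)) & bin_)
        return d & 1, bout & 1

    # Exhaustive check of both truth tables
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                assert full_adder(a, b, c) == ((a + b + c) & 1, (a + b + c) >> 1)
                assert full_subtractor(a, b, c) == ((a - b - c) & 1, int(a - b - c < 0))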
Pub Date: 2025-12-21 | DOI: 10.1016/j.vlsi.2025.102637
Manali Dhar, Chiradeep Mukherjee, Saradindu Panda, Bansibadan Maji, Aurpan Majumder
Quantum Cellular Automata (QCA) is a promising technology that offers an alternative to conventional Metal Oxide Semiconductor (MOS) approaches for designing efficient, high-performance logic circuits. In present quantum technologies, there is a growing demand for QCA circuits to meet the requirements of high speed, energy efficiency, and device density. However, due to their nanoscale dimensions and complex fabrication processes, QCA circuits are inherently prone to defects, which significantly affect circuit reliability, energy efficiency, and design robustness. This paper explores the prediction of energy dissipation of QCA Layered T (QCA LT) Ex-OR, Ex-NOR, and 4-bit Binary to Gray (BTG) converter circuits under single-cell displacement defects (SCDD) and cell polarization using machine learning models. First, QCA logic gates are selected and realized by LT logic over the Majority Voter (MV), applying logic reduction methodologies, and are simulated using the coherence vector (watt/energy) simulation engine of QCADesigner-E. Both horizontal and vertical SCDD are applied to the output cell of the LT design, and the resulting variations in polarization and energy dissipation are collected in the dataset scdd_polarization_energy (SPE Version 2). Machine Learning (ML) models are then used to predict the energy dissipation of the QCA LT designs from this dataset. The best-fitting models for prediction are identified as K-Nearest Neighbour (KNN), Random Forest (RF), and Polynomial Regression (PR), evaluated using the R² score, mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). Based on these evaluation metrics, the optimal machine learning model is identified for each SCDD direction.
Title: Predictive analysis of energy dissipation in Layered-T QCA circuits under cell displacement defects and polarization: A machine-learning approach
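A minimal sketch of the model-comparison step described above, using scikit-learn; the feature columns and the synthetic target are hypothetical stand-ins for the SPE dataset, and the hyperparameters are illustrative.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

    # Hypothetical stand-in for the SPE dataset: displacement (nm), direction flag,
    # output-cell polarization -> energy dissipation; real columns and units may differ.
    rng = np.random.default_rng(0)
    X = rng.uniform([0, 0, -1], [10, 1, 1], size=(500, 3))
    y = 0.02 * X[:, 0] + 0.05 * (1 - X[:, 2] ** 2) + rng.normal(0, 0.002, 500)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    models = {
        "KNN": KNeighborsRegressor(n_neighbors=5),
        "RF": RandomForestRegressor(n_estimators=200, random_state=0),
        "PR": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    }
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        mse = mean_squared_error(y_te, pred)
        print(f"{name}: R2={r2_score(y_te, pred):.3f} MAE={mean_absolute_error(y_te, pred):.4f} "
              f"MSE={mse:.6f} RMSE={np.sqrt(mse):.4f}")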
Pub Date: 2025-12-17 | DOI: 10.1016/j.vlsi.2025.102632
Manhong Fan, Qingsong Liu, Shiqi Xu, Yonglong Bai
While multi-stability is a well-established phenomenon in traditional chaotic systems, it remains largely unexplored in neural networks. This paper proposes a method for generating the stable coexistence of multiple scroll attractors in a dual memristor synaptic Hopfield neural network (DMSHNN) under multi-level logic pulse currents. A systematic study of its dynamic behavior is conducted through bifurcation diagrams, Lyapunov exponent spectra, and phase diagrams. The findings indicate that, under specific initial conditions, the DMSHNN system exhibits distinctive dynamic behaviors: (1) periodic and chaotic attractors not only undergo state transitions but also exhibit biased coexistence; (2) transient chaos can be observed, and the application of multi-level logic pulse currents facilitates a more stable coexistence of multiple scroll attractors when the memristor's initial conditions are altered. Subsequently, the physical feasibility of the theoretical model is validated on an STM32 digital circuit platform, and the experimental results are presented. Finally, based on the chaotic sequences generated by the DMSHNN model, a remote sensing image encryption algorithm is designed and implemented. This study not only expands the engineering applicability of the DMSHNN model through this algorithm but also provides empirical evidence for the model's chaotic dynamics and for the practicality, feasibility, and security of the resultant image encryption algorithm.
Title: Dynamics analysis and application of multi-stable Hopfield neural networks under pulsed current stimulation
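The abstract does not spell out the encryption algorithm itself; the sketch below only illustrates the common pattern of using a chaotic sequence as a keystream for pixel-level diffusion, with the logistic map standing in for a DMSHNN-generated sequence and a random array standing in for a remote-sensing image.

    import numpy as np

    def chaotic_keystream(n: int, x0: float = 0.7, r: float = 3.99) -> np.ndarray:
        """Logistic-map keystream as a stand-in for a DMSHNN-generated chaotic sequence."""
        x, out = x0, np.empty(n)
        for i in range(n):
            x = r * x * (1.0 - x)
            out[i] = x
        return (out * 256).astype(np.uint8)

    def xor_cipher(img: np.ndarray, key: np.ndarray) -> np.ndarray:
        """Pixel-wise XOR diffusion; applying it twice with the same key restores the image."""
        return np.bitwise_xor(img.ravel(), key).reshape(img.shape)

    img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in remote-sensing tile
    key = chaotic_keystream(img.size)
    enc = xor_cipher(img, key)
    assert np.array_equal(xor_cipher(enc, key), img)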
Pub Date: 2025-12-17 | DOI: 10.1016/j.vlsi.2025.102635
Wangyong Chen, Ling Xiong, Songxuan He, Linlin Cai
Temperature variation, both within a chip and from the environment, is a critical concern for modern integrated circuits, posing a significant threat to system robustness. Current temperature compensation methods, however, face the challenge of additional design costs in terms of area and power consumption. This paper introduces a novel temperature immunity-driven design methodology that leverages the zero-temperature-coefficient (ZTC) feature to suppress the temperature sensitivity of critical paths in digital circuits. We propose an analytical compact model to determine the ZTC point by bridging device characteristics to standard cell behavior. This enables an efficient temperature immunity-driven design technology co-optimization (DTCO) paradigm. The impacts of operating conditions, process variations, and the aging effect on device characteristics, and consequently on the digital ZTC, are thoroughly investigated. These findings are seamlessly integrated into the existing design flow. The proposed framework, featuring ZTC-aware co-optimization in the presence of process variations and time-dependent aging effects, is demonstrated effectively on benchmark circuits. This work significantly contributes to the advancement of temperature-immune digital circuit design and optimization.
Title: Zero-temperature-coefficient-powered design technology co-optimization for temperature-immune digital circuits
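To make the ZTC idea concrete, a textbook square-law estimate of the ZTC gate bias is sketched below; the paper's compact model, which bridges this device-level condition to standard-cell behavior, is more elaborate. Assuming

    I_D = \tfrac{1}{2}\,\mu(T)\,C_{ox}\,\tfrac{W}{L}\,\bigl(V_{GS} - V_{TH}(T)\bigr)^2,

setting \partial I_D / \partial T = 0 gives

    V_{GS,\mathrm{ZTC}} \approx V_{TH}(T) + 2\,\frac{\partial V_{TH}/\partial T}{(1/\mu)\,\partial \mu/\partial T}.

Because both temperature derivatives are negative, the correction term is positive, so the ZTC bias sits above threshold: at that operating point the mobility degradation and the threshold-voltage drop with temperature cancel to first order, which is the property exploited to keep critical-path delay nearly temperature-independent.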
Pub Date: 2025-12-16 | DOI: 10.1016/j.vlsi.2025.102634
Sajad Eydivandi, Hakem Beitollahi
Efficient hardware acceleration is crucial for real-time object detection using YOLO models, particularly on FPGA-based platforms. This paper presents SARPAR, a high-performance, reconfigurable accelerator designed at the Register Transfer Level (RTL). Unlike previous works that rely on High-Level Synthesis (HLS), SARPAR fully optimizes FPGA resources by carefully managing dataflow, memory bandwidth, and computation parallelism. The architecture employs 16-bit fixed-point precision, a ping-pong buffering mechanism, and systolic computation for both normal and pointwise convolutions, significantly enhancing performance. Implemented on a Zynq UltraScale+ MPSoC, SARPAR operates at 300 MHz, achieving efficient feature map loading and processing while considering off-chip memory bandwidth. Our findings highlight a significant performance advantage over state-of-the-art YOLO accelerators, delivering a throughput of 1382 TOP/s while operating at a power consumption of 5.15 watts. Our approach achieves a 183.97% improvement in energy efficiency compared to existing YOLO accelerators developed on FPGA.
Title: SARPAR: Systolic ARray Pallet-Integrated AcceleratoR for YOLO models on FPGA
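A minimal software model of the 16-bit fixed-point multiply-accumulate that a systolic processing element performs, shown here for one pointwise-convolution output; the Q8.8 scaling is an illustrative assumption, as the accelerator's actual fixed-point format is not stated in the abstract.

    FRAC_BITS = 8          # illustrative Q8.8 split; SARPAR's real scaling is not specified here

    def to_fixed(x: float) -> int:
        """Quantize to a 16-bit two's-complement fixed-point word."""
        v = int(round(x * (1 << FRAC_BITS)))
        return max(-(1 << 15), min((1 << 15) - 1, v))

    def fixed_mac(acc: int, a_fx: int, w_fx: int) -> int:
        """One processing-element step: acc += a*w, rescaled back to Q8.8."""
        return acc + ((a_fx * w_fx) >> FRAC_BITS)

    # 1x1 (pointwise) convolution over 4 input channels for one output pixel
    acts    = [0.5, -1.25, 2.0, 0.75]
    weights = [0.125, 0.5, -0.25, 1.0]
    acc = 0
    for a, w in zip(acts, weights):
        acc = fixed_mac(acc, to_fixed(a), to_fixed(w))
    print(acc / (1 << FRAC_BITS))   # -0.3125, matching the floating-point dot product here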
Pub Date: 2025-12-15 | DOI: 10.1016/j.vlsi.2025.102633
M. Maria Rubiston, B.R. Tapas Bapu
Hardware security remains a significant concern because Very Large Scale Integration (VLSI) circuits have become increasingly complex and industries have begun utilizing untrusted third-party Intellectual Property. Security threats from Hardware Trojans (HTs) are particularly dangerous since these insertions introduce malicious modifications that compromise circuit integrity, reliability, and confidentiality. Current HT detection methods struggle to scale properly and maintain high accuracy rates due to malicious Trojan design strategies, as well as the constraints of functional testing, side-channel evaluation, and formal verification techniques. To address these challenges, this research introduces DGCoNet-GBOA, a Diffusion Kernel Attention Network with Deformable Graph Convolutional Network-Based Security Framework optimized using the Gooseneck Barnacle Optimization Algorithm (GBOA) for real-time and highly accurate HT detection. The proposed framework extracts structural, power, and transition probability features using a Scale-aware Modulation Meet Transformer (S-ammT) and balances the dataset using Diminishing Batch Normalization (DimBN). The DGCoNet framework analyses gate-level netlists (GLNs) as graphs to identify HT-induced modifications, and GBOA provides the optimization that boosts detection precision. The model achieves 99.87 % accuracy with just 0.12 % false positives and 99.91 % precision when tested on the ISCAS'85 and ISCAS'89 benchmarks, an average 0.7–4.5 % accuracy improvement over existing state-of-the-art approaches. The framework built in this research provides scalable, high-reliability HT detection capabilities to safeguard VLSI circuits from present-day hardware security threats during semiconductor design.
Title: Identifying malicious modules using deformable graph convolutional network-based security framework for reliable VLSI circuit protection
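A minimal sketch of the first step such graph-based detectors share: turning a gate-level netlist into a directed graph and computing simple structural and signal-probability features per node. The tiny gate list and probability model here are illustrative, not the paper's S-ammT/DimBN feature pipeline.

    import networkx as nx

    # Tiny illustrative gate-level netlist: (gate_name, gate_type, inputs, output)
    netlist = [
        ("G1", "AND", ["a", "b"], "n1"),
        ("G2", "OR",  ["n1", "c"], "n2"),
        ("G3", "XOR", ["n2", "d"], "y"),
    ]

    g = nx.DiGraph()
    for name, gtype, ins, out in netlist:
        g.add_node(name, gtype=gtype)
        for net in ins:
            g.add_edge(net, name)     # net -> gate
        g.add_edge(name, out)         # gate -> net

    # Simple signal-probability propagation assuming independent, uniform primary inputs
    p1 = {n: 0.5 for n in ["a", "b", "c", "d"]}
    p1["n1"] = p1["a"] * p1["b"]                                      # AND
    p1["n2"] = 1 - (1 - p1["n1"]) * (1 - p1["c"])                     # OR
    p1["y"]  = p1["n2"] * (1 - p1["d"]) + (1 - p1["n2"]) * p1["d"]    # XOR

    features = {n: (g.in_degree(n), g.out_degree(n), p1.get(n)) for n in g.nodes}
    print(features)   # per-node structural + probability features fed to a graph model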
Pub Date: 2025-12-13 | DOI: 10.1016/j.vlsi.2025.102626
Thi Diem Tran, Minh Tan Ha, Xuan Thao Tran, Ngoc Quoc Tran, Vu Trung Duong Le, Hoai Luan Pham, Van Tinh Nguyen
Deep Neural Networks (DNNs) have achieved remarkable success in diverse applications such as image classification, signal processing, and video analysis. Despite their effectiveness, these models require substantial computational resources, making FPGA-based hardware acceleration a critical enabler for real-time deployment. However, current methods for mapping DNNs to hardware have experienced limited adoption, mainly because software developers often lack the specialized hardware expertise needed for efficient implementation. High-Level Synthesis (HLS) tools were introduced to bridge this gap, but they typically confine designs to fixed platforms and simple network structures. Most existing tools support only standard architectures like VGG or ResNet with predefined parameters, offering little flexibility for customization and restricting deployment to specific FPGA devices. To address these limitations, we introduce Py2C, an automated framework that converts AI models from Python to C. Py2C supports a wide range of DNN architectures, from basic convolutional and pooling layers with variable window sizes to advanced models such as VGG, ResNet, InceptionNet, ShuffleNet, NambaNet, and YOLO. Integrated with Xilinx’s Vitis HLS, Py2C forms the Py2RTL flow, enabling register-transfer level (RTL) generation with custom-precision arithmetic and cross-platform verification. Validated on multiple networks, Py2C has demonstrated superior hardware efficiency and power reduction, particularly in QRS detection for ECG signals. By streamlining the AI-to-RTL conversion process, Py2C makes FPGA-based AI deployment both high-performance and accessible.
Title: An innovative HLS framework for all network architectures: From Python to SoC
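Py2C's actual interface and generated code are not shown in the abstract; purely as an illustration of the Python-to-C idea, the hypothetical helper below emits the kind of C loop nest that HLS flows such as Vitis HLS typically consume for a convolution layer. The function name, parameters, and the Q8-style rescale are all assumptions for illustration.

    def emit_conv2d_c(name: str, cin: int, cout: int, k: int) -> str:
        """Hypothetical Python-to-C emitter for a square-kernel convolution layer."""
        return f"""
    /* H, W assumed to be compile-time #defines for the feature-map size */
    void {name}(const short in[{cin}][H][W],
                const short w[{cout}][{cin}][{k}][{k}],
                short out[{cout}][H - {k - 1}][W - {k - 1}]) {{
      for (int oc = 0; oc < {cout}; oc++)
        for (int y = 0; y + {k} <= H; y++)
          for (int x = 0; x + {k} <= W; x++) {{
            int acc = 0;
            for (int ic = 0; ic < {cin}; ic++)
              for (int ky = 0; ky < {k}; ky++)
                for (int kx = 0; kx < {k}; kx++)
                  acc += in[ic][y + ky][x + kx] * w[oc][ic][ky][kx];
            out[oc][y][x] = (short)(acc >> 8);  /* illustrative 16-bit fixed-point rescale */
          }}
    }}
    """

    print(emit_conv2d_c("conv1", cin=3, cout=16, k=3))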