Pub Date: 2025-12-12 | DOI: 10.1016/j.vlsi.2025.102630
Maoqun Yao, Xiaole Zhang
This paper proposes a design method for current-mode multi-operand addition circuits. The approach temporarily stacks carry signals — which would typically be computed in subsequent stages — within the current stage, and employs a bit-by-bit modulus operation to calculate the remainder for each digit. The integer quotient is then propagated to higher digits, while the final result is composed of the remainders from all digits. Circuits designed using this method feature a shortened critical path in current-mode multi-operand addition and exhibit low hardware cost. In SPICE simulations, the proposed circuit achieved approximately 35% lower average power consumption compared to full-adders from relevant literature, along with higher operating speed and fewer transistors. Since inter-stage carry outputs can exceed the representation range of the current digit, current-mode signals between stages are allowed to surpass conventional logic limits, making it possible to further reduce cost by increasing internal logical values. A 15-operand summation circuit designed with this method demonstrated correct logical functionality, achieving a 52% reduction in transistor count and a 33% shortening of the critical path.
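The carry-stack arithmetic can be illustrated numerically: at each digit position, all operand digits plus the incoming quotient are summed, the remainder modulo the base becomes the result digit, and the integer quotient propagates upward. A minimal Python sketch of this digit-wise scheme (the arithmetic only, not the current-mode circuit):

```python
def multi_operand_add(operands, base=2):
    """Digit-wise multi-operand addition: at each digit position, stack all
    operand digits plus the incoming carry (the integer quotient from the
    lower position), keep the remainder modulo the base as the result digit,
    and propagate the quotient upward. Digits are least-significant first."""
    width = max(len(op) for op in operands)
    result, carry = [], 0
    for i in range(width):
        column = sum(op[i] if i < len(op) else 0 for op in operands) + carry
        carry, digit = divmod(column, base)   # quotient goes up, remainder stays
        result.append(digit)
    while carry:                              # flush any remaining carry digits
        carry, digit = divmod(carry, base)
        result.append(digit)
    return result

# 15-operand binary example: fifteen 1-bit ones sum to 15 = 0b1111
assert multi_operand_add([[1]] * 15) == [1, 1, 1, 1]
```

Note that a single column sum (here 15) may exceed the digit's representation range, which is exactly the property the paper exploits by letting inter-stage current-mode signals surpass conventional logic limits.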
Title: Design of CMOS current-mode multi-operand addition circuit based on carry stack (Integration, the VLSI Journal, Vol. 107, Article 102630)
Pub Date: 2025-12-11 | DOI: 10.1016/j.vlsi.2025.102629
Minghao Tang , Ming Ling , Minhua Ren , Zhihua Cai , Zhen Liu , Shidi Tang , Jianjun Li
To reduce the computational demands of neural networks, pruning and quantization are commonly employed to make models lightweight. These two approaches are typically viewed as orthogonal; however, this view is limited, and their intrinsic connection warrants further exploration. Consequently, a heuristic algorithm, referred to as Hessian-Cooptimized Sparsity-Quantization (HCSQ), is proposed. This is the first algorithm to unify the intrinsic connection between quantization and semi-structured pruning through second-order Hessian information. The algorithm introduces a Hessian-based notion of sensitivity, fine-tunes layer-level sensitivity by adjusting the N:M sparsity ratio within layers, and maximizes the utilization of the quantization bit width. Three lightweight models (ResNet20, ResNet18 and MobileNetV2) are evaluated on four datasets (ImageNet, Tiny-ImageNet, CIFAR-10 and CIFAR-100), reaching maximum compression ratios ranging from 14.96x to 28.58x without reducing original accuracy (<1 % loss), surpassing state-of-the-art performance at comparable accuracy loss. Furthermore, ablation experiments are conducted on an open-source processor: some layers achieve an acceleration of up to 4.79x, and the entire model's inference cycle time is reduced to 45 % relative to the ablation baseline. This demonstrates that the efficacy of the proposed algorithm extends beyond model compression; it also enhances hardware utilization when targeting specific hardware designs.
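The N:M semi-structured sparsity pattern the algorithm tunes can be sketched as follows; the Hessian-based sensitivity scoring is not modeled here, and plain magnitude ranking stands in for it:

```python
import numpy as np

def nm_prune(weights, n=2, m=4):
    """Semi-structured N:M pruning: in every group of m consecutive weights,
    keep only the n largest-magnitude entries and zero the rest. (Magnitude
    ranking is a stand-in for the paper's Hessian-based sensitivity.)"""
    flat = weights.reshape(-1, m).astype(float).copy()
    # indices of the (m - n) smallest-magnitude weights in each group
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    np.put_along_axis(flat, drop, 0.0, axis=1)
    return flat.reshape(weights.shape)

w = np.array([[0.9, -0.1, 0.05, -0.7],
              [0.2,  0.3, -0.25, 0.1]])
pruned = nm_prune(w, n=2, m=4)
# each group of 4 keeps exactly its two largest-magnitude weights
assert (pruned != 0).sum(axis=1).tolist() == [2, 2]
```

Because every group has the same N-of-M shape, such sparsity maps onto hardware far more regularly than unstructured pruning, which is why it pairs well with fixed quantization bit widths.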
Title: Hessian-driven N:M sparsity and quantization co-optimization for edge device deployment (Integration, the VLSI Journal, Vol. 107, Article 102629)
Fault diagnosis in analog and digital Very Large Scale Integration (VLSI) circuits is essential for ensuring reliable operation and performance. These circuits are increasingly complex due to miniaturization and high integration levels. Advanced circuits are susceptible to various faults, including transient, permanent and intermittent types. Detecting and accurately diagnosing these faults remains a major challenge due to signal complexity and noise. Therefore, this research proposes a novel model of Advanced Fault Diagnosis in Analog and Digital VLSI Circuits utilizing an Optimized Multi-Anchor Space-Aware Temporal Convolutional Neural Network for Efficient Circuit Reliability Assessment (FDAD-VLSI-MSTCNN). The objective is to accurately detect and locate faults in analog and digital VLSI circuits to ensure reliable circuit performance, and to enhance circuit functionality by enabling optimal recovery of faulty designs. The process begins with collecting input signals with frequency responses. The collected input signal is pre-processed using a Robust Maximum Correntropy Kalman Filter (RMCKF) to remove noise. Multidimensional Empirical Mode Decomposition (MEMD) is applied to decompose complex, non-stationary, nonlinear signals into simpler intrinsic mode functions (IMFs). These components undergo feature extraction using the Lifted Euler Characteristic Transform (LECT), which extracts mean, Standard Deviation (SD), kurtosis, skewness, Relative Entropy (RE), and minimum and maximum value features.
The extracted features are then given to the Multi-Anchor Space-Aware Temporal Convolutional Neural Network (MSTCNN) to identify fault locations when diagnosing faults in analog and digital VLSI circuits, and the Divine Religions Algorithm (DRA) is applied to recover the faulty circuit and restore normal operation. The proposed FDAD-VLSI-MSTCNN is then examined using performance metrics such as Accuracy, Precision, Recall, F1-Score, Specificity, Receiver Operating Characteristic (ROC) curve, Computational Time and Execution Time. The proposed FDAD-VLSI-MSTCNN method provides 99.42 % accuracy, 98.34 % precision and 98.88 % recall, higher than existing methods such as soft fault detection in analog circuits using voltage feature extraction and supervised learning (SFDAC-VFE-SL), extreme learning machine-based fault diagnosis to identify faulty components in analog circuits (FD-IFCAC-ELM), and detecting and classifying parametric faults in analog circuits using optimized attention neural networks (DCPF-AC-ANN), respectively.
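The statistical features listed above can be sketched in NumPy; `extract_features` is a hypothetical helper, and the histogram-based relative-entropy term is an assumption about how RE is computed between a signal and a reference:

```python
import numpy as np

def extract_features(signal, reference):
    """Statistical features of a decomposed signal component, of the kind the
    abstract lists: mean, standard deviation, skewness, kurtosis, relative
    entropy against a reference distribution, and min/max values."""
    mu, sigma = signal.mean(), signal.std()
    z = (signal - mu) / sigma
    # histogram both signals on shared bins for the relative-entropy term
    hist, edges = np.histogram(signal, bins=16, density=True)
    ref_hist, _ = np.histogram(reference, bins=edges, density=True)
    p, q = hist + 1e-12, ref_hist + 1e-12          # avoid log(0)
    p, q = p / p.sum(), q / q.sum()
    return {
        "mean": float(mu),
        "std": float(sigma),
        "skewness": float((z ** 3).mean()),
        "kurtosis": float((z ** 4).mean() - 3.0),  # excess kurtosis
        "relative_entropy": float((p * np.log(p / q)).sum()),
        "min": float(signal.min()),
        "max": float(signal.max()),
    }

rng = np.random.default_rng(0)
feats = extract_features(rng.normal(size=10_000), rng.normal(size=10_000))
assert abs(feats["skewness"]) < 0.1 and abs(feats["kurtosis"]) < 0.2
```

Each IMF would yield one such feature vector, and the concatenated vectors would form the classifier input.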
Title: Advanced fault diagnosis in analog and digital VLSI circuits utilizing multi-anchor space-aware temporal convolutional neural network for efficient circuit reliability assessment
Authors: Divya Arivalagan, O. Vignesh, S.S. Abinayaa, V.S. Nishok
Pub Date: 2025-12-11 | DOI: 10.1016/j.vlsi.2025.102631 (Integration, the VLSI Journal, Vol. 107, Article 102631)
Hardware Trojans are emerging malicious integrated circuit (IC) modifications that pose a significant threat to the integrity of electronics. While existing methods, such as functional testing and reverse engineering, are proposed to identify Trojan anomalies in electronics, their applicability to industrial pipelines is limited. This paper proposes a new image processing technique for efficient clustering and identification of Hardware Trojan insertion in integrated circuits. The uniqueness of the proposed AI-assisted image processing method lies in using real hardware to generate images via side-channel analysis (SCA) before applying unsupervised image classification to identify the impact of hardware Trojans, without the need for costly golden references. Leveraging machine learning on side-channel data collected from ring-oscillator networks, image and digital signal processing are employed to extract features for detection. This research contributes a novel use of side-channel data as images, eliminating the reliance on golden references, and achieves a remarkable accuracy of 95% in Hardware Trojan detection. It significantly advances the field and addresses crucial challenges in the semiconductor supply chain, marking a meaningful step toward securing it.
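The idea of treating side-channel traces as images and clustering them without a golden reference can be sketched as follows; `kmeans2` is a deliberately minimal stand-in for the paper's unsupervised classifier, and the toy traces are fabricated for illustration:

```python
import numpy as np

def traces_to_images(traces, rows=8):
    """Reshape each 1-D side-channel trace into a 2-D 'image'."""
    traces = np.asarray(traces, dtype=float)
    return traces.reshape(len(traces), rows, -1)

def kmeans2(x, iters=20):
    """Minimal 2-cluster k-means with deterministic init (first and last
    samples as seed centers) -- a stand-in for the unsupervised classifier."""
    centers = x[[0, -1]].astype(float).copy()
    for _ in range(iters):
        d = ((x[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in (0, 1):
            if (labels == j).any():
                centers[j] = x[labels == j].mean(0)
    return labels

# toy traces: 'clean' chips near 0, 'Trojan-affected' chips shifted upward
traces = np.vstack([np.zeros((5, 64)), np.full((5, 64), 3.0)])
labels = kmeans2(traces_to_images(traces).reshape(len(traces), -1))
assert labels[0] != labels[5]   # the two populations separate
```

No golden reference enters the pipeline: the clusters emerge purely from the structure of the measured traces.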
Title: AI-enabled image processing approach for efficient clustering and identification of hardware Trojans
Authors: Ashutosh Ghimire, Mohammed Alkurdi, Saraju Mohanty, Fathi Amsaad
Pub Date: 2025-12-10 | DOI: 10.1016/j.vlsi.2025.102628 (Integration, the VLSI Journal, Vol. 107, Article 102628)
Pub Date: 2025-12-06 | DOI: 10.1016/j.vlsi.2025.102627
R. Sindhu, V. Arunachalam
The activation functions (AF) sigmoid(x) and tanh(x) are essential in a Long-Short Term Memory (LSTM) cell for time-series classification using a Recurrent Neural Network (RNN). These AFs regulate the data flow effectively and optimize memory requirements in LSTM cells. Hardware realizations of these AFs are complex; consequently, approximation strategies must be adopted. The piece-wise linearization (PWL) method is appropriate for hardware implementations. A 7-segment PWL-based approximation of tanh(x), t(x8), is proposed here. Employing a MATLAB-based error analysis, an optimum fixed-point data format (1-bit sign, 2-bit integer, 8-bit fraction) is chosen. The function t(x8) is implemented with parallel segment selection and two 10-bit adders using TSMC 65 nm technology libraries. This architecture occupies 356.4 μm² of area and consumes 230.7 μW at 1.67 GHz. An approximate sigmoid(x), σ(x8), is then implemented using the t(x8) module with two shifters, a complementer and an 11-bit adder; it occupies 462.4 μm² and consumes 324.2 μW at 1.25 GHz. An approximate LSTM cell with the proposed t(x8) and σ(x8) functions is modelled using Python 3.2 and tested with the Italian Parkinson's dataset. The approximate LSTM cell produces close classification metrics, with a maximum deviation of 0.21 % from the exact LSTM cell.
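A PWL approximation of this kind, and the reuse of the tanh block for the sigmoid via the identity sigmoid(x) = (1 + tanh(x/2))/2, can be sketched in floating point; the breakpoints below are a uniform grid chosen for illustration, not the paper's 7 segments:

```python
import numpy as np

# Assumed breakpoints for the sketch (the paper's exact segment boundaries
# are not given): linear interpolation of tanh between uniformly spaced
# knots on [-2, 2], saturating toward +/-tanh(2) outside that range.
KNOTS = np.linspace(-2.0, 2.0, 9)
VALUES = np.tanh(KNOTS)

def pwl_tanh(x):
    """Piece-wise linear tanh: interior segments plus end saturation."""
    return np.interp(x, KNOTS, VALUES)   # clamps to +/-tanh(2) outside

def pwl_sigmoid(x):
    """Sigmoid built from the tanh module via sigmoid(x) = (1+tanh(x/2))/2,
    mirroring the paper's reuse of the tanh block with shifts and an adder."""
    return 0.5 * (1.0 + pwl_tanh(0.5 * x))

x = np.linspace(-5.0, 5.0, 201)
assert np.max(np.abs(pwl_tanh(x) - np.tanh(x))) < 0.05
assert np.max(np.abs(pwl_sigmoid(x) - 1.0 / (1.0 + np.exp(-x)))) < 0.05
```

The halving of the sigmoid's input and output in `pwl_sigmoid` corresponds directly to the shifters in the hardware description.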
Title: Hardware efficient approximate activation functions for a Long-Short-Term Memory cell (Integration, the VLSI Journal, Vol. 107, Article 102627)
This high-speed, power-efficient content-addressable memory (CAM) uses parallel lookups to match quickly without sacrificing power consumption. It introduces three key contributions: (i) pre-charge-free operation, which improves search speed and reduces power requirements by eliminating node charging time; (ii) a Hybrid Match Line (HML) structure that strategically balances power and delay, combining the high-speed attributes of NOR with the low-power attributes of NAND; and (iii) a local searching technique that further improves search time. Performance indicators improve greatly when these methods are seamlessly integrated. Utilizing 45 nm CMOS technology, the design supports diverse process voltages, temperatures, and frequencies for a 64×32 memory array. Monte Carlo simulations verify design stability. The proposed architecture outperforms the leading benchmark in speed and power-delay product (PDP) by 54.6% and 76.02%, respectively. This novel design can perform repeated data searches at frequencies up to 2 GHz after a single write operation, enabling quicker and more energy-efficient data processing that could revolutionize consumer electronics through improved efficiency and speed in high-performance computing, mobile devices, and IoT applications.
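The search behavior of a CAM, as opposed to an address-indexed RAM, can be modeled behaviorally: a single write populates the store, after which each search returns the addresses of all matching words in one lookup. A minimal Python model (the circuit-level match-line details are of course not captured):

```python
class CAM:
    """Behavioral model of a content-addressable memory: writes store words
    at addresses, and a search returns every address whose stored word
    matches the query -- conceptually one 'parallel' lookup, modeled here
    with a hash map from word to address set."""
    def __init__(self, width=32):
        self.width = width
        self.index = {}
    def write(self, address, word):
        self.index.setdefault(word, set()).add(address)
    def search(self, word):
        return sorted(self.index.get(word, set()))

cam = CAM()
cam.write(0, 0xDEADBEEF)
cam.write(5, 0xDEADBEEF)
cam.write(9, 0x12345678)
assert cam.search(0xDEADBEEF) == [0, 5]
assert cam.search(0xCAFEBABE) == []
```

In hardware, every stored row compares against the query simultaneously on its match line; the abstract's contributions target the power and delay of exactly that comparison step.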
Title: Achieving superior segmented CAM efficiency with pre-charge free local search based hybrid matcher for high speed applications
Authors: Shyamosree Goswami, Adwait Wakankar, Partha Bhattacharyya, Anup Dandapat
Pub Date: 2025-12-06 | DOI: 10.1016/j.vlsi.2025.102621 (Integration, the VLSI Journal, Vol. 107, Article 102621)
Pub Date: 2025-12-02 | DOI: 10.1016/j.vlsi.2025.102625
Angelos Athanasiadis , Nikolaos Tampouratzis , Ioannis Papaefstathiou
The growing demand for real-time processing in artificial intelligence applications, particularly those involving Convolutional Neural Networks (CNNs), has highlighted the need for efficient computational solutions. Conventional processors and graphics processing units (GPUs) often fall short in balancing performance, power consumption, and latency, especially in embedded systems and edge computing platforms. Field-Programmable Gate Arrays (FPGAs) offer a promising alternative, combining high performance with energy efficiency and reconfigurability. This paper presents a design and implementation framework for implementing CNNs seamlessly on FPGAs that maintains full precision in all neural network parameters, thus addressing a niche: non-quantized NNs. The framework extends Darknet, which is very widely used for the design of CNNs, and allows the designer, starting from a Darknet NN description, to efficiently implement CNNs on a heterogeneous system comprising CPUs and FPGAs. Our framework is evaluated on a number of different CNNs and as part of a real-world application utilizing UAVs; in all cases it outperforms the CPU and GPU systems in performance and/or power consumption.
Title: An efficient open-source design and implementation framework for non-quantized CNNs on FPGAs (Integration, the VLSI Journal, Vol. 107, Article 102625)
The deposition of dielectric thin films in semiconductor fabrication is significantly influenced by process parameter configuration. Traditional optimization via experiments or multi-physics simulations is costly, time-consuming, and inflexible. Data-driven methods that leverage production-line sensor data provide a promising alternative. This work proposes a machine learning modeling framework for studying the nonlinear correlation between dielectric deposition parameters and film thickness distribution. The approach is validated using historical High-Density Plasma Chemical Vapor Deposition (HDPCVD) process data collected from production runs and demonstrates strong predictive performance across multiple technology nodes. The framework achieves strong predictive performance on thin-film thickness (R² = 0.92) and enables practical assessment of specification compliance, achieving 79.5% accuracy in determining whether predicted thicknesses lie within the node-specific tolerances at the 14 nm node.
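The modeling task can be sketched with a simple regressor and the R² metric the abstract reports; the ridge fit and the synthetic parameters-to-thickness data below are placeholders for the paper's (unspecified) model and production data:

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Least-squares fit with a small ridge term: solve (X'X + lam*I) w = X'y,
    with a bias column appended to X."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    A = X1.T @ X1 + lam * np.eye(X1.shape[1])
    return np.linalg.solve(A, X1.T @ y)

def r2_score(y, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ss_res = ((y - y_pred) ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# synthetic 'process parameters -> film thickness' data for illustration
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 4))                  # e.g. power, pressure, flow...
y = X @ np.array([3.0, -1.5, 2.0, 0.5]) + 10.0 + rng.normal(0.0, 0.1, 200)
w = fit_ridge(X, y)
y_pred = np.hstack([X, np.ones((200, 1))]) @ w
assert r2_score(y, y_pred) > 0.9
```

Spec-compliance checking then reduces to testing whether each predicted thickness falls inside the node's tolerance band.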
Title: Machine-learning-driven prediction of thin film parameters for optimizing the dielectric deposition in semiconductor fabrication
Authors: Hao Wen, Enda Zhao, Qiyue Zhang, Ruofei Xiang, Wenjian Yu
Pub Date: 2025-12-02 | DOI: 10.1016/j.vlsi.2025.102617 (Integration, the VLSI Journal, Vol. 107, Article 102617)
Pub Date : 2025-12-02DOI: 10.1016/j.vlsi.2025.102623
Yong Zhang , Wen-Jie Li , Guo-Jing Ge , Jin-Qiao Wang , Bo-Wen Jia , Ning Xu
The A∗ algorithm is one of the most common analog integrated circuit (IC) routing techniques. As the number of nets increases, the net routing order of this heuristic algorithm strongly affects the routing results. Currently, artificial intelligence (AI) technologies are widely applied in IC physical design to accelerate layout design. In this paper, we propose a reinforcement learning model for net order selection. We construct multi-channel images of routing data and extract features of the routing pin coordinates through an attention mechanism. After training, the model outputs an optimized net order, which is then used to perform routing with a bidirectional A∗ algorithm, thereby improving both the speed and efficiency of the routing process. Experimental results on cases based on 130-nm and 180-nm processes show that the proposed method can achieve a 2.5 % reduction in wire length and a 3.7 % decrease in the number of vias compared to state-of-the-art methods for analog IC routing. In terms of computational efficiency, the bidirectional A∗ algorithm improves performance by 7.3 % over the unidirectional A∗ algorithm in decision-making scenarios and by 51.09 % in the path-planning process. Simulation results further demonstrate that, compared with manual and advanced automation methods, the overall performance of the layout achieved by our method aligns most closely with schematic performance.
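The bidirectional search at the core of the router can be illustrated on a toy routing grid. The sketch below is not the paper's implementation: for clarity it omits the heuristic term (so it is bidirectional Dijkstra); adding an admissible Manhattan-distance estimate to each frontier's priority turns it into bidirectional A∗. Both frontiers expand alternately and the search stops once neither can improve the best meeting point.

```python
import heapq

def bidirectional_search(grid, start, goal):
    """Shortest path length between two cells on a 4-connected grid.
    grid: list of equal-length strings, '#' marks a blocked cell.
    Returns the path length in steps, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])

    def neighbors(p):
        r, c = p
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#':
                yield (nr, nc)

    # dist[0]/pq[0]: forward search from start; dist[1]/pq[1]: backward from goal.
    dist = [{start: 0}, {goal: 0}]
    pq = [[(0, start)], [(0, goal)]]
    best = None
    while pq[0] and pq[1]:
        # Expand the frontier whose cheapest entry is smaller.
        side = 0 if pq[0][0][0] <= pq[1][0][0] else 1
        d, u = heapq.heappop(pq[side])
        if d > dist[side].get(u, float('inf')):
            continue  # stale queue entry
        other = 1 - side
        if u in dist[other]:
            cand = d + dist[other][u]  # path through the meeting cell u
            if best is None or cand < best:
                best = cand
        # Standard stopping rule: no remaining frontier pair can beat `best`.
        if (best is not None and pq[0] and pq[1]
                and pq[0][0][0] + pq[1][0][0] >= best):
            return best
        for v in neighbors(u):
            nd = d + 1
            if nd < dist[side].get(v, float('inf')):
                dist[side][v] = nd
                heapq.heappush(pq[side], (nd, v))
    return best

grid = ["....",
        ".##.",
        "...."]
print(bidirectional_search(grid, (0, 0), (2, 3)))  # 5
```

The efficiency gain over unidirectional search comes from the two frontiers each covering roughly half the search depth, which shrinks the explored area substantially on large routing grids.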
{"title":"Reinforcement learning-driven net order selection for efficient analog IC routing","authors":"Yong Zhang , Wen-Jie Li , Guo-Jing Ge , Jin-Qiao Wang , Bo-Wen Jia , Ning Xu","doi":"10.1016/j.vlsi.2025.102623","DOIUrl":"10.1016/j.vlsi.2025.102623","url":null,"abstract":"<div><div>The A∗ algorithm is one of the most common analog integrated circuit (IC) routing techniques. As the number of nets increases, the routing order of this heuristic routing algorithm will affect the routing results immensely. Currently, artificial intelligence (AI) technologies are widely applied in IC physical design to accelerate layout design. In this paper, we propose a reinforcement model based on net order selection. We construct multi-channel images of routing data and extract features of the coordinates of routing pins through an attention mechanism. After training, the model outputs an optimized net order, which is then used to perform routing with a bidirectional A∗ algorithm, thereby improving both the speed and efficiency of the routing process. Experimental results on cases based on 130-nm and 180-nm processes show that the proposed method can achieve a 2.5 % reduction in wire length and a 3.7 % decrease in the number of vias compared to state-of-the-art methods for analog IC routing. In terms of computational efficiency, the bidirectional A∗ algorithm improves performance by 7.3 % over the unidirectional A∗ algorithm in decision-making scenarios and by 51.09 % in the path-planning process. 
Simulation results further demonstrate that, compared with manual and advanced automation methods, the overall performance of the layout achieved by our method aligns most closely with schematic performance.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102623"},"PeriodicalIF":2.5,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-01DOI: 10.1016/j.vlsi.2025.102624
Yan Xing, Zicheng Deng, Shuting Cai, Weijun Li, Xiaoming Xiong
Existing routability-driven global placers typically employ an iterative routability optimization process and perform cell inflation based only on lookahead congestion maps during each run. However, this incremental application of congestion estimation and mitigation yields placement solutions that deviate from optimal wirelength, thus compromising the objective of balancing wirelength minimization and routability optimization. To simultaneously improve routability and reduce wirelength, this paper proposes a novel routability–wirelength co-guided cell inflation approach for global placement optimization. It employs a multi-task learning-based feature selection method, MTL-FS, to identify the optimal feature subset and train the corresponding routability–wirelength co-learning model, RWNet. During the iterative optimization process, both routability and wirelength are predicted using RWNet, and their correlation is interpreted by DeepSHAP to produce three impact maps. Subsequently, routability–wirelength co-guided cell inflation (RWCI) is performed based on an adjusted congestion map, which is derived from the predicted congestion map and the three impact maps. Experimental results on ISPD2011 and DAC2012 benchmark designs demonstrate that, compared to DREAMPlace and RoutePlacer (representing non-machine-learning-based and machine-learning-based routability-driven placers, respectively), the proposed approach achieves better optimization quality, specifically improved routability and reduced wirelength, at a lower time cost. Moreover, an extension experiment shows our method consistently outperforms DREAMPlace (even when it uses 2D feature maps as proxies) in effectiveness while maintaining comparable efficiency. The generalization experiment further confirms this superiority and comparable runtime, particularly in highly congested scenarios.
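Congestion-guided cell inflation, the mechanism the adjusted congestion map ultimately drives, can be sketched as below. This is an illustrative simplification under assumed conventions: the bin grid, the `threshold` and `max_ratio` parameters, and the linear inflation rule are placeholders, and the paper's RWCI step additionally folds the three DeepSHAP impact maps into the congestion values before inflating.

```python
def inflate_cells(cells, congestion, threshold=0.9, max_ratio=2.0):
    """Inflate cell areas in congested bins so the placer spreads them apart.
    cells: list of dicts with 'x', 'y' (bin indices) and 'area'.
    congestion: 2D list of routing demand/capacity ratios per bin.
    Returns a new cell list; areas grow proportionally to overflow,
    capped at max_ratio, and cells in uncongested bins are untouched."""
    out = []
    for cell in cells:
        c = congestion[cell['y']][cell['x']]
        ratio = min(max_ratio, max(1.0, c / threshold))
        out.append({**cell, 'area': cell['area'] * ratio})
    return out

# Hypothetical 1x2 bin grid: left bin uncongested, right bin overflowing.
cells = [{'x': 0, 'y': 0, 'area': 1.0}, {'x': 1, 'y': 0, 'area': 2.0}]
print(inflate_cells(cells, [[0.5, 1.8]]))
```

Because the inflated areas feed back into the density constraint of the next placement iteration, cells migrate out of hot bins, trading a controlled amount of wirelength for routability; the co-guided variant in the paper adjusts the congestion values so that this trade-off also accounts for predicted wirelength impact.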
{"title":"Routability–wirelength co-guided cell inflation with explainable multi-task learning for global placement optimization","authors":"Yan Xing, Zicheng Deng, Shuting Cai, Weijun Li, Xiaoming Xiong","doi":"10.1016/j.vlsi.2025.102624","DOIUrl":"10.1016/j.vlsi.2025.102624","url":null,"abstract":"<div><div>Existing routability-driven global placers typically employed an iterative routability optimization process and performed cell inflation based only on lookahead congestion maps during each run. However, this incremental application of congestion estimation and mitigation resulted in placement solutions that deviate from optimal wirelength, thus compromising the optimization objective of balancing wirelength minimization and routability optimization. To simultaneously improve routability and reduce wirelength, this paper proposes a novel routability–wirelength co-guided cell inflation approach for global placement optimization. It employs a multi-task learning-based feature selection method, MTL-FS, to identify the optimal feature subset and train the corresponding routability–wirelength co-learning model, RWNet. During the iterative optimization process, both routability and wirelength are predicted using RWNet, and their correlation is interpreted by DeepSHAP to produce three impact maps. Subsequently, routability–wirelength co-guided cell inflation (RWCI) is performed based on an adjusted congestion map, which is derived from the predicted congestion map and the three impact maps. The experimental results on ISPD2011 and DAC2012 benchmark designs demonstrate that, compared to DERAMPlace and RoutePlacer (which represent non-machine learning-based and machine learning-based routability-driven placers, respectively), the proposed approach achieves both better optimization quality, specifically improved routability and reduced wirelength, and a decreased time cost. 
Moreover, the extension experiment shows our method consistently outperforms DREAMPlace (even when it uses 2D feature maps as proxies) in effectiveness while maintaining comparable efficiency. The generalization experiment further confirms this superiority and comparable runtime, particularly in highly congested scenarios.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102624"},"PeriodicalIF":2.5,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}