
Integration-The Vlsi Journal: latest articles

Identifying malicious modules using deformable graph convolutional network-based security framework for reliable VLSI circuit protection
IF 2.5 | Region 3, Engineering & Technology | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-15 | DOI: 10.1016/j.vlsi.2025.102633
M. Maria Rubiston , B.R. Tapas Bapu
Hardware security remains a significant concern as Very Large Scale Integration (VLSI) circuits grow increasingly complex and industries rely on untrusted third-party Intellectual Property. Security threats from Hardware Trojans (HTs) are particularly dangerous: these covert modifications break circuit integrity, undermine reliability, and compromise confidentiality. Current HT detection methods struggle to scale and to maintain high accuracy, owing to stealthy Trojan design strategies and the constraints of functional testing, side-channel evaluation, and formal verification. To address these challenges, this research introduces DGCoNet-GBOA, a Diffusion Kernel Attention Network with a Deformable Graph Convolutional Network-based security framework, optimized with the Gooseneck Barnacle Optimization Algorithm (GBOA) for real-time, highly accurate HT detection. The framework extracts structural, power, and transition-probability features using the Scale-aware Modulation Meet Transformer (S-ammT) and balances the dataset with Diminishing Batch Normalization (DimBN). DGCoNet analyses gate-level netlists (GLNs) as graphs to identify Trojan-induced changes, while GBOA tunes the model to boost detection precision. The model achieves 99.87 % accuracy with only a 0.12 % false-positive rate and 99.91 % precision on the ISCAS'85 and ISCAS'89 benchmarks, an average accuracy improvement of 0.7–4.5 % over existing state-of-the-art approaches. The resulting framework provides scalable, high-reliability HT detection to safeguard VLSI circuits against present-day hardware security threats during semiconductor design.
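The transition-probability features this abstract mentions can be illustrated with a standard textbook approximation (an assumption on our part, not necessarily the authors' exact formulation): propagate signal probabilities through the gate-level netlist under independent inputs, then estimate each net's toggle probability as 2p(1-p). Nets with very low activity are classic HT trigger candidates.

```python
def signal_prob(gate, in_probs):
    """Output '1'-probability for basic gates, assuming independent inputs."""
    if gate == "AND":
        p = 1.0
        for q in in_probs:
            p *= q
        return p
    if gate == "OR":
        p = 1.0
        for q in in_probs:
            p *= (1.0 - q)
        return 1.0 - p
    if gate == "NOT":
        return 1.0 - in_probs[0]
    raise ValueError(gate)

def transition_prob(p_one):
    """Probability a net toggles between two random cycles: 2*p*(1-p)."""
    return 2.0 * p_one * (1.0 - p_one)

# Tiny netlist sketch: n3 = AND(a, b); n4 = OR(n3, c); inputs at p = 0.5
p = {"a": 0.5, "b": 0.5, "c": 0.5}
p["n3"] = signal_prob("AND", [p["a"], p["b"]])   # 0.25
p["n4"] = signal_prob("OR", [p["n3"], p["c"]])   # 1 - 0.75*0.5 = 0.625
activity = {net: transition_prob(q) for net, q in p.items()}
```

A real flow would compute these per-net values for every GLN node and attach them as graph-node features before classification.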
Citations: 0
An innovative HLS framework for all network architectures: From Python to SoC
IF 2.5 | Region 3, Engineering & Technology | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-13 | DOI: 10.1016/j.vlsi.2025.102626
Thi Diem Tran , Minh Tan Ha , Xuan Thao Tran , Ngoc Quoc Tran , Vu Trung Duong Le , Hoai Luan Pham , Van Tinh Nguyen
Deep Neural Networks (DNNs) have achieved remarkable success in diverse applications such as image classification, signal processing, and video analysis. Despite their effectiveness, these models demand substantial computational resources, making FPGA-based hardware acceleration a critical enabler for real-time deployment. However, current methods for mapping DNNs to hardware have seen limited adoption, largely because software developers often lack the specialized hardware expertise needed for efficient implementation. High-Level Synthesis (HLS) tools were introduced to bridge this gap, but they typically confine designs to fixed platforms and simple network structures. Most existing tools support only standard architectures such as VGG or ResNet with predefined parameters, offering little flexibility for customization and restricting deployment to specific FPGA devices. To address these limitations, we introduce Py2C, an automated framework that converts AI models from Python to C. Py2C supports a wide range of DNN architectures, from basic convolutional and pooling layers with variable window sizes to advanced models such as VGG, ResNet, InceptionNet, ShuffleNet, NambaNet, and YOLO. Integrated with Xilinx’s Vitis HLS, Py2C forms the Py2RTL flow, enabling register-transfer level (RTL) generation with custom-precision arithmetic and cross-platform verification. Validated on multiple networks, Py2C has demonstrated superior hardware efficiency and power reduction, particularly in QRS detection for ECG signals. By streamlining the AI-to-RTL conversion process, Py2C makes FPGA-based AI deployment both high-performance and accessible.
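The "convolutional layers with variable window sizes" that such a Python-to-C flow must handle can be sketched as a static loop nest, the style HLS tools map readily to hardware. This is a hypothetical illustration of the layer being converted, not actual Py2C output:

```python
def conv2d(inp, kernel):
    """Naive valid-mode 2-D convolution as fixed loop nests with no
    dynamic allocation: the HLS-friendly shape a Python-to-C converter
    would emit (sketch only, not Py2C's generated code)."""
    H, W = len(inp), len(inp[0])
    K = len(kernel)                       # square window, variable size
    out = [[0.0] * (W - K + 1) for _ in range(H - K + 1)]
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            acc = 0.0                     # would become a register in RTL
            for ki in range(K):
                for kj in range(K):
                    acc += inp[i + ki][j + kj] * kernel[ki][kj]
            out[i][j] = acc
    return out
```

In an actual HLS flow, the inner accumulation loop is the natural target for pipelining and unrolling pragmas.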
Citations: 0
Design of CMOS current-mode multi-operand addition circuit based on carry stack
IF 2.5 | Region 3, Engineering & Technology | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-12 | DOI: 10.1016/j.vlsi.2025.102630
Maoqun Yao, Xiaole Zhang
This paper proposes a design method for current-mode multi-operand addition circuits. The approach temporarily stacks carry signals — which would typically be computed in subsequent stages — within the current stage, and employs a bit-by-bit modulus operation to calculate the remainder for each digit. The integer quotient is then propagated to higher digits, while the final result is composed of the remainders from all digits. Circuits designed using this method feature a shortened critical path in current-mode multi-operand addition and exhibit low hardware cost. In SPICE simulations, the proposed circuit achieved approximately 35% lower average power consumption compared to full-adders from relevant literature, along with higher operating speed and fewer transistors. Since inter-stage carry outputs can exceed the representation range of the current digit, current-mode signals between stages are allowed to surpass conventional logic limits, making it possible to further reduce cost by increasing internal logical values. A 15-operand summation circuit designed with this method demonstrated correct logical functionality, achieving a 52% reduction in transistor count and a 33% shortening of the critical path.
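The carry-stack idea described above (per digit: take the remainder modulo the radix, forward the integer quotient upward) can be modeled behaviorally for binary operands. This is a functional sketch of the arithmetic, not the transistor-level current-mode design:

```python
def multi_operand_add(operands, width):
    """Column-wise multi-operand addition: per bit position, total the
    incoming bits plus the stacked carries, keep total % 2 as that
    digit's remainder, and forward total // 2 to the next position."""
    carry = 0
    result = 0
    for bit in range(width):
        column = sum((op >> bit) & 1 for op in operands) + carry
        result |= (column % 2) << bit     # remainder stays in this digit
        carry = column // 2               # integer quotient moves up
    return result + (carry << width)      # leftover carries extend the word

# 15-operand example mirroring the paper's summation circuit size
vals = list(range(1, 16))                 # 1 + 2 + ... + 15 = 120
total = multi_operand_add(vals, 4)
```

Because many operand bits are totaled per column before any propagation, the carry chain is traversed once rather than once per operand, which is the source of the shortened critical path.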
Citations: 0
Hessian-driven N:M sparsity and quantization co-optimization for edge device deployment
IF 2.5 | Region 3, Engineering & Technology | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-11 | DOI: 10.1016/j.vlsi.2025.102629
Minghao Tang , Ming Ling , Minhua Ren , Zhihua Cai , Zhen Liu , Shidi Tang , Jianjun Li
To reduce the computational demands of neural networks, pruning and quantization are commonly employed to produce lightweight models. These two approaches are typically viewed as orthogonal, but this view is incomplete, and their intrinsic connection deserves further exploration. Consequently, a heuristic algorithm, Hessian-Cooptimized Sparsity-Quantization (HCSQ), is proposed. It is the first algorithm to unify quantization and semi-structured pruning through second-order Hessian information. The algorithm introduces a Hessian-based notion of sensitivity, fine-tunes layer-level sensitivity by adjusting the N:M sparsity ratio within layers, and maximizes the utilization of the quantization bit width. Three lightweight models (ResNet20, ResNet18, and MobileNetV2) are evaluated on four datasets (ImageNet, Tiny-ImageNet, CIFAR-10, and CIFAR-100), reaching maximum compression ratios from 14.96× to 28.58× without reducing original accuracy (<1 % loss), better than state-of-the-art performance at comparable accuracy loss. Furthermore, ablation experiments are conducted on an open-source processor: some layers achieve up to a 4.79× speedup, and the whole model's inference cycle time drops to 45 % of the ablation baseline. This demonstrates that the proposed algorithm goes beyond mere model compression; it also enhances hardware utilization when tailored to specific hardware designs.
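The N:M sparsity pattern at the center of this work means: in every group of M consecutive weights, at most N are kept non-zero. A minimal magnitude-based illustration follows; the paper's actual selection uses Hessian-derived sensitivity scores, which this sketch does not reproduce:

```python
def nm_prune(weights, n, m):
    """Apply an N:M sparsity mask: in each group of m consecutive
    weights keep the n largest-magnitude entries and zero the rest
    (illustrates the pattern only, not the paper's Hessian scoring)."""
    assert len(weights) % m == 0
    out = []
    for g in range(0, len(weights), m):
        group = weights[g:g + m]
        keep = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n]
        out.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return out

# 2:4 sparsity: each group of four retains its two largest |w|
pruned = nm_prune([0.1, -0.9, 0.05, 0.4, -0.2, 0.3, 0.8, -0.01], 2, 4)
```

The fixed per-group count is what makes this "semi-structured": hardware can exploit the regular layout, unlike unstructured pruning.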
Citations: 0
Advanced fault diagnosis in analog and digital VLSI circuits utilizing multi-anchor space-aware temporal convolutional neural network for efficient circuit reliability assessment
IF 2.5 | Region 3, Engineering & Technology | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-11 | DOI: 10.1016/j.vlsi.2025.102631
Divya Arivalagan , O. Vignesh , S.S. Abinayaa , V.S. Nishok
Fault diagnosis in analog and digital Very Large Scale Integration (VLSI) circuits is essential for ensuring reliable operation and performance. These circuits are increasingly complex due to miniaturization and high integration levels, and advanced circuits are susceptible to various faults, including transient, permanent, and intermittent types. Detecting and accurately diagnosing these faults remains a major challenge owing to signal complexity and noise. Therefore, this research proposes a novel model for Advanced Fault Diagnosis in Analog and Digital VLSI Circuits utilizing an Optimized Multi-Anchor Space-Aware Temporal Convolutional Neural Network for Efficient Circuit Reliability Assessment (FDAD-VLSI-MSTCNN). The objective is to accurately detect and locate faults in analog and digital VLSI circuits to ensure reliable circuit performance, and to enhance circuit functionality by enabling optimal recovery of faulty designs. The process begins by collecting input signals with their frequency responses. The collected signal is pre-processed with a Robust Maximum Correntropy Kalman Filter (RMCKF) to remove noise. Multidimensional Empirical Mode Decomposition (MEMD) then decomposes the complex, non-stationary, nonlinear signals into simpler intrinsic mode functions (IMFs). From these components, the Lifted Euler Characteristic Transform (LECT) extracts mean, Standard Deviation (SD), kurtosis, skewness, Relative Entropy (RE), and minimum and maximum value features.
The extracted features are fed to the Multi-Anchor Space-Aware Temporal Convolutional Neural Network (MSTCNN) to identify fault locations in analog and digital VLSI circuits, and the Divine Religions Algorithm (DRA) is applied to recover the faulty circuit and restore normal operation. The proposed FDAD-VLSI-MSTCNN is evaluated with performance metrics such as Accuracy, Precision, Recall, F1-Score, Specificity, Receiver Operating Characteristic (ROC) curve, Computational Time, and Execution Time. It achieves 99.42 % accuracy, 98.34 % precision, and 98.88 % recall, outperforming existing methods: soft fault detection in analog circuits using voltage feature extraction and supervised learning (SFDAC-VFE-SL), extreme learning machine-based fault diagnosis for identifying faulty components in analog circuits (FD-IFCAC-ELM), and detection and classification of parametric faults in analog circuits using optimized attention neural networks (DCPF-AC-ANN).
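The statistical features listed in this pipeline (mean, SD, skewness, kurtosis, extrema) have standard moment-based definitions, sketched below in plain Python. This reproduces only the generic moments, not the paper's LECT-specific extraction:

```python
import math

def moment_features(x):
    """Mean, standard deviation, skewness, and (non-excess) kurtosis of
    a signal window via population moments, as commonly used fault
    features; relative entropy would need a reference distribution and
    is omitted here."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    sd = math.sqrt(var)
    skew = sum((v - mean) ** 3 for v in x) / (n * sd ** 3) if sd else 0.0
    kurt = sum((v - mean) ** 4 for v in x) / (n * sd ** 4) if sd else 0.0
    return {"mean": mean, "sd": sd, "skew": skew, "kurt": kurt,
            "min": min(x), "max": max(x)}

feats = moment_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

Each IMF from the decomposition stage would yield one such feature vector, and the vectors are concatenated before classification.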
Citations: 0
AI-enabled image processing approach for efficient clustering and identification of hardware Trojans
IF 2.5 | Region 3, Engineering & Technology | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-10 | DOI: 10.1016/j.vlsi.2025.102628
Ashutosh Ghimire , Mohammed Alkurdi , Saraju Mohanty , Fathi Amsaad
Hardware Trojans are emerging malicious integrated circuit (IC) modifications that pose a significant threat to the integrity of electronics. While existing methods such as functional testing and reverse engineering can identify Trojan anomalies, their applicability to industrial pipelines is limited. This paper proposes a new image processing technique for efficient clustering and identification of Hardware Trojan insertion in integrated circuits. The uniqueness of the proposed AI-assisted method lies in using real hardware to generate images from side-channel analysis (SCA), then applying unsupervised image classification to identify the impact of hardware Trojans without the need for costly golden references. Leveraging machine learning on side-channel data collected from ring-oscillator networks, image and digital signal processing are employed to extract features for detection. This research contributes a novel use of side-channel data as images, eliminates the reliance on golden references, and achieves a remarkable 95 % accuracy in Hardware Trojan detection. Beyond significantly advancing the field, it addresses crucial challenges in the semiconductor supply chain, marking a significant step toward securing it.
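The two core ideas here, folding a 1-D side-channel trace into an image and clustering without a golden reference, can be sketched minimally. The grid layout and the one-dimensional 2-means below are our assumptions for illustration, not the paper's exact mapping or classifier:

```python
def traces_to_image(trace, rows, cols):
    """Fold a 1-D side-channel trace into a rows x cols grid, the kind
    of image representation fed to unsupervised classification."""
    assert len(trace) == rows * cols
    return [trace[r * cols:(r + 1) * cols] for r in range(rows)]

def two_means(points, iters=10):
    """Minimal 1-D 2-means: split measurements into two clusters
    (e.g., Trojan-free vs. Trojan-suspect chips) with no golden chip."""
    c0, c1 = min(points), max(points)
    for _ in range(iters):
        a = [p for p in points if abs(p - c0) <= abs(p - c1)]
        b = [p for p in points if abs(p - c0) > abs(p - c1)]
        c0 = sum(a) / len(a)
        c1 = sum(b) / len(b)
    return c0, c1

# Hypothetical mean ring-oscillator readings: two separated populations
readings = [1.00, 1.02, 0.98, 1.01, 1.35, 1.33, 1.36]
lo, hi = two_means(readings)
```

The point of clustering rather than supervised classification is exactly what removes the golden-reference requirement: no labeled Trojan-free chip is needed, only enough samples for the populations to separate.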
Citations: 0
Hardware efficient approximate activation functions for a Long-Short-Term Memory cell
IF 2.5 | Region 3, Engineering & Technology | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-06 | DOI: 10.1016/j.vlsi.2025.102627
R. Sindhu, V. Arunachalam
The activation functions (AFs) sigmoid(x) and tanh(x) are essential in a Long Short-Term Memory (LSTM) cell for time-series classification with a Recurrent Neural Network (RNN). These AFs regulate the data flow effectively and optimize memory requirements in LSTM cells, but their exact hardware realizations are complex, so approximation strategies must be adopted. The piece-wise linear (PWL) method is well suited to hardware implementation. A 7-segment PWL-based approximation of tanh(x), t(x8), is proposed here. Employing a MATLAB-based error analysis, an optimum fixed-point data format (1-bit sign, 2-bit integer, 8-bit fraction) is chosen. The function t(x8) is implemented with parallel segment selection and two 10-bit adders using TSMC 65 nm technology libraries. This architecture uses a 356.4 μm2 area and consumes 230.7 μW at 1.67 GHz.
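The PWL approach, and the way a sigmoid can be derived from the tanh module with shifters, a complement, and an adder, follows from the identity sigmoid(x) = (1 + tanh(x/2)) / 2. The sketch below uses evenly spaced knots on [0, 3.5] with saturation beyond the last knot; these knot positions are our assumption, not the paper's optimized 7-segment breakpoints or fixed-point format:

```python
import math

# Knots for a piece-wise linear tanh on [0, 3.5]: seven linear segments,
# saturating past the last knot (assumed layout, not the paper's).
KNOTS = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
VALS = [math.tanh(k) for k in KNOTS]

def pwl_tanh(x):
    """Odd-symmetric piece-wise linear tanh approximation."""
    s, a = (1.0, x) if x >= 0 else (-1.0, -x)
    if a >= KNOTS[-1]:
        return s * VALS[-1]                          # saturate
    for i in range(len(KNOTS) - 1):
        if a < KNOTS[i + 1]:
            t = (a - KNOTS[i]) / (KNOTS[i + 1] - KNOTS[i])
            return s * (VALS[i] + t * (VALS[i + 1] - VALS[i]))

def pwl_sigmoid(x):
    """sigmoid(x) = (1 + tanh(x/2)) / 2: the halving (shift), complement,
    and final add mirror the hardware reuse of the tanh module."""
    return 0.5 * (1.0 + pwl_tanh(0.5 * x))
```

In hardware, dividing by two and adding a constant are a wire shift and an adder, which is why reusing the tanh block for sigmoid is nearly free.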
Citations: 0
Achieving superior segmented CAM efficiency with pre-charge free local search based hybrid matcher for high speed applications
IF 2.5 CAS Tier 3 (Engineering & Technology) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-12-06 DOI: 10.1016/j.vlsi.2025.102621
Shyamosree Goswami, Adwait Wakankar, Partha Bhattacharyya, Anup Dandapat
This high-speed, power-efficient content addressable memory (CAM) uses parallel lookups to match quickly without sacrificing power consumption. It introduces three key contributions: i. a pre-charge-free operation, which improves search speed and reduces power by eliminating node charging time; ii. a Hybrid Match Line (HML) structure that strategically balances power and delay, combining the high-speed attributes of NOR with the low-power attributes of NAND; and iii. a local-search technique that achieves a further improvement in search time. Performance indicators improve greatly when these methods are seamlessly integrated. Utilizing 45 nm CMOS technology, the design supports diverse process voltages, temperatures, and frequencies for a 64x32 memory array. Monte Carlo simulations verify design stability. The proposed architecture outperforms the leading benchmark in speed and power-delay product (PDP) by 54.6% and 76.02%, respectively. This novel design can perform repeated data searches at frequencies up to 2 GHz after a single write operation, enabling quicker and more energy-efficient data processing. This development could revolutionize consumer electronics by improving efficiency and speed in high-performance computing, mobile devices, and IoT applications.
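The local-search idea, a cheap partial compare that gates the full match, can be modeled behaviorally. A minimal sketch, assuming an illustrative 8-bit locally searched segment and 32-bit words; it captures only the matching logic, not the pre-charge-free HML circuit:

```python
# Behavioral model of a segmented CAM lookup with a local pre-search:
# a narrow compare on a low segment gates the full-word compare.
# SEG_BITS, WORD_BITS, and the stored contents are illustrative choices;
# the pre-charge-free HML circuit itself is not modeled.

SEG_BITS = 8
WORD_BITS = 32

class SegmentedCAM:
    def __init__(self):
        self.rows = []                       # stored words, index = row address

    def write(self, word):
        self.rows.append(word & ((1 << WORD_BITS) - 1))

    def search(self, key):
        """Return the addresses of all rows matching the search key."""
        lo_mask = (1 << SEG_BITS) - 1
        key_lo = key & lo_mask
        hits = []
        for addr, word in enumerate(self.rows):
            # local search: most mismatching rows fail the narrow compare,
            # so the wide (power-hungry) full compare is rarely exercised
            if (word & lo_mask) == key_lo and word == key:
                hits.append(addr)
        return hits
```

In the circuit, the narrow compare corresponds to the locally searched segment and the full compare to the remaining HML segments; the software model only illustrates why the pre-filter cuts the number of wide comparisons.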
Citations: 0
An efficient open-source design and implementation framework for non-quantized CNNs on FPGAs
IF 2.5 CAS Tier 3 (Engineering & Technology) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-12-02 DOI: 10.1016/j.vlsi.2025.102625
Angelos Athanasiadis, Nikolaos Tampouratzis, Ioannis Papaefstathiou
The growing demand for real-time processing in artificial-intelligence applications, particularly those involving Convolutional Neural Networks (CNNs), has highlighted the need for efficient computational solutions. Conventional processors and graphics processing units (GPUs) often fall short in balancing performance, power consumption, and latency, especially in embedded systems and edge-computing platforms. Field-Programmable Gate Arrays (FPGAs) offer a promising alternative, combining high performance with energy efficiency and reconfigurability. This paper presents a design and implementation framework for implementing CNNs seamlessly on FPGAs that maintains full precision in all neural-network parameters, thus addressing a niche: non-quantized NNs. The presented framework extends Darknet, which is widely used for the design of CNNs, and allows the designer, by effectively using a Darknet NN description, to efficiently implement CNNs in a heterogeneous system comprising CPUs and FPGAs. The framework is evaluated on a number of different CNNs and as part of a real-world application utilizing UAVs; in all cases it outperforms CPU and GPU systems in performance and/or power consumption.
When compared with the FPGA frameworks that support quantization, our solution offers similar performance and/or energy efficiency without any degradation in NN accuracy.
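The distinguishing point, full-precision arithmetic end to end, can be illustrated with the core CNN operator. A minimal sketch, assuming a single-channel valid-mode convolution; the framework's actual HLS kernels, tiling, and dataflow are not reproduced here:

```python
# Full-precision valid-mode 2-D convolution (cross-correlation, as in CNNs):
# the core operator mapped to the FPGA with no quantization of weights or
# activations. Pure-Python, single-channel sketch for illustration only.

def conv2d(image, kernel):
    H, W = len(image), len(image[0])
    KH, KW = len(kernel), len(kernel[0])
    out = [[0.0] * (W - KW + 1) for _ in range(H - KH + 1)]
    for i in range(H - KH + 1):
        for j in range(W - KW + 1):
            acc = 0.0                        # floating-point accumulation,
            for di in range(KH):             # no rounding or clipping anywhere
                for dj in range(KW):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out[i][j] = acc
    return out
```

A quantized flow would insert rounding around the multiply-accumulate; keeping the accumulation in floating point is what preserves the trained network's accuracy at the cost of wider FPGA arithmetic.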
Citations: 0
Machine-learning-driven prediction of thin film parameters for optimizing the dielectric deposition in semiconductor fabrication
IF 2.5 CAS Tier 3 (Engineering & Technology) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2025-12-02 DOI: 10.1016/j.vlsi.2025.102617
Hao Wen, Enda Zhao, Qiyue Zhang, Ruofei Xiang, Wenjian Yu
The deposition of dielectric thin films in semiconductor fabrication is significantly influenced by process-parameter configuration. Traditional optimization via experiments or multi-physics simulations is costly, time-consuming, and inflexible. Data-driven methods that leverage production-line sensor data provide a promising alternative. This work proposes a machine-learning modeling framework for studying the nonlinear correlation between dielectric deposition parameters and film-thickness distribution. The approach is validated using historical High-Density Plasma Chemical Vapor Deposition (HDPCVD) process data collected from production runs and demonstrates strong predictive performance across multiple technology nodes: it predicts thin-film thickness with R2 = 0.92 and enables practical assessment of specification compliance, achieving 79.5% accuracy in determining whether predicted thicknesses lie within the node-specific tolerances at the 14 nm node. The results suggest that data-driven modeling offers a practical, scalable, and efficient solution for process monitoring and optimization in advanced semiconductor fabrication.
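The two reported figures of merit, R2 on predicted thickness and tolerance-compliance accuracy, can be sketched as follows. The thickness values and the tolerance in the test are made-up illustrations, not data from the paper:

```python
# Sketch of the two reported metrics: R^2 of predicted film thickness and
# the accuracy of in-spec / out-of-spec agreement against a node tolerance.

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def compliance_accuracy(y_true, y_pred, target, tol):
    """Fraction of samples where the prediction and the measurement agree
    on whether the thickness lies within target +/- tol."""
    agree = sum((abs(t - target) <= tol) == (abs(p - target) <= tol)
                for t, p in zip(y_true, y_pred))
    return agree / len(y_true)
```

The compliance check treats spec conformance as a binary label derived from the regression output, which is how a thickness predictor with R2 = 0.92 can still show a distinct (here 79.5%) accuracy on the in-tolerance decision.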
Citations: 0