
Latest Publications: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

MultiLens: A Multiobjective Adaptive DVFS Framework for Energy-Efficient DNN Inference
IF 2.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-08 · DOI: 10.1109/TCAD.2025.3597236
Jiawei Geng;Zongwei Zhu;Weihong Liu;Xuehai Zhou
To tackle power-management challenges in deep neural networks (DNNs), dynamic voltage and frequency scaling (DVFS) has gained attention for its ability to enhance energy efficiency without modifying DNN structures. However, current DVFS methods, which rely on historical data such as processor utilization and task load, suffer from frequency ping-pong, response lag, and limited generalizability. These problems are exacerbated in real-world scenarios that prioritize time, energy, or energy efficiency differently, making it even harder for existing methods to configure DVFS effectively under such multiobjective constraints and tradeoffs. This article presents MultiLens (MTL), a multiobjective adaptive DVFS framework. First, we propose a power-sensitive feature extraction method together with multiobjective constraint modeling to characterize DNN inference behavior. Second, critical power blocks are identified by clustering on inference-behavior similarity, enabling adaptive placement of DVFS instrumentation points. Moreover, to improve adaptability across platforms and flexibility across scenarios, MTL integrates a complete deployment process. Experimental results demonstrate the effectiveness of MTL in optimizing energy efficiency across different hardware platforms and deployment scenarios.
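The abstract does not spell out how instrumentation points are chosen; as a rough illustration of "clustering based on inference behavior similarity", here is a minimal sketch, with plain k-means standing in for whatever clustering MTL actually uses, and all layer features and numbers hypothetical.

```python
import numpy as np

def cluster_power_blocks(features, k=3, iters=50):
    """Group DNN layers into power blocks with plain k-means.

    features: (n_layers, n_features) array, e.g. [avg_power_W, latency_ms]
    per layer. One DVFS instrumentation point could then be placed per
    cluster instead of per layer.
    """
    centers = features[:k].astype(float).copy()  # deterministic init: first k layers
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # Assign each layer to its nearest cluster center.
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers, keeping a center in place if its cluster empties.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

# Hypothetical per-layer [power (W), latency (ms)] profile of a small CNN.
profile = np.array([[2.1, 5.0], [2.0, 4.8], [6.5, 12.0],
                    [6.8, 11.5], [1.0, 1.2], [6.6, 11.8]])
labels = cluster_power_blocks(profile, k=3)
```

Layers with similar power/latency behavior end up in the same block, so DVFS decisions can be made once per block rather than per layer.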
Vol. 45, no. 3, pp. 1459–1472.
Citations: 0
Super-Vth Standard Cells With Improved EDP: Design and Silicon Validation in 65nm LP CMOS
IF 2.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-07 · DOI: 10.1109/TCAD.2025.3596880
Shubham Yadav;M. S. Oude Alink;André B. J. Kokkeler
The ever-increasing computational load and shrinking power budgets have accentuated the need for energy-efficient operation of edge devices. In this article, a combination of static CMOS logic and hybrid pass-transistor logic with static CMOS output (HPSC), which has no floating or weak nodes and is thus as robust to noise as static CMOS logic, is used to design toolchain-compatible super-$\text{V}_{\text{th}}$ standard cells. Optimized HPSC variants of a 2/3-input XOR cell, a 2/3-input XNR cell, a half-adder cell, a full-adder cell, and two variants of a 1-bit multiply–accumulate combinational cell are presented in a commercial 65nm low-power CMOS technology. Measurements of test structures based on ring oscillators and dummy-path techniques show average frequency and average energy-delay product improvements of up to 30.3% and 32.5%, respectively, at typical conditions. The proposed cells also outperform commercially available standard cells in propagation delay, leakage, and dynamic power consumption. This offers foundries and other commercial entities a promising approach to improving digital design performance by about half a technology node at no additional cost.
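As a reminder of the headline metric, the energy-delay product is simply energy times delay, and the reported percentages compare EDP before and after. A toy computation with hypothetical cell measurements (the numbers below are illustrative, not the paper's silicon data):

```python
def edp(energy_pj, delay_ns):
    """Energy-delay product (pJ*ns): lower is better."""
    return energy_pj * delay_ns

def improvement_pct(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# Hypothetical measurements: a conventional cell vs. an HPSC variant.
base_edp = edp(energy_pj=10.0, delay_ns=2.0)   # 20.0 pJ*ns
hpsc_edp = edp(energy_pj=7.5, delay_ns=1.8)    # 13.5 pJ*ns
gain = improvement_pct(base_edp, hpsc_edp)     # 32.5% lower EDP
```

Note that EDP rewards cells that cut energy without giving the savings back in delay, which is why it is the figure of merit here rather than power alone.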
Vol. 45, no. 3, pp. 1568–1581.
Citations: 0
A Post-Routing Layout Optimization Framework for Lithography Process Window Enlargement
IF 2.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-07 · DOI: 10.1109/TCAD.2025.3594247
Yajuan Su;Zixi Liu;Yibo Lin;Xiaojing Su;Yuqin Wang;Xin Hong;Yujie Jiang;Pengyu Ren;Yayi Wei
Lithography compliance is required to guarantee the manufacturability of advanced integrated circuits. The conventional flow for enhancing lithography printability relies on techniques such as optical proximity correction and subresolution assist features (SRAFs), which are applied at mask design. The optimization space at such a late design stage can be extremely limited because placement and routing are fixed after layout design. In this work, we aim to optimize lithography printability at early design stages and propose a post-routing layout optimization framework to enlarge the lithography process window. The framework leverages a transformer-based deep learning model for fast process-window evaluation and simultaneously modifies layout patterns for lithography compliance, subject to design rules and connectivity constraints. Experimental results show that our framework improves the lithography window by an average of 4.31%. Furthermore, the framework greatly improves optimization for layouts with hotspots.
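The core loop pairs a fast printability evaluator with rule-constrained layout edits. A greatly simplified greedy sketch of that pattern, where the toy surrogate, the edge-coordinate representation, and all values are assumptions (the paper uses a learned transformer evaluator, not this stand-in):

```python
import random

def optimize_layout(edges, pw_score, drc_ok, step=1, iters=200, seed=0):
    """Greedy post-routing edge perturbation (simplified sketch).

    edges: list of movable edge coordinates (nm).
    pw_score: surrogate returning a process-window score (higher is better),
              standing in for the paper's learned evaluator.
    drc_ok: predicate enforcing design rules / connectivity constraints.
    """
    rng = random.Random(seed)
    best = list(edges)
    best_score = pw_score(best)
    for _ in range(iters):
        cand = list(best)
        i = rng.randrange(len(cand))
        cand[i] += rng.choice((-step, step))      # nudge one edge by one grid step
        if drc_ok(cand):                          # reject rule-violating edits outright
            s = pw_score(cand)
            if s > best_score:                    # keep only window-enlarging edits
                best, best_score = cand, s
    return best, best_score

# Toy surrogate: the window widens as edges approach a 40 nm pitch grid.
target = [40, 80, 120]
score = lambda e: -sum(abs(a - b) for a, b in zip(e, target))
rules = lambda e: all(e[i] < e[i + 1] for i in range(len(e) - 1))
best, s = optimize_layout([43, 77, 125], score, rules)
```

The real framework replaces the greedy accept/reject step with simultaneous, model-guided modification, but the constraint structure (score up, rules never violated) is the same.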
Vol. 45, no. 3, pp. 1549–1553.
Citations: 0
Look-Up Table-Based Energy-Efficient Architecture for Neural Accelerators (LANA)
IF 2.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-06 · DOI: 10.1109/TCAD.2025.3596535
Ovishake Sen;Chukwufumnanya Ogbogu;Peyman Dehghanzadeh;Janardhan Rao Doppa;Swarup Bhunia;Partha Pratim Pande;Baibhab Chatterjee
Digital von Neumann implementations of neural accelerators are limited by high power consumption and area overheads, while analog and non-CMOS implementations suffer from noise, device mismatch, and reliability issues. This article introduces a CMOS look-up table (LUT)-based architecture for neural accelerators (LANA) that reduces the power consumption and area overhead of conventional digital implementations through precomputed, faster LUT access while avoiding the noise and mismatch challenges of analog circuits. To solve the scalability issues of conventional LUT-based computation, we use a divide-and-conquer (D&C) approach to split high-precision multiply–accumulate (MAC) operations into lower-precision MACs. LANA achieves up to $29.54\times$ lower area with $3.34\times$ lower energy per inference task compared to traditional LUT (T-LUT)-based techniques, and up to $1.24\times$ lower area with $1.80\times$ lower energy per inference task than conventional digital MAC (Wallace tree/array multipliers), without retraining and without affecting the accuracy of pretrained unpruned models, as well as on lottery ticket pruning (LTP) models that already reduce the number of required MAC operations by up to 98%. Finally, we introduce mixed-precision analysis in the LANA framework for all LTP-pruned and unpruned models (VGG11, VGG19, ResNet18, ResNet34, GoogleNet), achieving $29.59\times$ (GoogleNet, pruned) to $62.83\times$ (VGG11, unpruned) lower area with $3.34\times$ (GoogleNet, pruned) to $8.1\times$ (VGG11, unpruned) lower energy per inference than T-LUT-based techniques, and up to $1.24\times$ (GoogleNet, pruned) to $2.63\times$ (VGG11, unpruned) lower area with $1.81\times$ (GoogleNet, pruned) to $4.37\times$ (VGG11, unpruned) lower energy per inference than conventional digital MAC-based techniques, with ~1% accuracy loss relative to the baseline.
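The D&C idea, splitting one wide multiply into several narrow LUT lookups plus shifts and adds, can be sketched in a few lines. Here unsigned 8-bit operands are split into 4-bit halves; the specific widths are illustrative, not necessarily LANA's exact configuration:

```python
import numpy as np

# Precompute a 16x16 table of all 4-bit x 4-bit products (the "LUT").
LUT4 = np.array([[a * b for b in range(16)] for a in range(16)])

def lut_mul8(a, b):
    """8-bit x 8-bit unsigned multiply built from four 4-bit LUT lookups.

    D&C split: a = aH*16 + aL and b = bH*16 + bL, so
    a*b = aH*bH*256 + (aH*bL + aL*bH)*16 + aL*bL.
    """
    aH, aL = a >> 4, a & 0xF
    bH, bL = b >> 4, b & 0xF
    return (LUT4[aH, bH] << 8) + ((LUT4[aH, bL] + LUT4[aL, bH]) << 4) + LUT4[aL, bL]

def lut_mac(acts, weights):
    """Multiply-accumulate over a vector using only LUT lookups and adds."""
    return sum(lut_mul8(a, w) for a, w in zip(acts, weights))
```

A direct 8-bit LUT would need 256 x 256 entries; the split needs only 16 x 16, which is the scalability point the abstract is making.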
Vol. 45, no. 3, pp. 1438–1452.
Citations: 0
DCTDSE: A Bimodal Design Space Exploration Flow via Discrete–Continuous Transformation
IF 2.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-06 · DOI: 10.1109/TCAD.2025.3596538
Shuaibo Huang;Liangji Wu;Yuyang Ye;Hao Yan;Longxing Shi
The conservation core accelerator presents a promising avenue for improving the computational efficiency of specific applications. However, its design space is ultrahigh-dimensional, significantly increasing the exploratory effort required to identify the optimal design across performance, power, and area metrics. Furthermore, the discrete nature of the microarchitecture space renders conventional search methods ineffective. To tackle these challenges, we propose a discrete–continuous transformation to speed up design space exploration, namely DCTDSE. It can operate in either offline or online mode. In offline mode, it transforms the original discrete design space into a continuous space, builds predictive models, performs parallel gradient-based optimization, and maps the results back to the discrete domain. In online mode, DCTDSE refines the models by iteratively resampling previously found solutions, thereby enhancing exploration quality while maintaining moderate runtime overhead. Experimental results indicate that DCTDSE achieves a $3.9\times$ to $40\times$ speedup over benchmark methods in offline mode. In online mode, it provides a $2.5\times$ speedup, with a 21% reduction in exploration quality relative to the most accurate comparison method.
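The offline mode can be illustrated on a toy two-parameter space: relax the discrete choices into a continuous vector, descend a differentiable surrogate, then snap back to legal values. The parameter names, the quadratic surrogate, and all numbers below are hypothetical, and finite-difference descent stands in for the paper's predictive models:

```python
import numpy as np

def dse_continuous(candidates, cost, lr=0.2, steps=100):
    """Offline-mode sketch: discrete space -> continuous descent -> snap back.

    candidates: list of sorted legal-value lists, one per design parameter.
    cost: differentiable surrogate over the continuous parameter vector.
    """
    # Relax: start from the centroid of each parameter's legal range.
    x = np.array([np.mean(c) for c in candidates], dtype=float)
    for _ in range(steps):
        # Central finite-difference gradient of the surrogate.
        g = np.zeros_like(x)
        for i in range(len(x)):
            h = np.zeros_like(x)
            h[i] = 1e-4
            g[i] = (cost(x + h) - cost(x - h)) / 2e-4
        x -= lr * g
    # Map the continuous optimum back to the nearest discrete design point.
    return [min(c, key=lambda v: abs(v - xi)) for c, xi in zip(candidates, x)]

# Hypothetical space: cache size (KB) and issue width; surrogate favors (64, 4).
space = [[16, 32, 64, 128], [1, 2, 4, 8]]
surrogate = lambda p: (p[0] - 60.0) ** 2 / 100.0 + (p[1] - 3.8) ** 2
best = dse_continuous(space, surrogate)
```

The online mode would additionally evaluate `best` on the real toolchain and feed the result back to refine the surrogate before the next descent.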
Vol. 45, no. 3, pp. 1453–1458.
Citations: 0
Binary Weight Multibit Activation Quantization for Compute-in-Memory CNN Accelerators
IF 2.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-08-04 · DOI: 10.1109/TCAD.2025.3595830
Wenyong Zhou;Zhengwu Liu;Yuan Ren;Ngai Wong
Compute-in-memory (CIM) accelerators have emerged as a promising way to enhance the energy efficiency of convolutional neural networks (CNNs). Deploying CNNs on CIM platforms generally requires quantizing network weights and activations to meet hardware constraints. However, existing approaches either prioritize hardware efficiency with binary weight and activation quantization at the cost of accuracy, or use multibit weights and activations for greater accuracy but limited efficiency. In this article, we introduce a novel binary weight multibit activation (BWMA) method for CNNs on CIM-based accelerators. Our contributions include deriving closed-form solutions for weight quantization in each layer, significantly improving the representational capability of binarized weights, and developing a differentiable function for activation quantization that approximates the ideal multibit function while bypassing an extensive search for optimal settings. Through comprehensive experiments on the CIFAR-10 and ImageNet datasets, we show that BWMA achieves notable accuracy improvements over existing methods, with gains of 1.44%–5.46% and 0.35%–5.37% on the respective datasets. Moreover, hardware simulation results indicate that 4-bit activation quantization strikes the optimal balance between hardware cost and model performance.
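For intuition on what a closed-form weight solution can look like, the classic XNOR-Net result binarizes W as alpha times sign(W) with alpha = mean(|W|), which minimizes ||W - alpha*B||^2 over B in {-1, +1}. This is shown only as an illustration, not necessarily BWMA's per-layer solution, and the uniform activation quantizer below is just the hard function that a differentiable surrogate would approximate:

```python
import numpy as np

def binarize_weights(W):
    """Binarize W as alpha * sign(W).

    alpha = mean(|W|) is the closed-form minimizer of ||W - alpha*B||^2
    for binary B (XNOR-Net scaling; illustrative, not the paper's exact
    per-layer derivation).
    """
    B = np.where(W >= 0, 1.0, -1.0)
    alpha = np.abs(W).mean()
    return alpha, B

def quantize_activation(x, bits=4, x_max=6.0):
    """Uniform multibit activation quantization on [0, x_max]: the hard
    quantizer that a differentiable surrogate would approximate."""
    levels = 2 ** bits - 1
    q = np.round(np.clip(x, 0, x_max) / x_max * levels)
    return q * x_max / levels

alpha, B = binarize_weights(np.array([0.5, -1.5, 1.0, -1.0]))
```

With binary weights, each CIM column dot-product reduces to signed accumulation scaled once by alpha, which is where the hardware saving comes from.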
Vol. 45, no. 3, pp. 1432–1437.
Citations: 0
HyDAS: Hybrid Domain Deformed Attention for Selective Hotspot Detection
IF 2.9 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2025-07-30 · DOI: 10.1109/TCAD.2025.3587533
Yuyang Chen;Qi Sun;Su Zheng;Xinyun Zhang;Bei Yu;Hao Geng
Technology node scaling is challenged in many aspects, including pitch reduction, patterning flexibility, and lithography process variability during manufacturing. Layout hotspot detection, one of the critical steps toward design closure, likewise requires upgraded techniques. With the rapid development of deep learning, detectors exploiting convolutional neural networks (CNNs) outperform those based on pattern matching and classical machine learning algorithms. However, due to the local nature of convolutions, traditional CNN-based detectors fail to model relationships between patterns in a large layout, ignoring the impact of light propagation and other optical effects during photolithography. Worse still, engineers cannot fully trust the results of learning-based detectors, especially when handling complicated layout patterns in practice, which makes such detectors very difficult to deploy. Motivated by these observations, we propose a vision transformer (ViT)-based layout hotspot detector with a deformed attention mechanism, whose training paradigm is inspired by large pretrained foundation models (e.g., OpenAI's GPT-n series) and fine-tuning. To account for light diffraction during photolithography, hybrid-domain (i.e., spatial- and spectral-domain) layout inputs are fed through multiple channels. In addition, our detector integrates a selective option: based on the misclassification risk level, the model can either make a prediction itself or hand the sample to engineers. Experimental results on the ICCAD2012 metal-layer benchmarks and ICCAD2020 via-layer benchmarks demonstrate the effectiveness and efficiency of our approach. We have made the ICCAD2020 dataset publicly available to support further research in hotspot detection, enable benchmarking across different process nodes and layout types, and facilitate reproducibility in the field.
The dataset is accessible at https://github.com/shadowior/ICCAD2020.
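The "selective option" follows the standard selective-prediction pattern: act only when the model's estimated risk is low. A minimal stand-in using softmax confidence, where the threshold value and the two-class setup are illustrative assumptions, not the paper's actual risk model:

```python
import numpy as np

def selective_predict(logits, threshold=0.9):
    """Return (predicted_class, confidence) when softmax confidence clears
    `threshold`; otherwise return (None, confidence) to defer the sample
    to a human engineer.
    """
    z = logits - logits.max()            # stabilize the softmax
    p = np.exp(z) / np.exp(z).sum()
    conf = float(p.max())
    if conf >= threshold:
        return int(p.argmax()), conf     # automatic decision
    return None, conf                    # defer: send to engineer review

label, conf = selective_predict(np.array([8.0, 0.5]))    # confident sample
deferred, c2 = selective_predict(np.array([1.0, 0.8]))   # ambiguous sample
```

Raising the threshold trades automation rate for a lower risk of silently misclassifying a true hotspot, which is exactly the knob a deployment team would tune.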
Vol. 45, no. 3, pp. 1523–1534.
Citations: 0
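The selective option described in the HyDAS abstract — predict, or defer to an engineer based on misclassification risk — follows the general pattern of selective classification. A minimal confidence-threshold sketch is shown below; the function name, the 0.9 threshold, and the use of softmax confidence as the risk proxy are illustrative assumptions, not the paper's actual risk model:

```python
import numpy as np

def selective_predict(probs, threshold=0.9):
    """Toy selective classifier: keep the model's prediction when its
    softmax confidence clears `threshold`; otherwise defer (label -1),
    mimicking a 'send to the engineer' option.
    probs: (n_samples, n_classes) array of softmax outputs."""
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    deferred = confidence < threshold
    labels[deferred] = -1  # -1 marks samples routed to a human reviewer
    return labels, deferred
```

Raising the threshold lowers the risk of acting on a misclassification at the cost of deferring more layouts to engineers — the tradeoff the abstract's "risk level" option exposes.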
An Accelerated Newton-Based Matrix Splitting Iteration Method for Mixed-Cell-Height Circuit Legalization
IF 2.9 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2025-07-28 DOI: 10.1109/TCAD.2025.3593205
Chencan Zhou;Yang Cao;Fan Yang;Xiaoqing Wen;Quan Shi;Rong Rong;Aili Yang
The advancement of technology nodes has intensified the focus on mixed-cell-height circuit design, posing challenges to traditional legalization techniques. In this article, we propose a novel and efficient accelerated Newton-based matrix splitting (ANMS) iteration method to address the mixed-cell-height circuit legalization problem. Our approach reformulates this problem into a generalized absolute value equation and leverages matrix splitting and the latest estimate vector to enhance computational efficiency. We also introduce a relaxation variant within the ANMS framework, namely, the accelerated Newton-based successive overrelaxation (ANSOR) method, which is particularly effective in scenarios requiring high computational performance and precise parameter tuning. The proposed method achieves linear computational complexity. Furthermore, we perform an in-depth analysis of the sufficient convergence conditions for the ANMS method and optimize cells that have excessive displacement. Experimental results show that the proposed ANMS method achieves a speedup of $1.09\times$–$4.94\times$ compared to state-of-the-art methods, while maintaining the quality of solution. This makes it highly suitable for addressing complex placement design challenges.
{"title":"An Accelerated Newton-Based Matrix Splitting Iteration Method for Mixed-Cell-Height Circuit Legalization","authors":"Chencan Zhou;Yang Cao;Fan Yang;Xiaoqing Wen;Quan Shi;Rong Rong;Aili Yang","doi":"10.1109/TCAD.2025.3593205","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3593205","url":null,"abstract":"The advancement of technology nodes has intensified the focus on mixed-cell-height circuit design, posing challenges to traditional legalization techniques. In this article, we propose a novel and efficient accelerated Newton-based matrix splitting (ANMS) iteration method to address the mixed-cell-height circuit legalization problem. Our approach reformulates this problem into a generalized absolute value equation and leverages matrix splitting and the latest estimate vector to enhance computational efficiency. We also introduce a relaxation variant within the ANMS framework, namely, the accelerated Newton-based successive overrelaxation (ANSOR) method, which is particularly effective in scenarios requiring high computational performance and precise parameter tuning. The proposed method achieves linear computational complexity. Furthermore, we perform an in-depth analysis of the sufficient convergence conditions for the ANMS method and optimize cells that have excessive displacement. Experimental results show that the proposed ANMS method achieves a speedup of <inline-formula> <tex-math>$1.09\times $ </tex-math></inline-formula>–<inline-formula> <tex-math>$4.94\times $ </tex-math></inline-formula> compared to state-of-the-art methods, while maintaining the quality of solution. 
This makes it highly suitable for addressing complex placement design challenges.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"45 3","pages":"1535-1548"},"PeriodicalIF":2.9,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
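The ANMS abstract reformulates legalization as a generalized absolute value equation, A x + B|x| = b. The paper's accelerated Newton-based splitting and its ANSOR variant are not reproduced here, but the fixed-point idea such methods build on can be sketched with a plain Picard-type splitting (all names are ours; convergence requires ||A⁻¹B|| < 1 in some induced norm):

```python
import numpy as np

def gave_picard(A, B, b, tol=1e-10, max_iter=200):
    """Picard-type splitting iteration for the generalized absolute value
    equation A x + B |x| = b: solve A x_{k+1} = b - B |x_k| each step.
    Converges when ||A^{-1} B|| < 1 for some induced norm."""
    x = np.zeros_like(b)
    for k in range(1, max_iter + 1):
        # one linear solve per step; a real solver would factor A once
        x_new = np.linalg.solve(A, b - B @ np.abs(x))
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new, k
        x = x_new
    return x, max_iter
```

Each iteration costs one linear solve; the Newton-based variants in the paper refresh the splitting with the latest estimate vector to cut the iteration count further.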
RelOps: Reliability Optimization in Standard Cells Across PVT Variations in FinFET Digital Circuits
IF 2.9 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2025-07-28 DOI: 10.1109/TCAD.2025.3593207
Mohammad Rehan Akhtar;Ritwik Basyas Goswami;Zia Abbas
FinFETs, now firmly established in leading VLSI industries for their superior performance, exhibit heightened aging susceptibility that poses significant reliability challenges. The aggressive scaling of technology nodes has further compromised circuit reliability in recent years, highlighting the need for effective aging mitigation techniques. Recent advancements in the miniaturization of nanoscale technology have demonstrated the potential of optimizing performance parameters in standard cells using machine learning (ML) models and optimization algorithms through device sizing modifications. Building on this progress, we propose a methodology for optimizing performance parameters in 16 nm high-performance (HP) FinFETs for the first time. The approach leverages a multiobjective optimization algorithm framework to mitigate aging impacts across process, voltage, temperature (PVT) variations while addressing negative bias temperature instability (NBTI) and hot carrier injection (HCI) effects by optimally adjusting FinFET design parameters, including channel length (lg), width (tfin), and height (hfin). With SPICE simulations, time-series datasets were generated to train ML models that achieved an R2 score exceeding 0.99 and a mean absolute percentage error below 1% across standard cells. Our approach yields a significant simulation speedup and a reduction in simulation workload compared to traditional SPICE simulations. Using the proposed optimization algorithm framework, we improved the power-delay product (PDP) by up to 36.97% under nonaging conditions and 34.94% with aging considered with respect to the nominal dimension at the fresh year, demonstrating significant performance gains for FinFET-based standard cells. The experimental results on 12 distinct complex cells validate the aging mitigation across years.
{"title":"RelOps: Reliability Optimization in Standard Cells Across PVT Variations in FinFET Digital Circuits","authors":"Mohammad Rehan Akhtar;Ritwik Basyas Goswami;Zia Abbas","doi":"10.1109/TCAD.2025.3593207","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3593207","url":null,"abstract":"FinFET, now firmly established in leading VLSI industries for their superior performance, exhibit heightened aging susceptibility that poses significant reliability challenges. The aggressive scaling of technology nodes has further compromised circuit reliability in recent years, highlighting the need for effective aging mitigation techniques. Recent advancements in the miniaturization of nanoscale technology have demonstrated the potential of optimizing performance parameters in standard cells using machine learning (ML) models and optimization algorithms through device sizing modifications. Building on this progress, we propose a methodology for optimizing performance parameters in 16 nm high-performance (HP) FinFET for the first time. The approach leverages a multiobjective optimization algorithm framework to mitigate aging impacts across process, voltage, temperature (PVT) variations while addressing negative bias temperature instability (NBTI) and hot carrier injection (HCI) effects by optimally adjusting FinFET design parameters, including channel length (lg), width (tfin), and height (hfin). With SPICE simulations, time-series datasets were generated to train ML models that achieved an R2 score exceeding 0.99 and a mean absolute percentage error below 1% across standard cells. Our approach yields a significant simulation speedup and a reduction in simulation workload compared to traditional SPICE simulations. 
Using the proposed optimization algorithm framework, we improved the power-delay product (PDP) by up to 36.97% under nonaging conditions and 34.94% with aging considered with respect to the nominal dimension at the fresh year, demonstrating significant performance gains for FinFET-based standard cells. The experimental results on 12 distinct complex cells validate the aging mitigation across years.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"45 3","pages":"1371-1383"},"PeriodicalIF":2.9,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
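RelOps casts device sizing as a multiobjective optimization over PDP, aging, and PVT robustness. A core step in any such framework is filtering candidate sizings down to the Pareto front of non-dominated tradeoffs; the textbook filter below is a generic illustration, not the paper's algorithm (function name and the minimize-everything convention are our assumptions):

```python
import numpy as np

def pareto_mask(points):
    """Boolean mask of non-dominated rows, with every column minimized.
    A row is dominated if some other row is <= in all objectives and
    strictly < in at least one."""
    pts = np.asarray(points, dtype=float)
    mask = np.ones(len(pts), dtype=bool)
    for i in range(len(pts)):
        # rows that are at least as good everywhere and strictly better somewhere
        dominated_by = np.all(pts <= pts[i], axis=1) & np.any(pts < pts[i], axis=1)
        if dominated_by.any():
            mask[i] = False
    return mask
```

For example, with objective vectors (delay, power), a sizing at (2, 2) is dropped when (1, 2) exists, while (1, 2), (2, 1), and (0.5, 3) all survive as distinct tradeoffs.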
Gas Leakage Detection Using YOLO Accelerator Based on ZYNQ
IF 2.9 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2025-07-25 DOI: 10.1109/TCAD.2025.3592582
Yunpeng Yang;Feng Gao;Xiaopeng Yang;Runze Zhang;Zhipeng Li;Hua Xia;Qifeng Li;Xiangyun Ma
Infrared imaging is a valuable technology for gas leakage detection due to its high sensitivity, long detection range, and high efficiency. Conventional target detection methods depend on manually extracting image features, which often leads to limited accuracy, low adaptability, and slow detection speeds. Deep learning technology offers a potential solution to these challenges; however, the increasing depth of neural networks imposes significant computational demands, posing challenges to real-time detection. This article presents a compact and energy-efficient gas detection system, implemented with a ZYNQ platform and an infrared camera. We propose a ZYNQ-based convolution accelerator to enhance gas plume detection from images captured by the infrared camera. Operating at a clock frequency of 130 MHz, the accelerator is capable of reaching a peak performance of 37.44 Gop/s, with power consumption of only 4.12 W. The system achieves a processing speed of 0.235 s per image, enabling real-time gas leakage detection.
{"title":"Gas Leakage Detection Using YOLO Accelerator Based on ZYNQ","authors":"Yunpeng Yang;Feng Gao;Xiaopeng Yang;Runze Zhang;Zhipeng Li;Hua Xia;Qifeng Li;Xiangyun Ma","doi":"10.1109/TCAD.2025.3592582","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3592582","url":null,"abstract":"Infrared imaging is a valuable technology for gas leakage detection due to its high sensitivity, long detection range, and high efficiency. Conventional target detection methods depend on manually extracting image features, which often leads to limited accuracy, low adaptability, and slow detection speeds. Deep learning technology offers a potential solution to these challenges; however, the increasing depth of neural networks imposes significant computational demands, posing challenges to real-time detection. This article presents a compact and energy-efficient gas detection system, implemented with a ZYNQ platform and an infrared camera. We propose a ZYNQ-based convolution accelerator to enhance gas plume detection from images captured by the infrared camera. Operating at a clock frequency of 130 MHz, the accelerator is capable of reaching a peak performance of 37.44 Gop/s, with power consumption of only 4.12 W. The system achieves a processing speed of 0.235 s per image, enabling real-time gas leakage detection.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"45 2","pages":"1021-1027"},"PeriodicalIF":2.9,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
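The quoted accelerator figures imply two derived metrics worth sanity-checking: 37.44 Gop/s at a 130 MHz clock corresponds to 288 operations sustained per cycle, and 4.12 W over the 0.235 s per-image latency is roughly 0.97 J per frame. A quick back-of-the-envelope check using only the numbers from the abstract (the function name is ours):

```python
def accel_stats(gops=37.44, clock_hz=130e6, power_w=4.12, latency_s=0.235):
    """Derived figures from the quoted accelerator numbers: sustained
    operations per clock cycle and energy consumed per processed frame."""
    ops_per_cycle = gops * 1e9 / clock_hz       # Gop/s -> ops/cycle
    energy_per_frame_j = power_w * latency_s    # W * s -> joules/frame
    return ops_per_cycle, energy_per_frame_j
```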