Pub Date : 2025-08-08  DOI: 10.1109/TCAD.2025.3597236
Jiawei Geng;Zongwei Zhu;Weihong Liu;Xuehai Zhou
To tackle power management challenges in deep neural networks (DNNs), dynamic voltage and frequency scaling (DVFS) has gained attention for its ability to enhance energy efficiency without modifying DNN structures. However, current DVFS methods, which rely on historical data such as processor utilization and task load, suffer from frequency ping-pong, response lag, and limited generalizability. These problems are exacerbated in real-world scenarios that prioritize time, energy, or energy efficiency differently, making it even harder for existing methods to configure DVFS effectively under such multiobjective constraints and tradeoffs. This article presents MultiLens (MTL), a multiobjective adaptive DVFS framework. First, we propose a power-sensitive feature extraction method along with multiobjective constraint modeling to characterize DNN inference behavior. Second, critical power blocks are identified through clustering based on inference-behavior similarity, enabling adaptive placement of DVFS instrumentation points. Moreover, to enhance adaptability across platforms and flexibility across scenarios, MTL integrates a complete deployment process. Experimental results demonstrate the effectiveness of MTL in optimizing energy efficiency across different hardware platforms and deployment scenarios.
Title: "MultiLens: A Multiobjective Adaptive DVFS Framework for Energy-Efficient DNN Inference" (IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 45, no. 3, pp. 1459–1472)
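The clustering step described above can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: consecutive layers with similar power behavior are merged into "critical power blocks," and a DVFS instrumentation point is placed at each block boundary. The feature values and similarity threshold are illustrative.

```python
def cluster_power_blocks(layer_features, threshold=0.25):
    """Greedily merge consecutive layers into power blocks.

    layer_features: list of (avg_power_mw, latency_ms) tuples, one per layer.
    Returns a list of blocks, each a list of layer indices.
    """
    blocks = [[0]]
    for i in range(1, len(layer_features)):
        prev_p, _ = layer_features[blocks[-1][-1]]
        cur_p, _ = layer_features[i]
        # Relative power difference decides whether behavior is "similar".
        if abs(cur_p - prev_p) / max(prev_p, 1e-9) <= threshold:
            blocks[-1].append(i)
        else:
            blocks.append([i])
    return blocks

def instrumentation_points(blocks):
    # One DVFS decision point at the start of each block.
    return [b[0] for b in blocks]

profile = [(300, 1.2), (310, 1.1), (700, 3.0), (720, 2.9), (150, 0.4)]
blocks = cluster_power_blocks(profile)
print(blocks)                          # [[0, 1], [2, 3], [4]]
print(instrumentation_points(blocks))  # [0, 2, 4]
```

The single-pass greedy merge keeps the number of instrumentation points proportional to the number of distinct power regimes rather than the number of layers.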
Pub Date : 2025-08-07  DOI: 10.1109/TCAD.2025.3596880
Shubham Yadav;M. S. Oude Alink;André B. J. Kokkeler
The ever-increasing computational load and shrinking power budget have accentuated the need for energy-efficient operation of edge devices. In this article, a combination of static CMOS logic and hybrid pass transistor logic with static CMOS output (HPSC), which has no floating or weak nodes and is thus as robust to noise as static CMOS logic, is used for designing toolchain-compatible super-$\text{V}_{\text{th}}$ standard cells. Optimized HPSC variants of a 2/3-input XOR cell, a 2/3-input XNR cell, a half adder cell, a full adder cell, and two variants of a 1-bit multiply–accumulate combinational cell are presented in a commercial 65nm low-power CMOS technology. Measurements of test structures based on ring oscillators and dummy path techniques show an average frequency and average energy-delay product improvement of up to 30.3% and 32.5%, respectively, at typical conditions. The proposed cells’ superior performance compared to commercially available standard cells is also highlighted in terms of propagation delay, leakage, and dynamic power consumption. This shows a promising approach for foundries or other commercial entities to improve digital design performance by about half a technology node at no additional cost.
Title: "Super-Vth Standard Cells With Improved EDP: Design and Silicon Validation in 65nm LP CMOS" (IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 45, no. 3, pp. 1568–1581)
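For intuition on the reported metric, the energy-delay product (EDP) is energy per operation multiplied by propagation delay. The small helper below illustrates how the percentage improvement of one cell over another is computed; all numeric values are made up for demonstration and are not the paper's measurements.

```python
def edp(energy_pj, delay_ns):
    # Energy-delay product in pJ*ns.
    return energy_pj * delay_ns

def improvement(baseline, proposed):
    # Relative improvement of the proposed cell over the baseline, in percent.
    return 100.0 * (baseline - proposed) / baseline

base = edp(10.0, 2.0)   # hypothetical baseline static CMOS cell: 20.0 pJ*ns
hpsc = edp(8.0, 1.7)    # hypothetical HPSC variant: 13.6 pJ*ns
print(round(improvement(base, hpsc), 1))  # 32.0
```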
Pub Date : 2025-08-07  DOI: 10.1109/TCAD.2025.3594247
Yajuan Su;Zixi Liu;Yibo Lin;Xiaojing Su;Yuqin Wang;Xin Hong;Yujie Jiang;Pengyu Ren;Yayi Wei
Lithography compliance is required to guarantee the manufacturability of advanced integrated circuits. The conventional flow for enhancing lithography printability relies on techniques such as optical proximity correction (OPC) and subresolution assist features (SRAFs), which are applied during mask design. The optimization space at such a late design stage can be extremely limited because placement and routing solutions are fixed after layout design. In this work, we aim to optimize lithography printability at early design stages and propose a post-routing layout optimization framework to enlarge the lithography process window. The framework leverages a transformer-based deep learning model for fast process-window evaluation and simultaneously modifies layout patterns for lithography compliance while respecting design rules and connectivity constraints. Experimental results show that our framework improves the lithography process window by an average of 4.31%. Furthermore, the framework greatly improves optimization for layouts with hotspots.
Title: "A Post-Routing Layout Optimization Framework for Lithography Process Window Enlargement" (IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 45, no. 3, pp. 1549–1553)
Pub Date : 2025-08-06  DOI: 10.1109/TCAD.2025.3596535
Ovishake Sen;Chukwufumnanya Ogbogu;Peyman Dehghanzadeh;Janardhan Rao Doppa;Swarup Bhunia;Partha Pratim Pande;Baibhab Chatterjee
Digital von Neumann implementations of neural accelerators are limited by high power consumption and area overheads, while analog and non-CMOS implementations suffer from noise, device mismatch, and reliability issues. This article introduces a CMOS look-up table (LUT)-based architecture for neural accelerators (LANA) that reduces the power consumption and area overhead of conventional digital implementations through precomputed, faster LUT access while avoiding the noise and mismatch challenges of analog circuits. To solve the scalability issues of conventional LUT-based computation, we use a divide-and-conquer (D&C) approach to split high-precision multiply–accumulate (MAC) operations into lower-precision MACs. LANA achieves up to $29.54\times$ lower area with $3.34\times$ lower energy per inference task compared to traditional LUT (T-LUT)-based techniques, and up to $1.24\times$ lower area with $1.80\times$ lower energy per inference task than conventional digital MAC (Wallace tree/array multipliers), without retraining and without affecting the accuracy of pretrained unpruned models, as well as on lottery ticket pruning (LTP) models that already reduce the number of required MAC operations by up to 98%. Finally, we introduce mixed-precision analysis in the LANA framework for all LTP-pruned and unpruned models (VGG11, VGG19, ResNet18, ResNet34, GoogLeNet), achieving $29.59\times$ (GoogLeNet pruned) to $62.83\times$ (VGG11 unpruned) lower area with $3.34\times$ (GoogLeNet pruned) to $8.1\times$ (VGG11 unpruned) lower energy per inference than T-LUT-based techniques, and up to $1.24\times$ (GoogLeNet pruned) to $2.63\times$ (VGG11 unpruned) lower area with $1.81\times$ (GoogLeNet pruned) to $4.37\times$ (VGG11 unpruned) lower energy per inference compared to conventional digital MAC-based techniques, with ~1% accuracy loss relative to the baseline.
Title: "Look-Up Table-Based Energy-Efficient Architecture for Neural Accelerators (LANA)" (IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 45, no. 3, pp. 1438–1452)
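The divide-and-conquer idea can be sketched as follows. This is a software reconstruction of the general technique (not the paper's hardware implementation): an 8-bit by 8-bit multiply is split into four 4-bit by 4-bit partial products, each served by a small precomputed look-up table, and the partial products are shifted and summed.

```python
# Precompute the 4-bit x 4-bit product table once (256 entries).
LUT4 = [[a * b for b in range(16)] for a in range(16)]

def mac8_via_lut4(acc, a, b):
    """Accumulate a*b (both 8-bit unsigned) using only 4-bit LUT lookups."""
    ah, al = a >> 4, a & 0xF          # high/low nibbles of a
    bh, bl = b >> 4, b & 0xF          # high/low nibbles of b
    # a*b = (ah*bh << 8) + ((ah*bl + al*bh) << 4) + al*bl
    prod = (LUT4[ah][bh] << 8) + ((LUT4[ah][bl] + LUT4[al][bh]) << 4) + LUT4[al][bl]
    return acc + prod

acc = 0
for a, b in [(200, 3), (17, 255), (128, 128)]:
    acc = mac8_via_lut4(acc, a, b)
print(acc, acc == 200*3 + 17*255 + 128*128)  # 21319 True
```

Each high-precision MAC thus becomes four small table reads plus shifts and adds, which is the scalability trade-off the abstract describes: LUT size grows with the square of the sub-operand width, so narrower sub-MACs keep the tables small.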
Pub Date : 2025-08-06  DOI: 10.1109/TCAD.2025.3596538
Shuaibo Huang;Liangji Wu;Yuyang Ye;Hao Yan;Longxing Shi
The conservation core accelerator presents a promising avenue for improving the computational efficiency of specific applications. However, its design space is ultrahigh-dimensional, significantly increasing the exploratory effort required to identify the optimal design across performance, power, and area metrics. Furthermore, the discrete nature of the microarchitecture space renders conventional search methods ineffective. To tackle these challenges, we propose DCTDSE, a discrete–continuous transformation that speeds up design space exploration. It can operate in either offline or online mode. In offline mode, it transforms the original discrete design space into a continuous space, builds predictive models, performs parallel gradient-based optimization, and maps the results back to the discrete domain. In online mode, DCTDSE refines the models by iteratively resampling previously found solutions, thereby enhancing exploration quality while maintaining moderate runtime overhead. Experimental results indicate that DCTDSE achieves a $3.9\times$ to $40\times$ speedup over benchmark methods in offline mode.
In online mode, it provides a $2.5\times$ speedup, with a 21% reduction in exploration quality relative to the most accurate comparison method.
Title: "DCTDSE: A Bimodal Design Space Exploration Flow via Discrete–Continuous Transformation" (IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 45, no. 3, pp. 1453–1458)
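The offline flow can be sketched on a single knob. This is a minimal reading of the abstract, not the authors' code: relax a discrete microarchitecture parameter to a real value, descend a smooth surrogate cost by gradient steps, then snap the result back to the nearest legal discrete setting. The surrogate and legal values are illustrative.

```python
def relax_optimize_round(legal_values, cost, grad, x0, lr=0.1, steps=200):
    """legal_values: sorted discrete options for a single design knob.
    cost/grad: surrogate objective and its derivative on the continuous space.
    """
    x = float(x0)
    for _ in range(steps):
        x -= lr * grad(x)                       # continuous-space descent
    # Map back to the discrete domain: nearest legal value.
    return min(legal_values, key=lambda v: abs(v - x))

# Toy surrogate: quadratic cost minimized at 37.3 (e.g., a buffer-size knob).
cost = lambda x: (x - 37.3) ** 2
grad = lambda x: 2 * (x - 37.3)
best = relax_optimize_round([8, 16, 32, 64, 128], cost, grad, x0=8)
print(best)  # 32
```

The real framework runs many such descents in parallel from different starting points and, in online mode, feeds the rounded results back in as new training samples for the predictive models.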
Pub Date : 2025-08-04  DOI: 10.1109/TCAD.2025.3595830
Wenyong Zhou;Zhengwu Liu;Yuan Ren;Ngai Wong
Compute-in-memory (CIM) accelerators have emerged as a promising approach to enhancing the energy efficiency of convolutional neural networks (CNNs). Deploying CNNs on CIM platforms generally requires quantization of network weights and activations to meet hardware constraints. However, existing approaches either prioritize hardware efficiency with binary weight and activation quantization at the cost of accuracy, or use multibit weights and activations for greater accuracy but limited efficiency. In this article, we introduce a novel binary weight multibit activation (BWMA) method for CNNs on CIM-based accelerators. Our contributions include: deriving closed-form solutions for weight quantization in each layer, significantly improving the representational capability of binarized weights; and developing a differentiable function for activation quantization that approximates the ideal multibit function while bypassing the extensive search for optimal settings. Through comprehensive experiments on the CIFAR-10 and ImageNet datasets, we show that BWMA achieves notable accuracy improvements over existing methods, with gains of 1.44%–5.46% and 0.35%–5.37% on the respective datasets.
Moreover, hardware simulation results indicate that 4-bit activation quantization strikes the optimal balance between hardware cost and model performance.
Title: "Binary Weight Multibit Activation Quantization for Compute-in-Memory CNN Accelerators" (IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 45, no. 3, pp. 1432–1437)
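For intuition, the classic closed-form solution for binarizing a weight tensor (XNOR-Net style) is shown below; the paper derives its own per-layer solution, which may differ. The scale alpha = mean(|W|) minimizes the squared error ||W - alpha*sign(W)||^2. The activation quantizer is a plain uniform k-bit stand-in for the paper's learned differentiable function.

```python
def binarize(weights):
    """Binarize a flat weight list to alpha * sign(w)."""
    alpha = sum(abs(w) for w in weights) / len(weights)  # optimal scale
    return alpha, [alpha * (1.0 if w >= 0 else -1.0) for w in weights]

def quantize_activation(x, bits=4, scale=1.0):
    """Uniform k-bit activation quantization on [0, scale] (illustrative)."""
    levels = 2 ** bits - 1
    q = round(max(0.0, min(x, scale)) / scale * levels)
    return q * scale / levels

alpha, wb = binarize([0.5, -1.5, 1.0, -1.0])
print(alpha)  # 1.0
print(wb)     # [1.0, -1.0, 1.0, -1.0]
```

With binary weights, each CIM dot product reduces to sign-conditioned additions scaled by alpha, while multibit activations preserve input resolution, which is the accuracy/efficiency balance the abstract targets.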
Pub Date : 2025-07-30  DOI: 10.1109/TCAD.2025.3587533
Yuyang Chen;Qi Sun;Su Zheng;Xinyun Zhang;Bei Yu;Hao Geng
Technology node scaling is challenged in many aspects, including pitch reduction, patterning flexibility, and lithography process variability during manufacturing. Layout hotspot detection, one of the critical steps in achieving design closure, likewise requires upgraded techniques. With the rapid development of deep learning, detectors exploiting convolutional neural networks (CNNs) outperform those based on pattern matching and classical machine learning algorithms. However, due to the local nature of convolution, traditional CNN-based detectors fail to model relationships between patterns in a large layout and thus ignore the impact of light propagation and other optical effects during photolithography. Worse still, engineers cannot fully trust the results of learning-based detectors, especially when handling complicated layout patterns in practice, which makes the detectors difficult to deploy. Motivated by these observations, we propose a vision transformer (ViT)-based layout hotspot detector with a deformed attention mechanism, whose training paradigm is inspired by large pretrained foundation models (e.g., OpenAI's GPT-n series) and fine-tuning. To account for light diffraction during photolithography, hybrid-domain (i.e., spatial- and spectral-domain) layout inputs are fed through multiple channels. In addition, our detector integrates a selective option: based on the misclassification risk level, the model can choose to predict or to defer to engineers. Experimental results on the ICCAD2012 metal layer benchmarks and ICCAD2020 via layer benchmarks demonstrate the effectiveness and efficiency of our approach. We have made the ICCAD2020 dataset publicly available to support further research in hotspot detection, enable benchmarking across different process nodes and layout types, and facilitate reproducibility in the field.
The dataset is accessible at https://github.com/shadowior/ICCAD2020.
Title: "HyDAS: Hybrid Domain Deformed Attention for Selective Hotspot Detection" (IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 45, no. 3, pp. 1523–1534)
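The selective option can be sketched as follows. This is an illustrative reconstruction, not the paper's mechanism: the detector emits a prediction only when its estimated misclassification risk is low, and otherwise defers the layout clip to an engineer. The risk proxy (one minus the maximum class probability) and the threshold are my assumptions.

```python
def selective_predict(probs, risk_threshold=0.2):
    """probs: class probabilities for one layout clip (sums to 1).
    Returns ('predict', class_index) or ('defer', None)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    risk = 1.0 - probs[best]             # crude misclassification-risk proxy
    if risk <= risk_threshold:
        return ("predict", best)
    return ("defer", None)               # hand the clip to an engineer

print(selective_predict([0.95, 0.05]))   # ('predict', 0)
print(selective_predict([0.55, 0.45]))   # ('defer', None)
```

Lowering the threshold trades automation coverage for trustworthiness, which is exactly the deployment concern the abstract raises.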
Pub Date : 2025-07-28  DOI: 10.1109/TCAD.2025.3593205
Chencan Zhou;Yang Cao;Fan Yang;Xiaoqing Wen;Quan Shi;Rong Rong;Aili Yang
The advancement of technology nodes has intensified the focus on mixed-cell-height circuit design, posing challenges to traditional legalization techniques. In this article, we propose a novel and efficient accelerated Newton-based matrix splitting (ANMS) iteration method to address the mixed-cell-height circuit legalization problem. Our approach reformulates the problem as a generalized absolute value equation and leverages matrix splitting and the latest estimate vector to enhance computational efficiency. We also introduce a relaxation variant within the ANMS framework, namely the accelerated Newton-based successive overrelaxation (ANSOR) method, which is particularly effective in scenarios requiring high computational performance and precise parameter tuning. The proposed method achieves linear computational complexity. Furthermore, we perform an in-depth analysis of the sufficient convergence conditions of the ANMS method and optimize cells with excessive displacement. Experimental results show that the proposed ANMS method achieves a speedup of $1.09\times$–$4.94\times$ compared to state-of-the-art methods while maintaining solution quality.
This makes it highly suitable for addressing complex placement design challenges.
Title: "An Accelerated Newton-Based Matrix Splitting Iteration Method for Mixed-Cell-Height Circuit Legalization" (IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 45, no. 3, pp. 1535–1548)
Pub Date: 2025-07-28 | DOI: 10.1109/TCAD.2025.3593207
Mohammad Rehan Akhtar;Ritwik Basyas Goswami;Zia Abbas
FinFETs, now firmly established in leading VLSI industries for their superior performance, exhibit heightened aging susceptibility that poses significant reliability challenges. The aggressive scaling of technology nodes has further compromised circuit reliability in recent years, highlighting the need for effective aging mitigation techniques. Recent advances in nanoscale technology have demonstrated the potential of optimizing performance parameters in standard cells using machine learning (ML) models and optimization algorithms through device sizing modifications. Building on this progress, we propose, for the first time, a methodology for optimizing performance parameters in 16 nm high-performance (HP) FinFETs. The approach leverages a multiobjective optimization framework to mitigate aging impacts across process, voltage, and temperature (PVT) variations while addressing negative bias temperature instability (NBTI) and hot carrier injection (HCI) effects by optimally adjusting FinFET design parameters, including channel length (lg), fin width (tfin), and fin height (hfin). Using SPICE simulations, time-series datasets were generated to train ML models that achieved an R² score exceeding 0.99 and a mean absolute percentage error below 1% across standard cells. Our approach yields a significant simulation speedup and a reduced simulation workload compared to traditional SPICE simulation. Using the proposed optimization framework, we improved the power-delay product (PDP) by up to 36.97% under nonaging conditions and 34.94% with aging considered, relative to the nominal dimensions at the fresh year, demonstrating significant performance gains for FinFET-based standard cells. Experimental results on 12 distinct complex cells validate the aging mitigation across years.
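The two accuracy metrics the abstract reports, an R² score above 0.99 and a mean absolute percentage error (MAPE) below 1%, can be computed as follows. This is a minimal sketch on synthetic "delay vs. fin height" data with a polynomial stand-in for the paper's ML models; the data, the 1/hfin delay trend, and all numeric ranges are assumptions, not the paper's setup.

```python
import numpy as np

# Synthetic surrogate-model example (illustrative only):
# delay roughly inversely proportional to fin height, plus noise.
rng = np.random.default_rng(1)
hfin = rng.uniform(20.0, 40.0, 200)              # hypothetical fin heights (nm)
delay = 1e3 / hfin + rng.normal(0.0, 0.05, 200)  # synthetic delay (ps) + noise

coeffs = np.polyfit(hfin, delay, deg=4)          # stand-in for the trained ML model
pred = np.polyval(coeffs, hfin)

# R^2: fraction of variance explained by the surrogate.
ss_res = np.sum((delay - pred) ** 2)
ss_tot = np.sum((delay - delay.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# MAPE: mean absolute percentage error, in percent.
mape = np.mean(np.abs((delay - pred) / delay)) * 100.0

print(f"R^2 = {r2:.4f}, MAPE = {mape:.3f}%")
```

On this synthetic data the fit easily clears both thresholds, which is the regime the paper reports for its trained models.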
{"title":"RelOps: Reliability Optimization in Standard Cells Across PVT Variations in FinFET Digital Circuits","authors":"Mohammad Rehan Akhtar;Ritwik Basyas Goswami;Zia Abbas","doi":"10.1109/TCAD.2025.3593207","journal":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"45 3","pages":"1371-1383","publicationDate":"2025-07-28"}
Pub Date: 2025-07-25 | DOI: 10.1109/TCAD.2025.3592582
Yunpeng Yang;Feng Gao;Xiaopeng Yang;Runze Zhang;Zhipeng Li;Hua Xia;Qifeng Li;Xiangyun Ma
Infrared imaging is a valuable technology for gas leakage detection due to its high sensitivity, long detection range, and high efficiency. Conventional target detection methods depend on manually extracted image features, which often leads to limited accuracy, low adaptability, and slow detection speeds. Deep learning offers a potential solution; however, the increasing depth of neural networks imposes significant computational demands that hinder real-time detection. This article presents a compact and energy-efficient gas detection system built on a ZYNQ platform and an infrared camera. We propose a ZYNQ-based convolution accelerator to speed up gas plume detection in images captured by the infrared camera. Operating at a clock frequency of 130 MHz, the accelerator reaches a peak performance of 37.44 Gop/s while consuming only 4.12 W. The system processes an image in 0.235 s, enabling real-time gas leakage detection.
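The reported figures can be cross-checked with simple arithmetic: peak operations per cycle follow from the throughput and clock frequency, and energy efficiency from throughput and power. A quick sketch using only the numbers in the abstract; the split of operations into multiply-accumulate (MAC) pairs is an assumption, since the abstract does not state the PE count.

```python
# Back-of-the-envelope check of the reported accelerator figures.
freq_hz = 130e6      # clock frequency (from the abstract)
peak_gops = 37.44    # reported peak performance (Gop/s)
power_w = 4.12       # reported power consumption (W)
latency_s = 0.235    # reported processing time per image (s)

ops_per_cycle = peak_gops * 1e9 / freq_hz  # operations retired each cycle
macs_per_cycle = ops_per_cycle / 2         # assuming 1 MAC = 1 multiply + 1 add
efficiency = peak_gops / power_w           # energy efficiency (Gop/s per W)
energy_per_image_j = power_w * latency_s   # joules spent per processed image

print(ops_per_cycle, macs_per_cycle)       # ~288 ops/cycle -> ~144 parallel MACs
print(round(efficiency, 2), round(energy_per_image_j, 3))
```

The ~144 parallel MACs implied by this calculation is consistent with a modest FPGA convolution array, which matches the "compact and energy-efficient" framing of the system.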
{"title":"Gas Leakage Detection Using YOLO Accelerator Based on ZYNQ","authors":"Yunpeng Yang;Feng Gao;Xiaopeng Yang;Runze Zhang;Zhipeng Li;Hua Xia;Qifeng Li;Xiangyun Ma","doi":"10.1109/TCAD.2025.3592582","journal":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"45 2","pages":"1021-1027","publicationDate":"2025-07-25"}