Pub Date : 2025-03-10 DOI: 10.1109/TCAD.2025.3549703
Michel Takken;Maria Emmerich;Robert Wille
The design of microfluidic devices, i.e., Lab-on-Chips (LoCs) or Micro Total Analysis Systems ($\mu$TASs), is a tedious process that involves many time-consuming and costly fabrication cycles. Many of these devices contain dissolved species (i.e., solutes) that are required to appear in the system at specific predefined concentrations. Simulations can aid the design process. However, the commonly used methods from Computational Fluid Dynamics (CFD) are computationally costly and slow. In this work, we present a simulator for species concentrations in channel-based microfluidic devices that operates on a higher level of abstraction and is multiple orders of magnitude faster than CFD simulation methods. The simulator has been implemented in C++ and is benchmarked against CFD simulations as well as against measured results from experiments on a fabricated device. The results are analyzed and the applicability of the simulator to microfluidic devices is assessed.
Title: An Abstract Simulator for Species Concentrations in Channel-Based Microfluidic Devices
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 10, pp. 3764-3775
Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10918827
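The abstract does not describe the simulator's internal model. As a rough illustration of what reasoning above the CFD level can look like, the sketch below computes the solute concentration leaving a channel junction from a plain mass balance. It assumes steady flow and complete mixing at the junction, which laminar microfluidic channels often violate, and the function name `mix_at_junction` is hypothetical, not taken from the authors' C++ code.

```python
def mix_at_junction(inflows):
    """Return the outlet concentration of a junction.

    inflows: list of (flow_rate, concentration) pairs for each inlet channel.
    Assumption (not from the paper): complete mixing, so the outlet
    concentration is the flow-rate-weighted average of the inlets.
    """
    total_flow = sum(q for q, _ in inflows)
    if total_flow <= 0:
        raise ValueError("total inflow must be positive")
    # Solute mass entering per unit time divided by the total volumetric flow.
    return sum(q * c for q, c in inflows) / total_flow


if __name__ == "__main__":
    # One part of a 2.0 mol/m^3 solution diluted by three parts of pure buffer.
    print(mix_at_junction([(1.0, 2.0), (3.0, 0.0)]))
```

A model of this kind needs only one arithmetic expression per junction, which is the sort of saving that makes an abstract simulator much cheaper than resolving the full flow field.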
Pub Date : 2025-03-09 DOI: 10.1109/TCAD.2025.3568777
Xiangyu Ran;Chuxiong Lin;Yuxuan Qin;Jieyu Li;Ling Yang;Weifeng He
By stacking two data channels between $V_{DD}$ and $V_{SS}$, charge recycling buses (CRBs) halve the voltage swing on interconnects, achieving significant power savings for energy-efficient on-chip data transmission. However, the middle voltage ($V_{MID}$) between the two channels may fluctuate dynamically due to the diversity of input data, which significantly impacts data propagation delay and reliability. Unfortunately, existing SPICE-based simulators do not run fast enough to identify the worst-case $V_{MID}$ fluctuation and propagation delay, posing great challenges in CRB design. In this article, we present a dedicated CRB Simulator for fast and accurate $V_{MID}$ and timing analysis. A highly condensed $V_{MID}$ fluctuation model, which integrates each cycle’s $V_{MID}$ changes into a single closed-form formula, is embedded in the simulator to predict $V_{MID}$ values at clock edges. In addition, a speed-monitoring algorithm is developed to track the continuously changing signal propagation speed under intracycle $V_{MID}$ fluctuations for accurate delay estimation. Both the $V_{MID}$ fluctuation model and the delay estimation algorithm involve only a small number of arithmetic operations, thus featuring remarkably low computational complexity. Compared with HSPICE, our CRB Simulator runs more than $1.0\times 10^{5}$ times faster on average across various CRB circuits, with a $V_{MID}$ prediction error of only 1.3 mV and a delay estimation error as low as 0.6%. The significant speed improvement combined with high accuracy makes our CRB Simulator an efficient and reliable solution for $V_{MID}$ and timing analysis in CRB circuit design.
Title: An Ultraspeed Middle Voltage and Timing Analyzer With Near-SPICE Accuracy for Charge-Recycling Buses
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 12, pp. 4740-4751
As performance demands continue to rise, shared-memory heterogeneous systems (SMHSs) have been widely adopted for their ability to enable efficient communication and data sharing between heterogeneous cores. However, existing SMHSs suffer from uneven workload distribution among heterogeneous cores and from suboptimal mapping schemes, which prevent them from fully leveraging their architectural advantages. To address these issues, this article proposes MAP-SIM, a mapping-aware framework for modeling SMHSs. By modeling the performance of CPUs and systolic arrays (SAs) and considering rational schemes for partitioning and mapping computational tasks, MAP-SIM aims to evaluate and optimize the computational performance of heterogeneous multicore architectures. The experimental results show that, compared to previous work, MAP-SIM increases simulation speed by 14 to 67 times and enhances the computational performance of SMHSs by 1.4 to 4.4 times.
Title: MAP-SIM: A DNN-Specific Mapping Optimization Framework for Shared-Memory CPU-Systolic Array Architectures
Authors: Yuhang Li;Mei Wen;Junzhong Shen;Zhaoyun Chen;Yang Shi;Tianyu Wang;Zili Shao
Pub Date : 2025-03-08 DOI: 10.1109/TCAD.2025.3568347
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 12, pp. 4752-4764
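The abstract names uneven workload distribution as a core problem but does not give MAP-SIM's partitioning scheme. The sketch below shows only the simplest form of the idea: splitting perfectly divisible work between a CPU and a systolic array so that both finish at the same time. The function name, the constant-throughput model, and the numbers are assumptions for illustration; shared-memory contention and data movement, which a real mapping framework has to model, are ignored.

```python
def split_work(total_ops, cpu_ops_per_s, sa_ops_per_s):
    """Split divisible work so the CPU and the systolic array finish together.

    Assumption (not from the paper): each engine has a constant throughput,
    so the balanced share of each engine is proportional to its throughput.
    Returns (cpu_ops, sa_ops, makespan_seconds).
    """
    if cpu_ops_per_s <= 0 or sa_ops_per_s <= 0:
        raise ValueError("throughputs must be positive")
    sa_share = sa_ops_per_s / (cpu_ops_per_s + sa_ops_per_s)
    sa_ops = total_ops * sa_share
    cpu_ops = total_ops - sa_ops
    # The slower of the two engines determines the completion time.
    makespan = max(cpu_ops / cpu_ops_per_s, sa_ops / sa_ops_per_s)
    return cpu_ops, sa_ops, makespan


if __name__ == "__main__":
    cpu_ops, sa_ops, makespan = split_work(1000.0, 100.0, 400.0)
    print(cpu_ops, sa_ops, makespan)
    # Running everything on the systolic array alone would take 1000 / 400 s.
    print(1000.0 / 400.0)
```

Under this toy model the balanced split is never slower than running everything on the faster engine alone, which is the intuition behind treating workload balance as a mapping objective.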
Pub Date : 2025-03-07 DOI: 10.1109/TCAD.2025.3549355
Weiguo Li;Zhipeng Huang;Bei Yu;Wenxing Zhu;Jian Chen;Zhixue He;Xingquan Li
The advancement of modern clock tree synthesis (CTS) encounters a bottleneck, primarily due to the difficulty of achieving multiobjective co-optimization across complex design processes. To concurrently optimize skew, latency, and load capacitance, we propose an iterative and hierarchical CTS framework composed of clustering, topology generation and routing, buffering, and optimization. First, we introduce a capacitance-based metric to achieve adaptive balanced clustering and optimize the clustering results through simulated annealing. Second, to construct a clock tree with lower latency, load capacitance, and skew, we introduce the skew-latency-load tree (SLLT), which combines the advantages of the bounded-skew tree and the Steiner shallow-light tree, and we propose an effective SLLT construction algorithm. Third, to further optimize the CTS result by buffering, we introduce the critical wirelength evaluation (CWE) to evaluate the capability of each buffer, propose the insertion delay estimation (IDE) to reduce the evaluation bias during buffering, and design the iterative skew convergence algorithm (ISCA) to achieve complete convergence of skew. We validate our solution using 28 nm process technology. Relative to our method, the commercial tool yields 39.5% higher skew, 13.0% higher latency, and 18.5% higher clock capacitance, while OpenROAD yields increases of 101.6%, 50.7%, and 25.5%, respectively.
Title: iCTS: Iterative and Hierarchical Clock Tree Synthesis With Skew-Latency-Load Tree
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 10, pp. 3948-3961
Pub Date : 2025-03-07 DOI: 10.1109/TCAD.2025.3549352
Hyojoon Yun;Hyeonchan Lim;Hayoung Lee;Sungho Kang
Logic diagnosis is essential for improving reliability and yield. Although various methods have been proposed to enhance the accuracy and resolution of logic diagnosis, conventional methods still produce results in which the reported defect locations are incorrect. In particular, logic circuits contain a large number of gates, so multiple faults can occur, not just single faults. Because the number of possible cases for multiple faults is significantly greater than for single faults, diagnosing multiple faults is complicated. To address this problem, a new diagnosis method that uses a multistage process with fault candidate reduction is proposed. In the proposed method, machine learning is used with fault candidate reduction, and post-processing is performed afterward. The method allows multiple faults to be analyzed using only the test responses for single faults, demonstrating that it can maintain sufficient accuracy and resolution for unexpected faults.
Title: Multistage Enhanced Diagnosis With Fault Candidate Reduction
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 9, pp. 3648-3652
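To make the claim about the size of the multiple-fault space concrete, the sketch below counts fault candidates under the classical stuck-at model. The abstract does not state which fault model the authors use, so the stuck-at model and the function names are assumptions for illustration. With n lines, there are 2n single stuck-at faults, C(n, k) * 2^k faults of multiplicity k, and 3^n - 1 faulty circuits in total.

```python
from math import comb


def stuck_at_fault_count(num_lines, multiplicity):
    """Number of stuck-at faults that affect exactly `multiplicity` lines.

    Choose which lines are faulty, then pick stuck-at-0 or stuck-at-1 for each.
    """
    return comb(num_lines, multiplicity) * 2 ** multiplicity


def all_fault_count(num_lines):
    """Faulty circuits of any multiplicity.

    Each line is fault-free, stuck-at-0, or stuck-at-1, which gives 3^n
    combinations; subtracting 1 removes the fault-free circuit.
    """
    return 3 ** num_lines - 1


if __name__ == "__main__":
    for n in (10, 100, 1000):
        singles = stuck_at_fault_count(n, 1)
        doubles = stuck_at_fault_count(n, 2)
        print(n, singles, doubles, doubles // singles)
```

Even for double faults the candidate count grows quadratically with circuit size, which illustrates why reducing the candidate set before diagnosis matters.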
Pub Date : 2025-03-07 DOI: 10.1109/TCAD.2025.3549353
Suhasini Komarraju;Akhil Tammana;Chandramouli N. Amarnath;Abhijit Chatterjee
Modern analog mixed-signal (AMS) devices manufactured in advanced CMOS processes pose significant testing and post-manufacture tuning challenges. Measuring the specifications of AMS components is generally difficult because it requires a range of dedicated tests, while defect-based testing, on the other hand, requires extensive defect simulations that are compute-intensive. To overcome these limitations, this research proposes OATT, a testing and post-manufacture tuning approach for AMS circuits that is designed to stress the performance of the device under test (DUT), formalize a statistical (multidimensional Gaussian) distribution of the expected response of known “good” devices (inliers), and use test limits grounded in theoretical statistics to classify all out-of-distribution devices (outliers) as “bad.” It is an alternative test approach in that it does not explicitly target simulation of defect mechanisms. Tuning is performed to transform individual outlier DUT responses into responses resembling those of inlier devices by modulating hardware tuning knobs, such as bias voltages and currents, using a reinforcement learning algorithm. Circuit simulations and hardware results demonstrate the viability and efficiency of the proposed approach.
Title: OATT: Outlier-Oriented Alternative Testing and Post-Manufacture Tuning of Analog/Mixed-Signal Circuits
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 10, pp. 3668-3682
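The abstract says that inlier responses are modeled as a multidimensional Gaussian and that test limits come from theoretical statistics, but it does not name the statistic. One common choice, assumed here purely for illustration, is the squared Mahalanobis distance compared against a chi-square quantile. The sketch is restricted to two response dimensions so that it needs only the standard library: with 2 degrees of freedom the chi-square quantile at level 1 - alpha has the closed form -2 ln(alpha). All function names and data are hypothetical.

```python
import math


def fit_gaussian_2d(samples):
    """Estimate the mean and covariance of 2-D responses of known-good devices."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def is_outlier(point, mean, cov, alpha=0.001):
    """Flag a response whose distance exceeds the chi-square(2) test limit.

    The limit is exact only for a known mean and covariance; with estimated
    parameters it is an approximation.
    """
    limit = -2.0 * math.log(alpha)
    return mahalanobis_sq(point, mean, cov) > limit


if __name__ == "__main__":
    good = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.1), (1.1, 1.9), (0.9, 2.0)]
    mean, cov = fit_gaussian_2d(good)
    print(is_outlier((1.0, 2.0), mean, cov))
    print(is_outlier((2.0, 2.0), mean, cov))
```

A device flagged this way would then be a candidate for the tuning step the abstract describes, where knob settings are adjusted until its response falls back inside the limit.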
Pub Date : 2025-03-07 DOI: 10.1109/TCAD.2025.3549354
Simon Hofmann;Marcel Walter;Robert Wille
As conventional computing technologies approach their physical limits, the quest for increased computational power intensifies, heightening interest in post-CMOS technologies. Among these, Field-coupled Nanocomputing (FCN), which operates through the repulsion of physical fields at the nanoscale, emerges as a promising alternative. However, realizing specific functionalities within this technology necessitates the development of dedicated FCN physical design methods. Although various methods have been proposed, their reliance on heuristic approaches often results in suboptimal quality, highlighting a significant opportunity for enhancement. In the realm of conventional CMOS design, post-layout optimization techniques are employed to capitalize on this potential, yet such methods for FCN are either not scalable or lack efficiency. This work bridges this gap by introducing the first scalable and efficient post-layout optimization algorithm for FCN. Experimental evaluations demonstrate the efficiency of this approach: when applied to layouts obtained by a state-of-the-art heuristic method, the proposed post-layout optimization achieves area reductions of up to 73.75% (45.58% on average). This significant improvement underscores the transformative potential of post-layout optimization in FCN. Moreover, unlike existing algorithms, the method exhibits scalability even in optimizing layouts with over 20 million tiles. Implementations of the proposed methods are publicly available as part of the Munich Nanotech Toolkit (MNT) at https://github.com/cda-tum/fiction.
Title: Efficient and Scalable Post-Layout Optimization for Field-Coupled Nanotechnologies
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 10, pp. 3790-3803
Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10916761
Pub Date : 2025-03-07 DOI: 10.1109/TCAD.2025.3567883
Rui Li;Lin Li;Heng Yu;Masahiro Fujita;Weixiong Jiang;Yajun Ha
Formal verification of large-scale optimized integer multipliers remains a critical yet insufficiently addressed challenge in industry and academia. Current methods employ reference multiplier generators to automatically construct structurally similar reference multipliers, which are then used by satisfiability (SAT)-based techniques to verify equivalence with optimized multipliers. However, these approaches face limitations when generating references for large-scale optimized multipliers within acceptable timeframes. To address these limitations, we introduce the RefSCAT-2.0 framework, designed to rapidly produce high-quality large-scale reference multipliers. First, we generate the macro-architecture to determine the number of adders required for constructing the reference multiplier. We propose a novel integer linear programming (ILP)-based macro-architecture generation algorithm that minimizes the number of allocated adders, thereby reducing the overall problem complexity. Second, we organize the allocated adders into groups to simplify the subsequent generation process. We present a multilevel scheduler that automatically decomposes adders into groups with minimized interdependencies, ensuring both the quality of generation and a reduction in overall generation complexity. Third, we generate the micro-architecture for each scheduled group, wherein we finalize the connections between adders. We present a graph-based design space representation coupled with a quantum-inspired ant colony optimization (QACO)-based generation algorithm that efficiently explores the micro-architectures of each scheduled group. Experimental results show that RefSCAT-2.0 successfully verifies all 124 cases in a 256-bit optimized multiplier benchmark suite, outperforming SCA-based and hybrid methods, which solve only 24 cases each.
Title: RefSCAT-2.0: Formal Verification of Large-Scale Optimized Multipliers via Quantum-Inspired Ant Colony Optimization-Based Reference Generation
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 12, pp. 4828-4841
Pub Date : 2025-03-06 DOI: 10.1109/TCAD.2025.3567550
Irith Pomeranz
Defects that are manifested during functional operation, also referred to as functionally possible, have been shown to be responsible for the occurrence of silent data corruption (SDC) in large datacenters. The defects may have occurred because of high workloads that speed up the process of chip aging. This motivated the focus of earlier works on functionally possible faults in sites that are subjected to high functional switching activities. The functional switching activity in earlier works was based on transitions. Pulses were not considered in this context. This article defines the notion of a hazard-based functional switching activity that captures the conditions for pulses to occur during functional operation. It then revisits the hazard-based detection conditions for transition faults, under which faults are activated using pulses instead of transitions. This article describes a procedure that selects target faults that may be susceptible to aging because of pulses, and a test generation procedure for the target faults. Experimental results for benchmark circuits demonstrate the potential importance of considering hazard-based faults. The results also demonstrate that only small numbers of tests need to be added to a conventional transition fault test set for the selected hazard-based transition faults.
{"title":"Hazard-Based Functionally Possible Transition Faults With High Functional Switching Activities","authors":"Irith Pomeranz","doi":"10.1109/TCAD.2025.3567550","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3567550","url":null,"abstract":"Defects that are manifested during functional operation, also referred to as functionally possible, have been shown to be responsible for the occurrence of silent data corruption (SDC) in large datacenters. The defects may have occurred because of high workloads that speed up the process of chip aging. This motivated the focus of earlier works on functionally possible faults in sites that are subjected to high functional switching activities. The functional switching activity in earlier works was based on transitions. Pulses were not considered in this context. This article defines the notion of a hazard-based functional switching activity that captures the conditions for pulses to occur during functional operation. It then revisits the hazard-based detection conditions for transition faults, under which faults are activated using pulses instead of transitions. This article describes a procedure that selects target faults that may be susceptible to aging because of pulses, and a test generation procedure for the target faults. Experimental results for benchmark circuits demonstrate the potential importance of considering hazard-based faults. 
The results also demonstrate that only small numbers of tests need to be added to a conventional transition fault test set for the selected hazard-based transition faults.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 12","pages":"4818-4827"},"PeriodicalIF":2.9,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145560726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-03-05 DOI: 10.1109/TCAD.2025.3567012
Andrea Calabrese;Stefano Quer;Giovanni Squillero
The in-circuit test checks whether the board’s electrical and electronic components have been correctly soldered when producing printed circuit boards. When such a test is performed using a flying-probe tester, the cost of testing is mainly related to the time required for moving probes over the board and the time necessary for defining such movements, tuning the optimization on the number of devices that will eventually be tested. Since the 2000s, flying probe testing has been gaining popularity. Still, despite its industrial relevance, the research has been impaired by the lack of publicly available benchmarks for testing the new algorithms and comparing the different ideas. This article presents an open test set of realistic boards, ranging from a few thousand to half a million test points, together with a tool for generating more samples. It also presents an optimizer for flying probe tests composed of two separate planners: one global detecting test that could be performed together and reordered to obtain a more efficient probing sequence, and one local, implementing the probe movements and taking care of specific board features. The test set will eventually be used to present a quantitative evaluation of the performance of the proposed approach.
{"title":"Flying-Probe Testing: A Trajectory Planner and a Benchmark Suite","authors":"Andrea Calabrese;Stefano Quer;Giovanni Squillero","doi":"10.1109/TCAD.2025.3567012","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3567012","url":null,"abstract":"The in-circuit test checks whether the board’s electrical and electronic components have been correctly soldered when producing printed circuit boards. When such a test is performed using a flying-probe tester, the cost of testing is mainly related to the time required for moving probes over the board and the time necessary for defining such movements, tuning the optimization on the number of devices that will eventually be tested. Since the 2000s, flying probe testing has been gaining popularity. Still, despite its industrial relevance, the research has been impaired by the lack of publicly available benchmarks for testing the new algorithms and comparing the different ideas. This article presents an open test set of realistic boards, ranging from a few thousand to half a million test points, together with a tool for generating more samples. It also presents an optimizer for flying probe tests composed of two separate planners: one global detecting test that could be performed together and reordered to obtain a more efficient probing sequence, and one local, implementing the probe movements and taking care of specific board features. 
The test set will eventually be used to present a quantitative evaluation of the performance of the proposed approach.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 12","pages":"4807-4817"},"PeriodicalIF":2.9,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145560727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}