Time-Division Multiplexing Based System-Level FPGA Routing
Wei-Kai Liu, Ming-Hung Chen, Chia-Ming Chang, Chen-Chia Chang, Yao-Wen Chang
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643558
Multi-FPGA system prototyping has become popular for modern VLSI logic verification, but such a system is often limited by its number of inter-FPGA connections. As a result, time-division multiplexing (TDM) is employed to accommodate more inter-FPGA signals than physical connections. However, the inter-FPGA signal delay induced by TDM becomes significant due to time-multiplexing. Researchers have shown that TDM ratios (signal time-multiplexing ratios) significantly affect the performance of a multi-FPGA system and that inter-FPGA routing strongly influences its quality. This paper presents a framework that minimizes the system clock period of a multi-FPGA system while considering the inter-FPGA routing topology and the timing criticality of nets. The framework consists of two stages: (1) a distributed profiling scheme that generates the desired net ordering and alleviates routing congestion, and (2) a net-/edge-based refinement that assigns TDM ratios efficiently while strictly decreasing the ratios. On the ICCAD 2019 CAD Contest benchmarks, using the contest evaluation metric that combines quality and efficiency, our framework achieves the best overall score among all participating teams and published works.
DAPA: A Dataflow-Aware Analytical Placement Algorithm for Modern Mixed-Size Circuit Designs
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643441
Jai-Ming Lin, Weikang Huang, Yao-Chieh Chen, Yi-Ting Wang, Po-Wen Wang
This article presents an analytical placement algorithm that handles the dataflow constraint for mixed-size circuits. To quickly obtain a good placement at an early stage, engineers often reference the dataflow of a design to determine the relative locations of cells and macros. To achieve this, the paper presents two methods that make a placement follow this constraint. First, we initially give larger weights to nets that connect datapath-oriented objects and then gradually shrink these weights with a modified Gompertz curve according to the placement utilization, shortening their distances without interfering with the object distribution. Second, we define desirable placement regions for each datapath-oriented object and propose a novel sigmoid function that adds a penalty to the analytical placement formulation for objects outside their regions. Experiments demonstrate that our methodology obtains better results than an approach that does not consider the dataflow constraint: both wirelength and routability are improved in the resulting placement. Furthermore, our placer outperforms the RTL-aware dataflow-driven macro placer.
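To make the two ingredients concrete, here is a minimal sketch of (a) a Gompertz-shaped schedule that starts datapath nets at a large weight and shrinks it as utilization grows, and (b) a smooth sigmoid region penalty suitable for a differentiable placement objective. The curve constants and region bounds are illustrative assumptions, not the paper's calibrated values.

```python
import math

def gompertz_net_weight(utilization, w_max=5.0, w_min=1.0, b=-4.0, c=-6.0):
    """Shrink the extra weight on datapath nets as placement utilization grows,
    following a Gompertz-shaped curve: large pull early, ~w_min near the end."""
    g = math.exp(b * math.exp(c * utilization))   # rises from ~0 to ~1 on [0, 1]
    return w_min + (w_max - w_min) * (1.0 - g)

def region_penalty(x, region_lo, region_hi, alpha=8.0):
    """Smooth penalty that is ~0 inside the desirable region [region_lo,
    region_hi] and rises toward 1 outside it, so it can be added to an
    analytical placement formulation without breaking differentiability."""
    left = 1.0 / (1.0 + math.exp(alpha * (x - region_lo)))    # ~1 when x << lo
    right = 1.0 / (1.0 + math.exp(-alpha * (x - region_hi)))  # ~1 when x >> hi
    return left + right

# Early placement (low utilization): datapath nets get a strong pull.
print(gompertz_net_weight(0.1), gompertz_net_weight(0.9))
# An object at x=12 outside its desirable region [0, 10] is penalized.
print(region_penalty(5.0, 0.0, 10.0), region_penalty(12.0, 0.0, 10.0))
```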
{"title":"DAPA: A Dataflow-Aware Analytical Placement Algorithm for Modern Mixed-Size Circuit Designs","authors":"Jai-Ming Lin, Weikang Huang, Yao-Chieh Chen, Yi-Ting Wang, Po-Wen Wang","doi":"10.1109/ICCAD51958.2021.9643441","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643441","url":null,"abstract":"This article presents an analytical-based placement algorithm to handle dataflow constraint for mixed-size circuits. To quickly obtain a better placement at an early stage, engineers often reference dataflow of a design to determine the relative locations of cells and macros. To achieve this target, this paper presents two methods to make a placement follow this constraint. First, we give larger weights to those nets which connect to datapath-oriented objects in the beginning, and then gradually shrink the values by the modified Gompertz curve according to the status of placement utilization in order to shorten their distances without interfering with object distribution. Second, we define desirable placement regions for each datapath-oriented object and propose a novel sigmoid function to give additional penalties to these objects in the analytical placement formulation if they are not in the regions. The experiment demonstrates that our methodology can obtain better results than the other approach which does not consider dataflow constraint. Not only wirelength but also routability will be improved in the resulting placement. Furthermore, our placer outperforms the RTL-aware dataflow-driven macro placer.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123759291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heuristics for Million-scale Two-level Logic Minimization
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643572
M. Nazemi, Hitarth Kanakia, M. Pedram
Existing two-level logic minimization methods suffer from scalability problems: they cannot handle the optimization of Boolean functions with more than about 50,000 product terms. However, applications have arisen that produce Boolean functions with hundreds of thousands to millions of minterms. To ameliorate this scalability problem, this work presents a suite of heuristics that enables exact or approximate two-level logic minimization of such large Boolean functions through a divide-and-conquer technique. All proposed heuristics first deploy a decision tree to iteratively partition the original specification of a given Boolean function. Next, they apply one of several leaf optimization techniques (e.g., those based on support vector machines or error budgets) to each leaf node of the tree, and finally they merge the locally optimized leaves at the root of the tree to perform one round of global optimization. We show that our support-vector-machine-based heuristic compresses Boolean functions with 300,000 minterms by a factor of about 100 (i.e., 3,000 cubes in the optimized function) and achieves 98% accuracy. Similarly, our error-budget-driven heuristic compresses a Boolean function with about 3,000,000 minterms by a factor of 1,273 and achieves 95% accuracy while taking only 67 seconds for the whole optimization process. This is a significant improvement over well-known two-level logic minimization tools such as ESPRESSO-II and BOOM, which fail to optimize the same Boolean functions even after running for a few days.
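The partition-optimize-merge flow can be pictured with a toy example. The sketch below partitions a small minterm list with a balance-driven decision tree, minimizes each leaf with simple Quine-McCluskey-style cube merging (a tiny stand-in for the paper's SVM- or error-budget-based leaf optimizers), and runs one merging round at the root. Variable count, leaf size, and the minterm list are all made up for illustration.

```python
from itertools import combinations

# Minterms of an assumed single-output function over N_VARS variables,
# written as '0'/'1' strings; '-' marks a don't-care literal in a cube.
N_VARS = 4
minterms = ["0000", "0001", "0011", "0111", "1111", "1110", "1100", "1000"]

def split_var(terms):
    """Pick the variable whose 0/1 split is most balanced (a crude stand-in
    for the decision-tree partitioning heuristic)."""
    best, best_score = 0, None
    for v in range(N_VARS):
        ones = sum(t[v] == "1" for t in terms)
        score = abs(2 * ones - len(terms))
        if best_score is None or score < best_score:
            best, best_score = v, score
    return best

def build_leaves(terms, leaf_size):
    """Recursively partition the minterms until each leaf is small."""
    if len(terms) <= leaf_size:
        return [terms]
    v = split_var(terms)
    lo = [t for t in terms if t[v] == "0"]
    hi = [t for t in terms if t[v] == "1"]
    if not lo or not hi:            # no useful split left: keep as one leaf
        return [terms]
    return build_leaves(lo, leaf_size) + build_leaves(hi, leaf_size)

def merge_once(cubes):
    """One pass of pairwise cube merging: combine cubes whose dashes align
    and that differ in exactly one literal."""
    merged, used = set(), set()
    for a, b in combinations(cubes, 2):
        if any((a[i] == "-") != (b[i] == "-") for i in range(N_VARS)):
            continue
        diff = [i for i in range(N_VARS) if a[i] != b[i]]
        if len(diff) == 1:
            i = diff[0]
            merged.add(a[:i] + "-" + a[i + 1:])
            used.update((a, b))
    return sorted(merged | (set(cubes) - used))

def minimize(terms, leaf_size=4):
    cover = []
    for leaf in build_leaves(terms, leaf_size):
        cubes = sorted(leaf)
        while True:                  # iterate leaf merging to a fixed point
            nxt = merge_once(cubes)
            if nxt == cubes:
                break
            cubes = nxt
        cover.extend(cubes)
    # "Root" step: one more merging round over the union of leaf covers.
    return merge_once(sorted(set(cover)))

print(minimize(minterms))
```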
{"title":"Heuristics for Million-scale Two-level Logic Minimization","authors":"M. Nazemi, Hitarth Kanakia, M. Pedram","doi":"10.1109/ICCAD51958.2021.9643572","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643572","url":null,"abstract":"Existing two-level logic minimization methods suffer from scalability problems, i.e. they cannot handle the optimization of Boolean functions with more than about 50k or so product terms. However, applications have arisen that produce Boolean functions with hundreds of thousands to millions of minterms. To ameliorate the aforesaid scalability problem, this work presents a suite of heuristics that enables exact or approximate two-level logic minimization of such large Boolean functions by employing a divide and conquer technique. All proposed heuristics first deploy a decision tree to iteratively partition the original specification of a given Boolean function. Next, they apply one of different leaf optimization techniques (e.g., those based on support vector machines or error budgets) to each leaf node of the tree, and, finally, they merge the locally optimized leaves at the root of the tree to perform one round of the global optimization. We show that our support vector machine-based heuristic compresses Boolean functions with 300,000 minterms by a factor of about 100 (i.e. 3,000 cubes in the optimized function), and achieves 98% accuracy. Similarly, our error-budget-driven heuristic compresses a Boolean function with about 3,000,000 minterms by a factor of 1,273, and achieves 95 % accuracy while it only takes 67 seconds to complete the whole optimization process. This is a significant improvement compared to well-known two-level logic minimization tools such as ESPRESSO-II and BOOM, which fail to optimize the same Boolean functions even after running for a few days.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126333222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BeGAN: Power Grid Benchmark Generation Using a Process-portable GAN-based Methodology
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643566
Vidya A. Chhabria, K. Kunal, Masoud Zabihi, S. Sapatnekar
Evaluating CAD solutions to physical implementation problems has been extremely challenging due to the unavailability of modern benchmarks in the public domain. This work addresses this challenge by proposing a process-portable machine learning (ML)-based methodology for synthesizing synthetic power delivery network (PDN) benchmarks that obfuscate intellectual property (IP) information. In particular, the proposed approach leverages generative adversarial networks (GANs) and transfer learning to create realistic PDN benchmarks from a small set of available real circuit data. BeGAN generates thousands of PDN benchmarks with significant histogram correlation (p-value ≤ 0.05), demonstrating their realism, and an average L1 norm of more than 7.1%, highlighting its IP obfuscation capability. The original and thousands of ML-generated synthetic PDN benchmarks for four different open-source technologies are released in the public domain to advance research in this field.
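The two quality metrics quoted above (histogram correlation for realism, L1 distance for obfuscation) are easy to reproduce in spirit. The sketch below compares a "real" and a "synthetic" per-tile current map; both maps are random stand-ins, and the exact metric definitions here are assumptions rather than BeGAN's published evaluation script.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Stand-ins for a real and a GAN-generated current map of a power grid
# (2-D arrays of per-tile current draw); real data would come from designs.
real_map = rng.gamma(shape=2.0, scale=1.0, size=(64, 64))
synthetic_map = real_map * rng.normal(1.0, 0.15, size=(64, 64)) + 0.1

def histogram_correlation(a, b, bins=50):
    """Pearson correlation (and p-value) between the two maps' histograms,
    a realism check similar in spirit to the one quoted in the abstract."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    ha, _ = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    hb, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    return pearsonr(ha, hb)

def normalized_l1_distance(a, b):
    """Average per-tile L1 difference, normalized by the real map's mean;
    a larger value suggests the synthetic map does not leak the original."""
    return float(np.abs(a - b).mean() / a.mean())

r, p = histogram_correlation(real_map, synthetic_map)
print(f"histogram correlation r={r:.3f}, p={p:.3g}")
print(f"normalized L1 distance = {normalized_l1_distance(real_map, synthetic_map):.3f}")
```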
{"title":"BeGAN: Power Grid Benchmark Generation Using a Process-portable GAN-based Methodology","authors":"Vidya A. Chhabria, K. Kunal, Masoud Zabihi, S. Sapatnekar","doi":"10.1109/ICCAD51958.2021.9643566","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643566","url":null,"abstract":"Evaluating CAD solutions to physical implementation problems has been extremely challenging due to the unavailability of modern benchmarks in the public domain. This work aims to address this challenge by proposing a process-portable machine learning (ML)-based methodology for synthesizing synthetic power delivery network (PDN) benchmarks that obfuscate intellectual property information. In particular, the proposed approach leverages generative adversarial networks (GAN) and transfer learning techniques to create realistic PDN benchmarks from a small set of available real circuit data. BeGAN generates thousands of PDN benchmarks with significant histogram correlation (p-value ≤ 0.05) demonstrating its realism and an average L1 Norm of more than 7.1 %, highlighting its IP obfuscation capabilities. The original and thousands of ML-generated synthetic PDN benchmarks for four different open-source technologies are released in the public domain to advance research in this field.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125615626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
OpenSAR: An Open Source Automated End-to-end SAR ADC Compiler
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643494
Mingjie Liu, Xiyuan Tang, Keren Zhu, Hao Chen, Nan Sun, D. Pan
Despite recent developments in automated analog sizing and analog layout generation, there is doubt whether analog design automation techniques can scale to system-level designs. Meanwhile, analog designs are considered major roadblocks for open source hardware because of the limited design automation tools available. In this work, we present OpenSAR, the first open source automated end-to-end successive approximation register (SAR) analog-to-digital converter (ADC) compiler. OpenSAR requires only system performance specifications as input and outputs DRC- and LVS-clean layouts. Compared with prior work, we leverage automated placement and routing to generate analog building blocks, removing the need to design layout templates or libraries. We optimize the redundant non-binary capacitor digital-to-analog converter (CDAC) array for yield with a template-based layout generator that interleaves capacitor rows and columns to reduce process-gradient mismatch. Post-layout simulations demonstrate that the generated prototype designs achieve state-of-the-art resolution, speed, and energy efficiency.
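The interleaving idea behind the CDAC generator can be caricatured in one dimension: spread each capacitor's unit cells over rows visited in an outside-in order so that a linear process gradient along that axis averages out for every weight. The row ordering, unit-cell counts, and the function below are illustrative assumptions, not OpenSAR's actual template generator.

```python
def interleaved_cdac_rows(weights, n_rows):
    """Assign each DAC bit's unit capacitors to rows in an interleaved,
    outside-in order; 'weights' lists unit-cell counts per bit (including
    redundant, non-binary ones)."""
    # Visit rows from the two ends toward the middle: 0, n-1, 1, n-2, ...
    order = []
    lo, hi = 0, n_rows - 1
    while lo <= hi:
        order.append(lo)
        if hi != lo:
            order.append(hi)
        lo, hi = lo + 1, hi - 1

    placement = {bit: [] for bit in range(len(weights))}
    cursor = 0
    for bit, count in enumerate(weights):
        for _ in range(count):
            placement[bit].append(order[cursor % len(order)])
            cursor += 1
    return placement

# Example: a 5-bit binary section plus one redundant weight (unit-cell counts).
print(interleaved_cdac_rows([16, 8, 8, 4, 2, 1], n_rows=8))
```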
{"title":"OpenSAR: An Open Source Automated End-to-end SAR ADC Compiler","authors":"Mingjie Liu, Xiyuan Tang, Keren Zhu, Hao Chen, Nan Sun, D. Pan","doi":"10.1109/ICCAD51958.2021.9643494","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643494","url":null,"abstract":"Despite recent developments in automated analog sizing and analog layout generation, there is doubt whether analog design automation techniques could scale to system-level designs. On the other hand, analog designs are considered major roadblocks for open source hardware with limited available design automation tools. In this work, we present OpenSAR, the first open source automated end-to-end successive approximation register (SAR) analog-to-digital converter (ADC) compiler. OpenSAR only requires system performance specifications as the minimal input and outputs DRC and LVS clean layouts. Compared with prior work, we leverage automated placement and routing to generate analog building blocks, removing the need to design layout templates or libraries. We optimize the redundant non-binary capacitor digital-to-analog converter (CDAC) array design for yield considerations with a template-based layout generator that interleaves capacitor rows and columns to reduce process gradient mismatch. Post layout simulations demonstrate that the generated prototype designs achieve state-of-the-art resolution, speed, and energy efficiency.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131866471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Routability-driven Global Placer Target on Removing Global and Local Congestion for VLSI Designs
Jai-Ming Lin, Chung-Wei Huang, Liang-Chi Zane, Min-Chia Tsai, Che-Li Lin, Chen-Fa Tsai
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643544
Cell placement remains a major challenge in modern VLSI design, especially with respect to routability. Routing overflow in a placement may come from both global and local routing congestion. To resolve these problems, this paper proposes two techniques within a global placement algorithm based on an analytical placement formulation and the multilevel framework. To remove global routing congestion, we treat each net as a movable soft module and propose a novel congestion-aware net penalty model in which a net receives a larger penalty if it covers more routing-congested regions. Consequently, our formulation moves nets away from congested regions more easily than other approaches and has less impact on wirelength. In addition, to relieve local congestion, we propose an inflation technique that expands the area of a cluster according to its internal connectivity intensity and the routing congestion it occupies. Experimental results demonstrate that our approach achieves better routability and wirelength than other approaches such as NTUplace4h, NTUplace4dr, and RePlAce.
Doomed Run Prediction in Physical Design by Exploiting Sequential Flow and Graph Learning
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643435
Yi-Chen Lu, S. Nath, Vishal Khandelwal, S. Lim
Modern designs increasingly rely on physical design (PD) tools to derive the full technology-scaling benefits of Moore's Law. Designers often perform power, performance, and area (PPA) exploration through parallel PD runs with different tool configurations. Efficient PPA exploration is mission-critical for chip designers working with stringent time-to-market constraints and finite compute resources. Therefore, a framework that can accurately predict a "doomed run" (i.e., one that will not meet the PPA targets) in the early phases of the PD flow can provide a significant productivity boost by enabling early termination of such runs. Multiple QoR metrics can be leveraged to classify successful or doomed PD runs. In this paper, we focus specifically on timing: our goal is to identify PD runs that cannot achieve the end-of-flow timing targets by predicting post-route total negative slack (TNS) values in early PD phases. To achieve this goal, we develop an end-to-end machine learning (ML) framework that predicts TNS by modeling PD implementation as a sequential flow. In particular, our framework leverages graph neural networks (GNNs) to encode netlist graphs extracted from the various PD phases and long short-term memory (LSTM) networks to perform sequential modeling over the GNN-encoded features. Experimental results on seven industrial designs with a 5:2 train/test split demonstrate that our framework predicts post-route TNS values with high fidelity, within 5.2% normalized root mean squared error (NRMSE), in early design stages (e.g., placement, CTS) on the two validation designs unseen during training.
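The GNN-per-stage plus LSTM-over-stages structure can be sketched with plain tensors. The model below is an assumption-laden miniature (hand-rolled mean-aggregation message passing, random toy graphs, arbitrary dimensions), not the paper's architecture or feature set.

```python
import torch
import torch.nn as nn

class MeanAggGNNLayer(nn.Module):
    """Minimal message-passing layer: each node averages its neighbors'
    features (via a normalized adjacency) and mixes them with its own."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, x, adj_norm):
        neigh = adj_norm @ x                      # mean of neighbor features
        return torch.relu(self.lin(torch.cat([x, neigh], dim=-1)))

class StageTNSPredictor(nn.Module):
    """Encode the netlist graph seen at each PD stage with a small GNN, pool
    to one embedding per stage, and run an LSTM over the stage sequence to
    regress end-of-flow TNS."""
    def __init__(self, feat_dim=8, hid_dim=32):
        super().__init__()
        self.embed = nn.Linear(feat_dim, hid_dim)
        self.gnn = nn.ModuleList([MeanAggGNNLayer(hid_dim) for _ in range(2)])
        self.lstm = nn.LSTM(hid_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, 1)

    def forward(self, stage_graphs):
        # stage_graphs: list of (node_features [N, feat_dim], adj_norm [N, N])
        stage_embs = []
        for x, adj_norm in stage_graphs:
            h = torch.relu(self.embed(x))
            for layer in self.gnn:
                h = layer(h, adj_norm)
            stage_embs.append(h.mean(dim=0))      # graph-level pooling
        seq = torch.stack(stage_embs).unsqueeze(0)  # [1, n_stages, hid_dim]
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])              # predicted post-route TNS

# Toy run: three early stages (e.g., placement, CTS, early route) of a
# 10-cell netlist with random features and a self-looped adjacency.
n = 10
adj = torch.eye(n)
graphs = [(torch.randn(n, 8), adj) for _ in range(3)]
print(StageTNSPredictor()(graphs))
```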
{"title":"Doomed Run Prediction in Physical Design by Exploiting Sequential Flow and Graph Learning","authors":"Yi-Chen Lu, S. Nath, Vishal Khandelwal, S. Lim","doi":"10.1109/ICCAD51958.2021.9643435","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643435","url":null,"abstract":"Modern designs are increasingly reliant on physical design (PD) tools to derive full technology scaling benefits of Moore's Law. Designers often perform power, performance, and area (PPA) exploration through parallel PD runs with different tool configurations. Efficient exploration of PPA is mission-critical for chip designers who are working with stringent time-to-market constraints and finite compute resources. Therefore, a framework that can accurately predict a “doomed run” (i.e., will not meet the PPA targets) at early phases of the PD flow can provide a significant productivity boost by enabling early termination of such runs. Multiple QoR metrics can be leveraged to classify successful or doomed PD runs. In this paper, we specifically focus on the aspect of timing, where our goal is to identify the PD runs that cannot achieve end-of-flow timing results by predicting the post-route total negative slack (TNS) values in early PD phases. To achieve our goal, we develop an end-to-end machine learning (ML) framework that performs TNS prediction by modeling PD implementation as a sequential flow. Particularly, our framework leverages graph neural networks (GNNs) to encode netlist graphs extracted from various PD phases, and utilize long short-term memory (LSTM) networks to perform sequential modeling based on the GNN-encoded features. Experimental results on seven industrial designs with 5:2 train/test split ratio demonstrate that our framework predicts post-route TNS values in high fidelity within 5.2% normalized root mean squared error (NRMSE) in early design stages (e.g., placement, CTS) on the two validation designs that are unseen during training.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131737454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compatible Equivalence Checking of X-Valued Circuits
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643515
Yu-Neng Wang, Yun-Rong Luo, Po-Chun Chien, Ping-Lun Wang, Hao-Ren Wang, Wan-Hsuan Lin, J. H. Jiang, Chung-Yang Huang
The X-value arises in various contexts of system design, often representing an unknown or don't-care value depending on the application. Verification of X-valued circuits is a crucial but relatively unaddressed task. The challenge of equivalence checking for X-valued circuits, named compatible equivalence checking, was posed in the 2020 ICCAD CAD Contest. In this paper, we present our winning method, based on an X-value-preserving dual-rail encoding and incremental identification of the compatible equivalence relation. Experimental results demonstrate the effectiveness of the proposed techniques: our approach solves more cases than both the commercial tool and the other top-3 teams of the contest.
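One common way to reduce three-valued reasoning to ordinary Boolean circuits is a dual-rail encoding in which each wire carries a "can be 1" and a "can be 0" rail; the sketch below uses that encoding as an assumed illustration of the idea, not necessarily the paper's exact construction.

```python
# Dual-rail encoding of a 3-valued signal: each wire v becomes a pair (p, n)
# with p = "v can be 1" and n = "v can be 0".
#   constant 0 -> (0, 1),  constant 1 -> (1, 0),  X -> (1, 1)
ZERO, ONE, X = (0, 1), (1, 0), (1, 1)

def NOT(a):
    p, n = a
    return (n, p)                       # swapping the rails negates the value

def AND(a, b):
    return (a[0] & b[0], a[1] | b[1])   # can be 1 iff both can; can be 0 if either can

def OR(a, b):
    return (a[0] | b[0], a[1] & b[1])

def show(v):
    return {ZERO: "0", ONE: "1", X: "X"}[v]

# X propagates as expected: X AND 0 = 0 (controlling input), X AND 1 = X.
print(show(AND(X, ZERO)), show(AND(X, ONE)), show(OR(X, ONE)), show(NOT(X)))
```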
{"title":"Compatible Equivalence Checking of X-Valued Circuits","authors":"Yu-Neng Wang, Yun-Rong Luo, Po-Chun Chien, Ping-Lun Wang, Hao-Ren Wang, Wan-Hsuan Lin, J. H. Jiang, Chung-Yang Huang","doi":"10.1109/ICCAD51958.2021.9643515","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643515","url":null,"abstract":"The X-value arises in various contexts of system design. It often represents an unknown value or a don't-care value depending on the application. Verification of X-valued circuits is a crucial task but relatively unaddressed. The challenge of equivalence checking for X-valued circuits, named compatible equivalence checking, is posed in the 2020 ICCAD CAD Contest. In this paper, we present our winning method based on X-value preserving dual-rail encoding and incremental identification of compatible equivalence relation. Experimental results demonstrate the effectiveness of the proposed techniques and the outperformance of our approach in solving more cases than the commercial tool and the other teams among the top 3 of the contest.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130426382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reliable Memristor-based Neuromorphic Design Using Variation- and Defect-Aware Training
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643468
Di Gao, Grace Li Zhang, Xunzhao Yin, Bing Li, Ulf Schlichtmann, Cheng Zhuo
The memristor crossbar provides a unique opportunity to develop neuromorphic computing systems (NCSs) with high scalability and energy efficiency. However, reliability issues arising from the immature fabrication process and physical device limitations, namely variations and stuck-at faults (SAFs), severely limit its practical application. Specifically, variations make the programmed weights deviate from their expected values, and defective memristors cannot represent the weights at all. In this work, we propose a variation- and defect-aware framework that improves the reliability of memristor-based NCSs while minimizing the loss in inference performance. We develop analytical weight models that characterize the non-ideal effects of variations and SAFs, which are then incorporated into a Bayesian neural network as priors and constraints. Reliability improvement is thus cast as neural network training for optimal weights that accommodate variations and defects across chips, requiring neither computation-intensive retraining nor expensive testing. Extensive experimental results confirm that the proposed framework effectively improves NCS reliability while significantly mitigating inference accuracy degradation, even under severe variations and SAFs.
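A simplified way to fold such non-idealities into training is to perturb the weights in the forward pass with a multiplicative variation model and apply stuck-at-fault masks, so the learned weights remain accurate under them. The layer below is a sketch under those assumptions (lognormal variation, fault maps sampled at construction), standing in for the paper's Bayesian formulation rather than reproducing it.

```python
import torch
import torch.nn as nn

class VariationAwareLinear(nn.Module):
    """Linear layer whose forward pass applies multiplicative lognormal weight
    variation and stuck-at-fault masks, so training sees the non-ideal device."""
    def __init__(self, in_f, out_f, sigma=0.1, saf0_rate=0.01, saf1_rate=0.01):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.1)
        self.sigma = sigma
        # Fixed per-chip fault maps would normally come from testing; here we
        # sample them once at construction for illustration.
        self.register_buffer("saf0", torch.rand(out_f, in_f) < saf0_rate)
        self.register_buffer("saf1", torch.rand(out_f, in_f) < saf1_rate)

    def forward(self, x):
        w = self.weight
        if self.training:
            # Lognormal conductance variation around the programmed value.
            w = w * torch.exp(self.sigma * torch.randn_like(w))
        w = torch.where(self.saf0, torch.zeros_like(w), w)               # stuck off
        w = torch.where(self.saf1, torch.full_like(w, w.abs().max().item()), w)  # stuck on
        return x @ w.t()

# One noisy training step on random data, just to show the mechanics.
layer = VariationAwareLinear(16, 4)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(32, 16), torch.randn(32, 4)
loss = nn.functional.mse_loss(layer(x), y)
loss.backward()
opt.step()
print(float(loss))
```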
{"title":"Reliable Memristor-based Neuromorphic Design Using Variation- and Defect-Aware Training","authors":"Di Gao, Grace Li Zhang, Xunzhao Yin, Bing Li, Ulf Schlichtmann, Cheng Zhuo","doi":"10.1109/ICCAD51958.2021.9643468","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643468","url":null,"abstract":"The memristor crossbar provides a unique opportunity to develop a neuromorphic computing system (NCS) with high scalability and energy efficiency. However, the reliability issues that arise from the immature fabrication process and physical device limitations, i.e., variations and stuck-at-faults (SAF), dramatically prevent its wide application in practice. Specifically, variations make the programmed weights deviate from their expected values. On the other hand, defective mem-ristors cannot even represent the weights effectively. In this work, we propose a variation- and defect-aware framework to improve the reliability of memristor-based NCS while minimizing the inference performance loss. We propose to develop analytical weight models to characterize the non-ideal effects of variations and SAFs, which can then be incorporated into a Bayesian neural network as priori and constraint. We then convert the reliability improvement to the neural network training for optimal weights that can accommodate variations and defects across the chips, which does not require computation-intensive retraining or cost-expensive testing. Extensive experimental results with the proposed framework confirm its effective capability of improving the reliability of NCS, while significantly mitigating the inference accuracy degradation under even severe variations and SAFs.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130794707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Convergence Monitoring Method for DNN Training of On-Device Task Adaptation
Pub Date: 2021-11-01 | DOI: 10.1109/ICCAD51958.2021.9643522
Seungkyu Choi, Jaekang Shin, L. Kim
DNN training has become a major on-device workload for executing various vision tasks with high performance. Accordingly, training architectures incorporating approximate computing have been steadily studied for efficient acceleration. However, most of these works examine their schemes on from-scratch training, where inaccurate computing is not tolerable. Moreover, previous solutions are mostly extensions of inference-oriented techniques, e.g., sparsity/pruning, quantization, and dataflow. Therefore, issues that hinder the overall speed of the DNN training process in practical workloads remain unresolved. In this work, targeting transfer-learning-based task adaptation, a practical on-device training workload, we propose a convergence monitoring method that removes the redundancy in massive training iterations. By utilizing the network's output values, we detect the training intensity of incoming tasks and monitor the prediction convergence under the given intensity to provide early exits within the scheduled training iterations. As a result, accurate approximation over various tasks is performed with minimal overhead. Unlike sparsity-driven approximation, our method enables runtime optimization and is easily applicable to off-the-shelf accelerators, achieving significant speedup. Evaluation results on various datasets show a geomean speedup of 2.2× over the baseline and 1.8× over the latest convergence-related training method.
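An output-driven early exit can be sketched as a fine-tuning loop that watches a convergence proxy and stops once it plateaus. In the sketch below the proxy is the mean top-1 softmax confidence and the patience/threshold values are arbitrary; keying the exit criterion to a detected task intensity, as the paper does, is outside this toy example.

```python
import torch
import torch.nn as nn

def train_with_convergence_monitor(model, loader, epochs, patience=3, delta=1e-3):
    """Fine-tune until the monitored quantity stops improving; remaining
    scheduled epochs are treated as redundant and skipped (early exit)."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    best, stale = 0.0, 0
    for epoch in range(epochs):
        confidences = []
        for x, y in loader:
            opt.zero_grad()
            out = model(x)
            loss_fn(out, y).backward()
            opt.step()
            confidences.append(out.softmax(dim=-1).max(dim=-1).values.mean().item())
        score = sum(confidences) / len(confidences)   # convergence proxy
        if score > best + delta:
            best, stale = score, 0
        else:
            stale += 1
        if stale >= patience:
            return epoch + 1                          # early exit
    return epochs

# Toy task-adaptation run on random data.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 3))
data = [(torch.randn(16, 8), torch.randint(0, 3, (16,))) for _ in range(5)]
print("epochs run:", train_with_convergence_monitor(model, data, epochs=50))
```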
{"title":"A Convergence Monitoring Method for DNN Training of On-Device Task Adaptation","authors":"Seungkyu Choi, Jaekang Shin, L. Kim","doi":"10.1109/ICCAD51958.2021.9643522","DOIUrl":"https://doi.org/10.1109/ICCAD51958.2021.9643522","url":null,"abstract":"DNN training has become a major workload in on-device situations to execute various vision tasks with high performance. Accordingly, training architectures accompanying approximate computing have been steadily studied for efficient acceleration. However, most of the works examine their scheme on from-the-scratch training where inaccurate computing is not tolerable. Moreover, previous solutions are mostly provided as an extended version of the inference works, e.g., sparsity/pruning, quantization, dataflow, etc. Therefore, unresolved issues in practical workloads that hinder the total speed of the DNN training process remain still. In this work, with targeting the transfer learning-based task adaptation of the practical on-device training workload, we propose a convergence monitoring method to resolve the redundancy in massive training iterations. By utilizing the network's output value, we detect the training intensity of incoming tasks and monitor the prediction convergence with the given intensity to provide early-exits in the scheduled training iteration. As a result, an accurate approximation over various tasks is performed with minimal overhead. Unlike the sparsity-driven approximation, our method enables runtime optimization and can be easily applicable to off-the-shelf accelerators achieving significant speedup. Evaluation results on various datasets show a geomean of $2.2times$ speedup over baseline and $1.8times$ speedup over the latest convergence-related training method.","PeriodicalId":370791,"journal":{"name":"2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130766808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}