2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD): Latest Papers, Page 7

Bounded Model Checking of Speculative Non-Interference

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643462

Emmanuel Pescosta, Georg Weissenbacher, Florian Zuleger

Spectre, a hardware vulnerability that breaks the isolation between applications, has received ample attention in recent years. Spectre-style attacks exploit speculative execution to leak information through micro-architectural side-channels, breaking down abstractions software developers have relied on for decades. As these attacks are based on fundamental optimization techniques present in most modern microprocessors, salvation seems to lie in software-based countermeasures for now. Comprehensive software mitigation, however, has proved to be an exceptionally challenging task with ample room for failure. To support the automated analysis of mitigation attempts, we present a technique that relies on Bounded Model Checking to detect violations of non-interference in speculative executions. Since off-the-shelf software model checking tools are unaware of micro-architectural state, we base our effort on an operational semantics of speculative executions of micro-assembly code. Our semantics is parameterized with micro-architectural components (such as the cache or the branch predictor), allowing for precise models of various side-channels. We evaluate our approach on widely used benchmark instances, report the detection of a zero-day vulnerability in the Linux kernel, and demonstrate that our approach is more exhaustive than symbolic simulation (with comparable computational effort).
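
To make the notion of a speculative non-interference violation concrete, the following is a minimal sketch rather than the authors' semantics or tool. It models a hypothetical bounds-check gadget, `if idx < ARRAY_LEN: load table[mem[idx]]`, and treats the set of touched cache lines as the attacker's observation. Instead of encoding the problem for a SAT/SMT-based bounded model checker, it simply enumerates all inputs within a small bound. All names (`ARRAY_LEN`, `observations`, `find_violation`) are assumptions made for illustration.

```python
from itertools import product

ARRAY_LEN = 2        # public array occupies addresses 0..1 (hypothetical layout)
MEM_LEN = 3          # address 2 holds the secret
SECRETS = range(4)   # candidate secret values explored by the bounded search


def observations(idx, secret, speculate):
    """Cache lines touched by `if idx < ARRAY_LEN: load table[mem[idx]]`."""
    mem = [0, 1, secret]  # public data followed by the secret
    touched = set()
    in_bounds = idx < ARRAY_LEN
    # Architecturally the body runs only when the check passes; under
    # speculation a mispredicted branch executes it transiently as well.
    if in_bounds or speculate:
        touched.add(("table", mem[idx]))  # address depends on the loaded value
    return frozenset(touched)


def find_violation(speculate):
    """Return (idx, s1, s2) if two secrets are distinguishable, else None."""
    for idx, s1, s2 in product(range(MEM_LEN), SECRETS, SECRETS):
        if observations(idx, s1, speculate) != observations(idx, s2, speculate):
            return idx, s1, s2
    return None


print(find_violation(speculate=False))
print(find_violation(speculate=True))
```

In this toy model the gadget leaks nothing without speculation, but an out-of-bounds `idx` becomes distinguishing once the branch may be mispredicted, which is the kind of gap between the two execution modes that a speculative non-interference check looks for.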


Citations: 2

LoopBreaker: Disabling Interconnects to Mitigate Voltage-Based Attacks in Multi-Tenant FPGAs

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643485

Hassan Nassar, Hanna AlZughbi, Dennis R. E. Gnad, L. Bauer, M. Tahoori, J. Henkel

FPGAs are being offered in the cloud as accelerator resources that can be shared among multiple users (i.e., tenants). Recently, various approaches have shown that fault attacks launched from one tenant region to another are possible, leading to timing faults or crashes of the FPGA. It is, therefore, important that malicious tenants are limited in their ability to cause such security problems. So far, the existing countermeasures against such attacks check the configuration bitstreams before they are loaded onto the device. Such offline approaches have various practical limitations, e.g., they may force the tenants to unveil their design secrets. In this paper, we present LoopBreaker, a novel runtime solution that can disable the entire activity of a malicious tenant region, in order to rapidly stop a potential attack before it results in a crash (i.e., Denial-of-Service). We implemented and tested multiple attack types and found that realistic attacks demand at least 12–26 µs to be successful. A partial reconfiguration to overwrite the malicious tenant region demands 200 µs in our real-world implementation, which is too slow to prevent the attack from leading to a crash. Instead, our proposed LoopBreaker method only needs 1.5 µs to stop a malicious tenant, which makes it the first online approach that can successfully stop challenging voltage-drop-based attacks from causing a crash.
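
The timing argument in the abstract can be restated as a small check. This is only an illustrative sketch: the three latencies come from the abstract, while the function name `stops_attack` and the simple rule "a countermeasure must finish before the fastest attack succeeds" are assumptions made here.

```python
ATTACK_MIN_US = 12.0  # fastest successful attack reported in the abstract


def stops_attack(response_us, attack_us=ATTACK_MIN_US):
    """A countermeasure helps only if it completes before the attack succeeds."""
    return response_us < attack_us


# Response latencies quoted in the abstract, in microseconds.
COUNTERMEASURES_US = {"partial reconfiguration": 200.0, "LoopBreaker": 1.5}

verdicts = {name: stops_attack(t) for name, t in COUNTERMEASURES_US.items()}
print(verdicts)
```

Under this rule, partial reconfiguration arrives far too late, while a 1.5 µs response leaves a margin of roughly 10 µs against the fastest reported attack.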


Citations: 6

ToPro: A Topology Projector and Waveguide Router for Wavelength-Routed Optical Networks-on-Chip

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643451

Zhidan Zheng, Mengchu Li, Tsun-Ming Tseng, Ulf Schlichtmann

To meet the ever-increasing requirements of on-chip communication, the trend is towards wavelength-routed optical networks-on-chip (WRONoCs), which support high-speed communication with low power. A typical WRONoC design flow consists of two consecutive steps: topological design and physical design. Current physical design tools interpret the input topology as a pure logic scheme and perform placement and routing for all network components from scratch. Due to the high design complexity and the layout constraints, additional waveguide crossings in the synthesized layouts are hardly avoidable, which results in an increase in insertion loss and crosstalk noise and thus degrades the network performance. In this work, we propose a physical design tool, ToPro, which retains the interconnection among the optical switching elements by projecting the structure of a WRONoC topology onto the physical plane, and focuses on the waveguide routing to the IP-cores. To avoid the increase in insertion loss and crosstalk noise, ToPro removes the extra crossings and long detours of waveguides by changing the routing order of nets. The experimental results demonstrate the superiority of ToPro in time- and energy-efficiency. For example, compared to a state-of-the-art design automation tool, ToPro synthesizes a network with 16 IP-cores with a 17% reduction in the worst-case insertion loss and decreases the synthesis time from more than six days to less than one second.


Citations: 1

Quarry: Quantization-based ADC Reduction for ReRAM-based Deep Neural Network Accelerators

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643502

Azat Azamat, Faaiz Asim, Jongeun Lee

ReRAM (Resistive Random-Access Memory) crossbar arrays have the potential to provide extremely fast and low-cost DNN (Deep Neural Network) acceleration. However, peripheral circuits, in particular ADCs (Analog-Digital Converters), can be a large overhead and/or slow down the operation considerably. In this paper, we propose to use advanced quantization techniques to reduce the ADC overhead of ReRAM crossbar arrays. Our method does not require any hardware change but can reduce the overhead of ADC greatly. Our methodology is also general, having no restriction in terms of DNN type (binarized or multi-bit) or ReRAM crossbar array size. Our experimental results using ResNet on the ImageNet dataset demonstrate that our method can reduce the size of ADC by 32× compared with ISAAC at a very small accuracy loss of 0.24 percentage points.
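
As background on why ADC resolution matters, the sketch below computes how many bits are needed to digitise one crossbar column without loss. It is the generic worst-case bound, not the quantization scheme proposed in the paper, and the function name and parameters (`adc_bits`, `rows`, `cell_bits`, `input_bits`) are assumptions made for illustration.

```python
def adc_bits(rows, cell_bits=1, input_bits=1):
    """Bits needed to digitise one column's accumulated sum without loss.

    Each of `rows` cells contributes at most
    (2**cell_bits - 1) * (2**input_bits - 1) to the column sum.
    """
    max_sum = rows * (2**cell_bits - 1) * (2**input_bits - 1)
    return max_sum.bit_length()  # values 0..max_sum need this many bits


for rows in (8, 128):
    print(rows, adc_bits(rows), adc_bits(rows, cell_bits=2))
```

The required resolution grows with the number of simultaneously activated rows and with the cell precision, which is why limiting the range of values a column must resolve is an attractive way to shrink the converter.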


Citations: 10

FL-DISCO: Federated Generative Adversarial Network for Graph-based Molecule Drug Discovery: Special Session Paper

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643440

Daniel Manu, Yi Sheng, Junhuan Yang, Jieren Deng, Tong Geng, Ang Li, Caiwen Ding, Weiwen Jiang, Lei Yang

The outbreak of the global COVID-19 pandemic emphasizes the importance of collaborative drug discovery for high effectiveness; however, due to stringent data regulation, data privacy becomes a pressing issue that must be addressed to enable collaborative drug discovery. In addition to the data privacy issue, the efficiency of drug discovery is another key objective, since infectious diseases spread exponentially and conducting drug discovery effectively could save lives. Advanced Artificial Intelligence (AI) techniques show promise for solving these problems: (1) Federated Learning (FL) is designed to preserve data privacy while learning from data held by distributed clients; (2) graph neural networks (GNNs) can extract structural properties of molecules, whose underlying architecture is the connected atoms; and (3) generative adversarial networks (GANs) can generate novel molecules while retaining the properties learned from the training data. In this work, we make the first attempt to build a holistic collaborative and privacy-preserving FL framework, namely FL-DISCO, which integrates GAN and GNN to generate molecular graphs. Experimental results demonstrate the effectiveness of FL-DISCO on: (1) IID data for ESOL and QM9, where FL-DISCO can generate highly novel compounds with high drug-likeness, uniqueness and LogP scores compared to the baseline; (2) non-IID data for ESOL and QM9, where FL-DISCO generates 100% novel compounds with high validity and LogP scores compared to the baseline. We also demonstrate how different fractions of clients and different generator and discriminator architectures affect our evaluation scores.


Citations: 6

An OCV-Aware Clock Tree Synthesis Methodology

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643585

Necati Uysal, Rickard Ewetz

Closing timing after clock tree synthesis (CTS) is very challenging in the presence of on-chip variations (OCVs). State-of-the-art design flows first synthesize an initial clock tree that contains timing violations introduced by OCVs. Next, aggressive clock tree optimization (CTO) is applied to eliminate the timing violations. Unfortunately, it may be impossible to eliminate all violations given the structure of the initial clock tree. In this paper, we propose an OCV-aware clock tree synthesis methodology that aims to rethink how to account for OCVs. The key idea is to predict the impact of OCVs early in the synthesis process, which allows the variations to be compensated for using non-uniform safety margins. This results in a synthesis flow that is almost correct-by-design. In contrast, state-of-the-art design flows often have an unpredictable success rate because the OCVs are considered too late in the synthesis process. Concretely, this is achieved by top-down constructing a virtual clock tree that is refined bottom-up into a real clock tree implementation. To balance the quality of results (QoR) and runtime, multiple top-level tree topologies are enumerated and pruned in the synthesis process. Compared with the CTO-based approach, the experimental results demonstrate that the proposed methodology reduces the total negative slack (TNS) and worst negative slack (WNS) by 90% and 75%, respectively.
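
For readers unfamiliar with the two metrics, here is a minimal sketch of how TNS and WNS are commonly computed from per-endpoint slacks. The slack values are made up for illustration, and the convention that WNS is reported as 0 when all endpoints meet timing is an assumption (tools differ on this).

```python
def wns(slacks):
    """Worst negative slack: the most negative endpoint slack, or 0 if timing is met."""
    return min(0.0, min(slacks, default=0.0))


def tns(slacks):
    """Total negative slack: the sum of all negative endpoint slacks."""
    return sum(s for s in slacks if s < 0)


def relative_reduction(before, after):
    """Fractional improvement of a negative-slack metric (0.9 means 90%)."""
    return 1.0 - after / before


slacks_ps = [-30.0, 10.0, -5.0, 0.0]  # hypothetical endpoint slacks in picoseconds
print(wns(slacks_ps), tns(slacks_ps))
print(relative_reduction(before=-100.0, after=-10.0))
```

WNS captures the single worst path, while TNS reflects how widespread the violations are, so a flow can improve one much more than the other.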


Citations: 1

From Specification to Silicon: Towards Analog/Mixed-Signal Design Automation using Surrogate NN Models with Transfer Learning

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643445

Juzheng Liu, Shiyu Su, Meghna Madhusudan, Mohsen Hassanpourghadi, Samuel Saunders, Qiaochu Zhang, Rezwan A. Rasul, Yaguang Li, Jiang Hu, A. Sharma, S. Sapatnekar, R. Harjani, Anthony Levi, S. Gupta, M. Chen

We propose a complete analog/mixed-signal circuit design flow from specification to silicon with minimum human-in-the-loop interaction, and verify the flow in a 12nm FinFET CMOS process. The flow consists of three key elements: neural network (NN) modeling of the parameterized circuit component, a search algorithm based on NN models to determine its sizing, and layout automation. To reduce the required training data for NN model creation, we utilize transfer learning to improve the NN accuracy from a relatively small amount of post-layout/silicon data. To prove the concept, we use a voltage-controlled oscillator (VCO) as a test vehicle and demonstrate that our design methodology can accurately model the circuit and generate designs with a wide range of specifications. We show that circuit sizing based on the transfer-learned NN model from silicon measurement data yields the most accurate results.


Citations: 10

FlowTuner: A Multi-Stage EDA Flow Tuner Exploiting Parameter Knowledge Transfer

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643564

Rongjian Liang, Jinwook Jung, Hua Xiang, L. Reddy, Alexey Lvov, Jiang Hu, Gi-Joon Nam

EDA tools provide a large spectrum of parameters to help designers maximize the power, performance, and area (PPA) of their designs. The corresponding enormous solution space, however, hinders designers from navigating towards optimal solutions. In this paper, we propose a multi-stage automatic flow tuning tool, named FlowTuner, for efficient and effective parameter tuning of VLSI design flow. It utilizes both exploitation using transferred parameter knowledge from archival design data and exploration via a multi-stage cooperative co-evolutionary framework. Furthermore, novel flow jump-start and early-stop techniques are developed to reduce the overall runtime for tuning. Experiments on a set of IWLS 2005 benchmark circuits through a commercial tool flow demonstrate that FlowTuner produces considerably better design outcomes in 50% shorter turnaround time compared to the state-of-the-art flow tuning techniques.


Citations: 3

IPA: Floorplan-Aware SystemC Interconnect Performance Modeling and Generation for HLS-based SoCs

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643499

N. Pinckney, Rangharajan Venkatesan, Ben Keller, Brucek Khailany

High-level synthesis (HLS) has recently been used to improve design productivity for many units in today's complex SoCs. HLS tools and flows improve chip design productivity by enabling prototyping and automated implementation of RTL from a single codebase. Although interconnect design is a critical part of today's highly complex SoCs, HLS has not historically been used for SoC-level interconnect. One reason for this is that interconnect architecture and physical floorplan are tightly coupled, and can be difficult to estimate early in the design process. To address this gap, we propose IPA (Interconnect Prototyping Assistant), a framework for interconnect prototyping and implementation in HLS-based SoC flows. IPA includes an application programming interface (API) and accompanying tools that automate interconnect modeling and generation for SystemC-based designs. Our framework is used during early architectural prototyping by abstracting specifics of interconnect implementation. IPA then generates interconnect models, including interfaces, for SystemC cycle-accurate simulations. If the design requires long wires between communication units, IPA automatically inserts retiming stages to meet clock frequency targets. IPA's SystemC code is fully HLS-compatible for RTL creation, and thus can be used within a full-chip HLS flow for pushbutton interconnect generation once a design point is selected. IPA provides accurate architectural performance feedback in minutes and can generate high-quality RTL implementations for SoC interconnect in hours. We demonstrate IPA by exploring the design space for an on-chip interconnect on a micro-benchmark and a deep learning accelerator.
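
The idea of inserting retiming stages on long wires can be illustrated with a back-of-the-envelope calculation. This is a simplified sketch and not IPA's algorithm: it assumes a known end-to-end wire delay, ignores register setup and clock-to-output overhead, and the name `retiming_stages` is hypothetical.

```python
import math


def retiming_stages(wire_delay_ps, clock_period_ps):
    """Registers to insert so that every wire segment fits in one clock period."""
    if clock_period_ps <= 0:
        raise ValueError("clock period must be positive")
    # Split the wire into the smallest number of segments that each fit in a cycle.
    segments = max(1, math.ceil(wire_delay_ps / clock_period_ps))
    return segments - 1  # n segments are separated by n - 1 registers


for delay_ps in (900, 1000, 2500):
    print(delay_ps, retiming_stages(delay_ps, clock_period_ps=1000))
```

Each inserted register adds a cycle of latency to the link, which is why the number of stages depends on the floorplan and needs to be reflected in the performance model.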

Citations: 1

pGRASS-Solver: A Parallel Iterative Solver for Scalable Power Grid Analysis Based on Graph Spectral Sparsification

2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643489

Zhiqiang Liu, Wenjian Yu

Due to the rapid advance of integrated circuit technology, power grid analysis usually poses a severe computational challenge, where linear equations with millions or even billions of unknowns need to be solved. Recent graph spectral sparsification techniques have shown promising performance in accelerating power grid analysis. However, previous graph-sparsification-based iterative solvers are limited by the difficulty of parallelization. Existing graph sparsification algorithms are implemented under the assumption of serial computing, while factorization and backward/forward substitution of the sparsifier's Laplacian matrix are also hard to parallelize. On the other hand, partition-based iterative methods, which can be easily parallelized, lack direct control of the relative condition number of the preconditioner and consume more memory. In this work, we propose a novel parallel iterative solver for scalable power grid analysis by integrating graph sparsification techniques and partition-based methods. We first propose a practically efficient parallel graph sparsification algorithm. Then, a domain decomposition method is leveraged to solve the sparsifier's Laplacian matrix. An efficient graph-sparsification-based parallel preconditioner is obtained, which not only leads to fast convergence but is also easy to parallelize. Extensive experiments are carried out to demonstrate the superior efficiency of the proposed solver for large-scale power grid analysis, showing an average 5.2X speedup over the state-of-the-art parallel iterative solver. Moreover, it solves a real-world power grid matrix with 0.36 billion nodes and 8.7 billion nonzeros within 23 minutes on a 16-core machine, which is 9.5X faster than the best result of the sequential graph-sparsification-based solver.
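
The general idea of using a sparsified graph as a preconditioner can be illustrated on a toy grid. The following is a minimal serial sketch, not the pGRASS-Solver implementation: it assumes a maximum spanning tree as a stand-in sparsifier, a dense Gaussian-elimination solve instead of the paper's domain decomposition, and a made-up 6-node graph with one grounded node. All function names are hypothetical.

```python
def laplacian(n, edges, ground):
    """Dense grounded Laplacian; edges are (u, v, conductance) triples."""
    A = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        A[u][u] += w
        A[v][v] += w
        A[u][v] -= w
        A[v][u] -= w
    for i, g in enumerate(ground):
        A[i][i] += g  # conductance from node i to the supply pad
    return A


def max_spanning_tree(n, edges):
    """Kruskal with union-find: keep the heaviest edges that connect the graph."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


def solve_dense(A, b):
    """Gaussian elimination without pivoting (acceptable for an SPD matrix)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def pcg(A, b, precond, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient; precond(r) applies M^{-1} to r."""
    x = [0.0] * len(b)
    r = b[:]
    z = precond(r)
    p = z[:]
    rz = dot(r, z)
    bnorm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * bnorm:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Toy grid (made-up values): a 6-node ring with two chords, node 0 grounded.
edges = [(0, 1, 4.0), (1, 2, 3.0), (2, 3, 4.0), (3, 4, 3.0),
         (4, 5, 4.0), (5, 0, 1.0), (1, 4, 1.0), (0, 3, 1.0)]
ground = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
A = laplacian(6, edges, ground)
P = laplacian(6, max_spanning_tree(6, edges), ground)  # sparsifier matrix
b = [0.0, -1.0, -0.5, -2.0, -1.0, -0.5]                # current loads
x, iters = pcg(A, b, lambda r: solve_dense(P, r))
```

Because the preconditioner solve happens once per iteration, its cost dominates at scale, which is why the abstract emphasizes making both the sparsification and the solve of the sparsifier's Laplacian parallel.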

Citations: 6