2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD): Latest Papers

Big-Little Chiplets for In-Memory Acceleration of DNNs: A Scalable Heterogeneous Architecture

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549447

Gokul Krishnan, A. Goksoy, Sumit K. Mandal, Zhenyu Wang, C. Chakrabarti, Jae-sun Seo, U. Ogras, Yu Cao

Monolithic in-memory computing (IMC) architectures face significant yield and fabrication cost challenges as the complexity of DNNs increases. Chiplet-based IMCs that integrate multiple dies with advanced 2.5D/3D packaging offer a low-cost and scalable solution. They enable heterogeneous architectures where the chiplets and their associated interconnection can be tailored to the non-uniform algorithmic structures to maximize IMC utilization and reduce energy consumption. This paper proposes a heterogeneous IMC architecture with big-little chiplets and a hybrid network-on-package (NoP) to optimize utilization, interconnect bandwidth, and energy efficiency. For a given DNN, we develop a custom methodology to map the model onto the big-little architecture such that the early layers in the DNN are mapped to the little chiplets with higher NoP bandwidth and the subsequent layers are mapped to the big chiplets with lower NoP bandwidth. Furthermore, we achieve a scalable solution by incorporating a DRAM into each chiplet to support a wide range of DNNs beyond the area limit. Compared to a homogeneous chiplet-based IMC architecture, the proposed big-little architecture achieves up to 329× improvement in the energy-delay-area product (EDAP) and up to 2× higher IMC utilization. Experimental evaluation of the proposed big-little chiplet-based RRAM IMC architecture for ResNet-50 on ImageNet shows 259×, 139×, and 48× improvement in energy efficiency at lower area compared to Nvidia V100 GPU, Nvidia T4 GPU, and SIMBA architecture, respectively.
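
The energy-delay-area product quoted above is the product of the three quantities, so an improvement factor is the ratio of two such products. The following is a minimal sketch of that arithmetic only. The `DesignPoint` class, its field names, and every number in it are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    """Hypothetical summary of one accelerator configuration."""
    energy_j: float   # energy per inference, in joules
    delay_s: float    # latency per inference, in seconds
    area_mm2: float   # total silicon area, in mm^2

    def edap(self) -> float:
        # Energy-delay-area product: lower is better.
        return self.energy_j * self.delay_s * self.area_mm2


def edap_improvement(baseline: DesignPoint, proposed: DesignPoint) -> float:
    """Return how many times smaller the proposed EDAP is than the baseline EDAP."""
    return baseline.edap() / proposed.edap()


# Illustrative numbers only; they are NOT results from the paper.
homogeneous = DesignPoint(energy_j=4.0e-3, delay_s=2.0e-3, area_mm2=500.0)
big_little = DesignPoint(energy_j=2.0e-3, delay_s=1.0e-3, area_mm2=250.0)
print(f"EDAP improvement: {edap_improvement(homogeneous, big_little):.1f}x")
```

Because the metric is a product, halving each of the three quantities compounds into an eightfold EDAP improvement, which is why EDAP figures tend to look much larger than the individual energy, delay, or area gains.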

Citations: 2

Fast, Robust and Accurate Detection of Cache-based Spectre Attack Phases

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549330

A. Pashrashid, Ali Hajiabadi, Trevor E. Carlson

Modern processors achieve high performance and efficiency by employing techniques such as speculative execution and sharing resources such as caches. However, recent attacks like Spectre and Meltdown exploit the speculative execution of modern processors to leak sensitive information from the system. Many mitigation strategies have been proposed to restrict the speculative execution of processors and protect potential side-channels. Currently, these techniques have shown a significant performance overhead. A solution that can detect memory leaks before the attacker has a chance to exploit them would allow the processor to reduce the performance overhead by enabling protections only when the system is at risk. In this paper, we propose a mechanism to detect speculative execution attacks that use caches as a side-channel. In this detector we track the phases of a successful attack and raise an alert before the attacker gets a chance to recover sensitive information. We accomplish this by monitoring the microarchitectural changes in the core and caches and detecting the memory locations that can potentially leak data. We achieve 100% accuracy and a negligible false-positive rate in detecting Spectre attacks and evasive versions of Spectre that the state-of-the-art detectors are unable to detect. Our detector has no performance overhead and negligible power and area overheads.

Citations: 4

Machine Learning for Testing Machine-Learning Hardware: A Virtuous Cycle

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561121

Arjun Chaudhuri, Jonti Talukdar, K. Chakrabarty

The ubiquitous application of deep neural networks (DNNs) has led to a rise in demand for AI accelerators. DNN-specific functional criticality analysis identifies faults that cause measurable and significant deviations from acceptable requirements such as the inferencing accuracy. This paper examines the problem of classifying structural faults in the processing elements (PEs) of systolic-array accelerators. We first present a two-tier machine-learning (ML) based method to assess the functional criticality of faults. While supervised learning techniques can be used to accurately estimate fault criticality, they require a considerable amount of ground truth for model training. We therefore describe a neural-twin framework for analyzing fault criticality with a negligible amount of ground-truth data. We further describe a topological and probabilistic framework to estimate the expected number of a PE's primary outputs (POs) flipping in the presence of defects and use the PO-flip count as a surrogate for determining fault criticality. We demonstrate that the combination of PO-flip count and neural twin-enabled sensitivity analysis of internal nets can be used as additional features in existing ML-based criticality classifiers.
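
The idea of an expected PO-flip count can be illustrated with one fact from probability: by linearity of expectation, the expected number of flipped outputs equals the sum of the per-output flip probabilities, whether or not the flips are correlated. The sketch below shows only that step. The per-output probabilities, the output names, and the `threshold` are hypothetical inputs; how the paper derives such probabilities from circuit topology is not reproduced here.

```python
def expected_po_flips(flip_probabilities):
    """Expected number of flipped primary outputs for one fault.

    flip_probabilities maps a primary-output name to the (assumed) probability
    that the fault flips it. Linearity of expectation makes the sum valid even
    when the individual flips are not independent.
    """
    return sum(flip_probabilities.values())


def looks_critical(flip_probabilities, threshold):
    """Use the expected flip count as a surrogate score (threshold is hypothetical)."""
    return expected_po_flips(flip_probabilities) >= threshold


# Hypothetical per-output flip probabilities for a single fault in one PE.
fault = {"po0": 0.5, "po1": 0.25, "po2": 0.25}
print(expected_po_flips(fault), looks_critical(fault, threshold=0.75))
```

A score of this kind could then be supplied as one extra feature to a criticality classifier, which is the role the abstract assigns to the PO-flip count.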

Citations: 0

Embracing Graph Neural Networks for Hardware Security

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561096

Lilas Alrahis, Satwik Patnaik, M. Shafique, O. Sinanoglu

Graph neural networks (GNNs) have attracted increasing attention due to their superior performance in deep learning on graph-structured data. GNNs have succeeded across various domains such as social networks, chemistry, and electronic design automation (EDA). Electronic circuits have a long history of being represented as graphs, and, not surprisingly, GNNs have demonstrated state-of-the-art performance in solving various EDA tasks. More importantly, GNNs are now employed to address several hardware security problems, such as detecting intellectual property (IP) piracy and hardware Trojans (HTs), to name a few. In this survey, we first provide a comprehensive overview of the usage of GNNs in hardware security and propose the first taxonomy to divide the state-of-the-art GNN-based hardware security systems into four categories: (i) HT detection systems, (ii) IP piracy detection systems, (iii) reverse engineering platforms, and (iv) attacks on logic locking. We summarize the different architectures, graph types, node features, benchmark data sets, and model evaluation of the employed GNNs. Finally, we elaborate on the lessons learned and discuss future directions.

Citations: 7

A Novel Semi-Analytical Approach for Fast Electromigration Stress Analysis in Multi-Segment Interconnects

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549476

Olympia Axelou, N. Evmorfopoulos, G. Floros, G. Stamoulis, S. Sapatnekar

As integrated circuit technologies move below 10 nm, electromigration (EM) has become an issue of great concern for long-term reliability due to the stricter performance, thermal and power requirements. The problem of EM becomes even more pronounced in power grids due to the large unidirectional currents flowing in these structures. In recent years, EM analysis has focused on accurate physics-based models that describe the interplay between the electron wind force and the back stress force in a single Partial Differential Equation (PDE) involving wire stress. In this paper, we present a fast semi-analytical approach for the solution of the stress PDE at discrete spatial points in multi-segment lines of power grids, which allows the analytical calculation of EM stress independently at any time in these lines. Our method exploits the specific form of the discrete stress coefficient matrix whose eigenvalues and eigenvectors are known beforehand. Thus, a closed-form equation can be constructed with almost linear time complexity without the need for time discretization. This closed-form equation can be subsequently used at any given time in transient stress analysis. Our experimental results, using the industrial IBM power grid benchmarks, demonstrate that our method has excellent accuracy compared to the industrial tool COMSOL while being orders of magnitude faster.
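
The benefit of knowing the eigenvalues and eigenvectors in advance can be shown on a deliberately simplified case. The sketch below assumes a single wire segment with blocked (zero-flux) ends, discretized into n equal cells, which gives the linear system dσ/dt = (κ/h²)·A·σ + b with A the tridiagonal matrix of diagonal [-1, -2, ..., -2, -1] and unit off-diagonals. For that matrix the eigenvalues are -4·sin²(kπ/2n) and the eigenvectors are cosine vectors, so the stress at any time t follows from a closed-form expression per mode with no time stepping. The function names, the `drive` parameter standing in for the electron-wind term, and the numbers are all assumptions; this is not the authors' multi-segment formulation.

```python
import math


def neumann_eigenpairs(n):
    """Known eigenpairs of the n x n matrix A with diagonal
    [-1, -2, ..., -2, -1] and ones on both off-diagonals."""
    pairs = []
    for k in range(n):
        lam = -4.0 * math.sin(k * math.pi / (2 * n)) ** 2
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)  # unit-length eigenvector
        vec = [scale * math.cos(k * math.pi * (i + 0.5) / n) for i in range(n)]
        pairs.append((lam, vec))
    return pairs


def stress_at(t, sigma0, kappa, h, drive):
    """Evaluate the solution of d(sigma)/dt = (kappa / h**2) * A * sigma + b at time t."""
    n = len(sigma0)
    # Constant source contributed by the electron-wind term at the two blocked ends.
    b = [0.0] * n
    b[0] += kappa * drive / h
    b[-1] -= kappa * drive / h
    sigma = [0.0] * n
    for lam, vec in neumann_eigenpairs(n):
        mu = kappa * lam / h ** 2
        c = sum(v * s for v, s in zip(vec, sigma0))  # initial condition in this mode
        f = sum(v * x for v, x in zip(vec, b))       # source term in this mode
        if lam == 0.0:
            coeff = c + f * t                        # constant mode (k = 0)
        else:
            # Closed-form solution of y' = mu * y + f with y(0) = c.
            coeff = c * math.exp(mu * t) + f * math.expm1(mu * t) / mu
        for i in range(n):
            sigma[i] += coeff * vec[i]
    return sigma


# Illustrative, unitless parameters; they are NOT taken from the paper.
profile = stress_at(t=2.0, sigma0=[0.0] * 8, kappa=1.0, h=1.0, drive=0.5)
print([round(s, 4) for s in profile])
```

Each call evaluates the stress at one chosen time independently of any other time point. The direct summation used here costs O(n²) per evaluation and is written for readability; since the eigenvectors are cosine vectors, a fast cosine transform could replace the double loop.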

Citations: 4

IEEE CEDA DATC: Expanding Research Foundations for IC Physical Design and ML-Enabled EDA

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561379

Jinwook Jung, A. Kahng, R. Varadarajan, Zhiang Wang

This paper describes new elements in the RDF-2022 release of the DATC Robust Design Flow, along with other activities of the IEEE CEDA DATC. The RosettaStone initiated with RDF-2021 has been augmented to include 35 benchmarks and four open-source technologies (ASAP7, NanGate45 and SkyWater130HS/HD), plus timing-sensible versions created using path-cutting. The Hier-RTLMP macro placer is now part of DATC RDF, enabling macro placement for large modern designs with hundreds of macros. To establish a clear baseline for macro placers, new open-source benchmark suites on open PDKs, with corresponding flows for fully reproducible results, are provided. METRICS2.1 infrastructure in OpenROAD and OpenROAD-flow-scripts now uses native JSON metrics reporting, which is more robust and general than the previous Python script-based method. Calibrations on open enablements have also seen notable updates in the RDF. Finally, we also describe an approach to establishing a generic, cloud-native large-scale design of experiments for ML-enabled EDA. Our paper closes with future research directions related to DATC’s efforts.

Citations: 1

On Advancing Physical Design using Graph Neural Networks (Invited Paper)

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561094

Yi-Chen Lu, S. Lim

As modern Physical Design (PD) algorithms and methodologies evolve into the post-Moore era with the aid of machine learning, Graph Neural Networks (GNNs) are becoming increasingly ubiquitous given that netlists are essentially graphs. Recently, their ability to perform effective graph learning has provided significant insights into the underlying dynamics of netlist-to-layout transformations. GNNs follow a message-passing scheme, where the goal is to construct meaningful representations at either the graph level or the node level by recursively aggregating and transforming the initial features. In the realm of PD, the GNN-learned representations have been leveraged to solve tasks such as cell clustering, quality-of-result prediction, and activity simulation, often overcoming the limitations of traditional PD algorithms. In this work, we first revisit recent advancements that GNNs have made in PD. Second, we discuss how GNNs serve as the backbone of novel PD flows. Finally, we present our thoughts on ongoing and future PD challenges that GNNs can successfully tackle.
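
The message-passing scheme mentioned above can be sketched on a toy netlist graph. In this minimal example each node averages its neighbors' feature vectors (the aggregation step) and blends the result with its own features (the transformation step), and a graph-level representation is taken as the mean of the node vectors. The cell names, the two-element feature vectors, and the fixed `self_weight` are assumptions for illustration; a real GNN learns its transformation weights and usually adds a nonlinearity.

```python
def message_passing_layer(features, neighbors, self_weight=0.5):
    """One round of message passing with mean aggregation.

    features:  dict mapping node -> list of floats
    neighbors: dict mapping node -> list of adjacent nodes
    """
    updated = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Aggregate: element-wise mean of the neighbors' feature vectors.
            mean = [sum(features[m][d] for m in nbrs) / len(nbrs)
                    for d in range(len(own))]
        else:
            mean = list(own)  # isolated node keeps its own features
        # Transform: fixed (not learned) blend of own and aggregated features.
        updated[node] = [self_weight * o + (1.0 - self_weight) * m
                         for o, m in zip(own, mean)]
    return updated


def graph_readout(features):
    """Graph-level representation: mean of all node vectors."""
    vectors = list(features.values())
    return [sum(v[d] for v in vectors) / len(vectors)
            for d in range(len(vectors[0]))]


# Toy netlist with three cells connected in a chain a - b - c (hypothetical data).
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

hidden = message_passing_layer(features, neighbors)
print(hidden)
print(graph_readout(hidden))
```

Applying the layer repeatedly lets information travel one hop further per round, which is the recursive aggregation the abstract refers to. The per-node output corresponds to node-level representations and the readout to a graph-level one.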

Citations: 3

Personalized Heterogeneity-aware Federated Search Towards Better Accuracy and Energy Efficiency

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549403

Zhao Yang, Qingshuang Sun

Federated learning (FL), a new distributed technology, allows us to train a global model on edge and embedded devices without sharing local data. However, due to the wide distribution of different types of devices, FL faces severe heterogeneity issues. The accuracy and efficiency of FL deployment at the edge are severely impacted by heterogeneous data and heterogeneous systems. In this paper, we perform joint FL model personalization for heterogeneous systems and heterogeneous data to address the challenges posed by heterogeneities. We begin by using model inference efficiency as a starting point to personalize network scale on each node. Furthermore, it can be used to guide the efficient FL training process, which can help to ease the problem of straggler devices and improve FL’s energy efficiency. During FL training, federated search is then used to acquire highly accurate personalized network structures. By taking into account the unique characteristics of FL deployment at edge devices, the personalized network structures obtained by our federated search framework with a lightweight search controller can achieve accuracy competitive with state-of-the-art (SOTA) methods, while reducing inference and training energy consumption by up to 3.57× and 1.82×, respectively.
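
Training a global model without sharing local data usually means that devices send model parameters, not samples, to an aggregator. The sketch below shows the widely used federated averaging step, in which each client's parameters are weighted by its local sample count. It is a generic baseline given here for orientation, not the personalized federated search proposed in the paper, and the client parameters and sample counts are made up.

```python
def federated_average(client_weights, client_sizes):
    """Combine client parameter vectors into one global vector.

    client_weights: list of equal-length lists of floats, one list per client
    client_sizes:   number of local training samples held by each client
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("one sample count is required per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    # Weighted mean per parameter; raw data never leaves the clients.
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]


# Two hypothetical edge devices with different amounts of local data.
global_model = federated_average(
    client_weights=[[1.0, 2.0], [3.0, 4.0]],
    client_sizes=[1, 3],
)
print(global_model)
```

Plain averaging assumes every client trains the same architecture. The heterogeneity issues described above arise because that assumption often fails on mixed edge hardware, which is what motivates per-node personalization.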

Citations: 0

A Scalable Methodology for Agile Chip Development with Open-Source Hardware Components (Invited Paper)

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561102

Maico Cassel dos Santos, Tianyu Jia, M. Cochet, Karthik Swaminathan, Joseph Zuckerman, Paolo Mantovani, Davide Giri, J. Zhang, Erik Jens Loscalzo, Gabriele Tombesi, Kevin Tien, Nandhini Chandramoorthy, J. Wellman, David Brooks, Gu-Yeon Wei, K. Shepard, L. Carloni, P. Bose

We present a scalable methodology for the agile physical design of tile-based heterogeneous system-on-chip (SoC) architectures that simplifies the reuse and integration of open-source hardware components. The methodology leverages the regularity of the on-chip communication infrastructure, which is based on a multi-plane network-on-chip (NoC), and the modularity of socket interfaces, which connect the tiles to the NoC. Each socket also provides its tile with a set of platform services, including independent clocking and voltage control. As a result, the physical design of each tile can be decoupled from its location in the top-level floorplan of the SoC, and the overall SoC design can benefit from a hierarchical timing-closure flow, design reuse and, if necessary, fast respin. With the proposed methodology, we completed two SoC tapeouts of increasing complexity, which illustrate its capabilities and the resulting gains in design productivity.

Citations: 2

HECTOR: A Multi-level Intermediate Representation for Hardware Synthesis Methodologies

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549370

Ruifan Xu, You-lin Xiao, Jin Luo, Yun Liang

Hardware synthesis requires a complicated process to generate synthesizable register transfer level (RTL) code. High-level synthesis tools can automatically transform a high-level description into a hardware design, while hardware generators adopt domain-specific languages and synthesis flows for specific applications. Implementing these tools generally requires substantial engineering effort because of RTL’s weak expressivity and low level of abstraction. Furthermore, different synthesis tools adopt different levels of intermediate representation (IR) and different transformations. A unified IR is a natural way to lower the engineering cost and to obtain competitive hardware designs rapidly by exploring different synthesis methodologies. In this paper, we propose Hector, a two-level IR that provides a unified intermediate representation for hardware synthesis methodologies. The high-level IR binds computation with a control graph annotated with timing information, while the low-level IR provides a concise way to describe hardware modules and the elastic interconnections among them. Implemented on top of the multi-level compiler infrastructure (MLIR), Hector’s IRs can be converted to synthesizable RTL designs. To demonstrate its expressivity and versatility, we implement three synthesis approaches based on Hector: a high-level synthesis (HLS) tool, a systolic array generator, and a hardware accelerator. The hardware generated by Hector’s HLS approach is comparable to that generated by state-of-the-art HLS tools, and the other two cases outperform HLS implementations in performance and productivity.

Citations: 6