DEEP: Developing Extremely Efficient Runtime On-Chip Power Meters
Zhiyao Xie, Shiyu Li, Mingyuan Ma, Chen-Chia Chang, Jingyu Pan, Yiran Chen, Jiangkun Hu
DOI: 10.1145/3508352.3549427
Accurate and efficient on-chip power modeling is crucial to runtime power, energy, and voltage management. Such power monitoring can be achieved by designing and integrating on-chip power meters (OPMs) into the target design. In this work, we propose a new method named DEEP to automatically develop extremely efficient OPM solutions for a given design. DEEP selects OPM inputs from all individual bits in RTL signals. Such bit-level selection provides an unprecedentedly large number of input candidates and supports lower hardware cost compared with the signal-level selection in prior works. In addition, DEEP proposes a powerful two-step OPM input selection method, and it supports reporting both total power and the power of major design components. Experiments on a commercial microprocessor demonstrate that DEEP's OPM solution achieves correlation R > 0.97 in per-cycle power prediction with an unprecedentedly low area overhead on hardware, i.e., < 0.1% of the microprocessor layout. This reduces the OPM hardware cost by 4–6× compared with the state-of-the-art solution.
Aging-Aware Training for Printed Neuromorphic Circuits
Hai-qiang Zhao, Michael Hefenbrock, M. Beigl, M. Tahoori
DOI: 10.1145/3508352.3549411
Printed electronics allow for ultra-low-cost circuit fabrication with unique properties such as flexibility, non-toxicity, and stretchability. Because of these properties, there is growing interest in adapting printed electronics for emerging areas such as fast-moving consumer goods and wearable technologies. In such domains, analog signal processing in or near the sensor is favorable. Printed neuromorphic circuits have recently been proposed as a solution to perform such analog processing natively. Additionally, their learning-based design process enables highly efficient optimization and allows them to mitigate the high process variations associated with low-cost printed processes. In this work, we address the aging of printed components, an effect that can significantly degrade the accuracy of printed neuromorphic circuits over time. To this end, we develop a stochastic aging model to describe the behavior of aged printed resistors and modify the training objective to consider the expected loss over the lifetime of the device. This approach ensures acceptable accuracy over the device lifetime. Our experiments show that the proposed learning approach achieves an overall 35.8% improvement in expected accuracy over the device lifetime.
{"title":"Aging-Aware Training for Printed Neuromorphic Circuits","authors":"Hai-qiang Zhao, Michael Hefenbrock, M. Beigl, M. Tahoori","doi":"10.1145/3508352.3549411","DOIUrl":"https://doi.org/10.1145/3508352.3549411","url":null,"abstract":"Printed electronics allow for ultra-low-cost circuit fabrication with unique properties such as flexibility, non-toxicity, and stretchability. Because of these advanced properties, there is a growing interest in adapting printed electronics for emerging areas such as fast-moving consumer goods and wearable technologies. In such domains, analog signal processing in or near the sensor is favorable. Printed neuromorphic circuits have been recently proposed as a solution to perform such analog processing natively. Additionally, their learning-based design process allows high efficiency of their optimization and enables them to mitigate the high process variations associated with low-cost printed processes. In this work, we address the aging of the printed components. This effect can significantly degrade the accuracy of printed neuromorphic circuits over time. For this, we develop a stochastic aging-model to describe the behavior of aged printed resistors and modify the training objective by considering the expected loss over the lifetime of the device. This approach ensures to provide acceptable accuracy over the device lifetime. Our experiments show that an overall 35.8% improvement in terms of expected accuracy over the device lifetime can be achieved using the proposed learning approach.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129820862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Mixed Open-Source and Proprietary EDA Commons for Education and Prototyping: Invited Paper
A. Kahng
DOI: 10.1145/3508352.3561378
In recent years, several open-source projects have shown potential to serve as a future technology commons for EDA and design prototyping. This paper examines how open-source and proprietary EDA technologies will inevitably take on complementary roles within a future technology commons. Proprietary EDA technologies offer numerous benefits that will endure, including (i) exceptional technology and engineering; (ii) ever-increasing importance in design-based equivalent scaling and the overall semiconductor value chain; and (iii) well-established commercial and partner relationships. On the other hand, proprietary EDA technologies face challenges that will also endure, including (i) inability to pursue directions such as massive leverage of cloud compute, extreme reduction of turnaround times, or "free tools"; and (ii) difficulty in evolving to address new applications and markets. By contrast, open-source EDA technologies offer benefits that include (i) the capability to serve as a friction-free, democratized platform for education and future workforce development (i.e., as a platform for EDA research, and as a means of teaching and training both designers and EDA developers with public code); and (ii) addressing the needs of underserved, non-enterprise-account markets (e.g., older nodes, research flows, cost-sensitive IoT, new devices and integrations, system-design-technology pathfinding). That said, open-source will always face challenges such as sustainability, governance, and how to achieve critical mass and critical quality. The paper concludes with key directions and synergies for open-source and proprietary EDA within an EDA Commons for education and prototyping.
Sparse-T: Hardware Accelerator Thread for Unstructured Sparse Data Processing
Pranathi Vasireddy, K. Kavi, Gayatri Mehta
DOI: 10.1145/3508352.3549441
Sparse matrix-dense vector (SpMV) multiplication is inherent in most scientific computing, neural network, and machine learning algorithms. To efficiently exploit the sparsity of data in SpMV computations, several compressed data representations have been used. However, compressed representations of sparse data incur the overhead of locating nonzero values, requiring indirect memory accesses that increase instruction counts and memory access delays. We call this translation of compressed representations metadata processing. We propose a memory-side accelerator that performs the metadata (or indexing) computations and supplies only the required nonzero values to the processor, additionally permitting indexing to overlap with core computations on nonzero elements. In this contribution, we target low-end micro-controllers with very limited memory and processing capabilities. We explore two dedicated ASIC designs of the proposed accelerator that handle the indexed memory accesses for the compressed sparse row (CSR) format while working alongside a simple RISC-like programmable core. One version of the accelerator supplies only the vector values corresponding to nonzero matrix values, and the second version supplies both the nonzero matrix values and the matching vector values for SpMV computations. Our experiments show speedups ranging between 1.3× and 2.1× for SpMV at different levels of sparsity. Our accelerator also yields energy savings ranging between 15.8% and 52.7% across different matrix sizes, compared to a baseline system in which the primary RISC-V core performs all computations. We use smaller synthetic matrices with different sparsity levels and larger real-world matrices with higher sparsity (below 1% nonzeros) in our experimental evaluations.
Multi-Package Co-Design for Chiplet Integration
Zhen Zhuang, Bei Yu, Kai-Yuan Chao, Tsung-Yi Ho
DOI: 10.1145/3508352.3549404
Due to the cost and design complexity associated with advanced technology nodes, it is difficult for traditional monolithic System-on-Chip designs to follow Moore's Law, which weakens their economic benefits. The semiconductor industry is therefore looking to advanced packaging to improve economic advantages. Since multi-chiplet architectures supporting heterogeneous integration offer robust re-usability and effective cost reduction, chiplet integration has become the mainstream of advanced packaging. Nowadays, the number of chiplets mounted in a package keeps increasing with the demand for higher system performance. However, the large area caused by the increasing number of chiplets leads to serious reliability issues, including warpage and bump stress, which worsen yield and cost. The multi-package architecture, which distributes chiplets across multiple packages so that each package uses less area, is a popular alternative for enhancing reliability and reducing cost in advanced packaging. The primary challenge of the multi-package architecture, however, lies in the tradeoff between the inter-package costs, i.e., the interconnection among packages, and the intra-package costs, i.e., the reliability issues caused by warpage and bump stress. Therefore, a co-design methodology that optimizes multiple packages simultaneously is indispensable for improving the quality of the whole system. To tackle this challenge, we adopt mathematical programming methods for the multi-package co-design problem, reflecting its nature as a synergistic optimization of multiple packages. To the best of our knowledge, this is the first work to solve the multi-package co-design problem.
HierPINN-EM: Fast Learning-Based Electromigration Analysis for Multi-Segment Interconnects Using Hierarchical Physics-Informed Neural Network
Wentian Jin, Liang Chen, Subed Lamichhane, M. Kavousi, S. Tan
DOI: 10.1145/3508352.3549371
Electromigration (EM) has become a major concern for VLSI circuits as technology advances into the nanometer regime. The crux of the problem is solving the partial differential Korhonen equations, which remains challenging due to increasing integration density. Recently, scientific machine learning has been explored for solving partial differential equations (PDEs) following breakthrough successes of deep neural networks, and existing approaches such as physics-informed neural networks (PINNs) show promising results on some small PDE problems. However, for large engineering problems such as EM analysis of large interconnect trees, the plain PINN has been shown not to work well due to the large number of variables. In this work, we propose a novel hierarchical PINN approach, HierPINN-EM, for fast EM-induced stress analysis of multi-segment interconnects. Instead of solving the interconnect tree as a whole, we first solve the EM problem for a single wire segment under different boundary and geometrical parameters using supervised learning. We then apply the unsupervised PINN concept to solve the whole interconnect by enforcing the physics laws at the boundaries of all wire segments. In this way, HierPINN-EM can significantly reduce the number of variables compared with the plain PINN solver. Numerical results on a number of synthetic interconnect trees show that HierPINN-EM leads to orders-of-magnitude speedup in training and more than 79× better accuracy than the plain PINN method. Furthermore, HierPINN-EM yields 19% better accuracy with a 99% reduction in training cost over the recently proposed graph-neural-network-based EM solver EMGraph.
{"title":"HierPINN-EM: Fast Learning-Based Electromigration Analysis for Multi-Segment Interconnects Using Hierarchical Physics-informed Neural Network","authors":"Wentian Jin, Liang Chen, Subed Lamichhane, M. Kavousi, S. Tan","doi":"10.1145/3508352.3549371","DOIUrl":"https://doi.org/10.1145/3508352.3549371","url":null,"abstract":"Electromigration (EM) becomes a major concern for VLSI circuits as the technology advances in the nanometer regime. The crux of problem is to solve the partial differential Korhonen equations, which remains challenging due to the increasing integrated density. Recently, scientific m achine l earning has been explored to solve partial differential equations ( PDE) due to breakthrough success in deep neural networks and existing approach such as physics-informed neural networks (PINN) shows promising results for some small PDE problems. However, for large engineering problems like EM analysis for large interconnect trees, it was shown that the plain PINN does not work well due the to large number of variables. In this work, we propose a novel hierarchical PINN approach, HierPINN-EM for fast EM induced stress analysis for multi-segment interconnects. Instead of solving the interconnect tree as a whole, we first solve EM problem for one wire segment under different boundary and geometrical parameters using supervised learning. Then we apply unsupervised PINN concept to solve the whole interconnects by enforcing the physics laws in the boundaries for all wire segments. In this way, HierPINN-EM can significantly reduce the number of variables at plain PINN solver. Numerical results on a number of synthetic interconnect trees show that HierPINN-EM can lead to orders of magnitude speedup in training and more than 79× better accuracy over the plain PINN method. Furthermore, HierPINN-EM yields 19% better accuracy with 99% reduction in training cost over recently proposed Graph Neural Network-based EM solver, EMGraph.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128860760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hardware Computation Graph for DNN Accelerator Design Automation without Inter-PU Templates
Jun Yu Li, Wei Wang, Wufeng Li
DOI: 10.1145/3508352.3549342
Existing deep neural network (DNN) accelerator design automation (ADA) methods adopt architecture templates to predetermine part of the design choices and then explore the remaining design choices beyond the templates. These templates can be classified into intra-PU templates and inter-PU templates according to the architecture hierarchy. Since templates limit the flexibility of ADA, designing effective ADA methods without templates has become an important research topic. Although some works have enhanced the flexibility of ADA by removing intra-PU templates, to the best of our knowledge no existing work has studied ADA methods without inter-PU templates. ADA with predetermined inter-PU templates is typically inefficient in terms of resource utilization, especially for DNNs with complex topologies. In this paper, we propose a novel method, called hardware computation graph (HCG), for ADA without inter-PU templates. Experiments show that the HCG method achieves competitive latency while using 1.4×–5× less on-chip memory compared with existing state-of-the-art ADA methods.
Polynomial Formal Verification: Ensuring Correctness under Resource Constraints (Invited Paper)
R. Drechsler, Alireza Mahzoon
DOI: 10.1145/3508352.3561104
Recently, a lot of effort has been put into developing formal verification approaches in both academic and industrial research. In practice, these techniques often give satisfying results for some types of circuits, while they fail for others. A major challenge in this domain is that the verification techniques suffer from unpredictable performance. The only way to overcome this challenge is to calculate bounds for the space and time complexities. If a verification method has polynomial space and time complexities, scalability can be guaranteed. In this tutorial paper, we review recent developments in formal verification techniques and give a comprehensive overview of Polynomial Formal Verification (PFV). In PFV, polynomial upper bounds hold for the run-time and memory needed during the entire verification task. Thus, correctness under resource constraints can be ensured. We discuss the importance and advantages of PFV in the design flow. Formal methods on the bit-level and the word-level, and their complexities when used to verify different types of circuits, such as adders, multipliers, or ALUs, are presented. The current status of this new research field and directions for future work are discussed.
Reconfigurable Logic for Hardware IP Protection: Opportunities and Challenges (Invited Paper)
L. Collini, Benjamin Tan, C. Pilato, R. Karri
DOI: 10.1145/3508352.3561117
Protecting the intellectual property (IP) of integrated circuit (IC) designs is becoming a significant concern for fab-less semiconductor design houses. Malicious actors can access the chip design at any stage, reverse engineer the functionality, and create illegal copies. On the one hand, defenders are crafting more and more solutions to hide the critical portions of the circuit. On the other hand, attackers are designing more and more powerful tools to extract useful information from the design and reverse engineer the functionality, especially when they can get access to working chips. In this context, the use of custom reconfigurable fabrics has recently been investigated for hardware IP protection. This paper discusses recent trends in hardware obfuscation with embedded FPGAs, focusing also on the open challenges that must be addressed to make this solution viable.
RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering
Chaojian Li, Sixu Li, Yang Zhao, Wenbo Zhu, Yingyan Lin
DOI: 10.1145/3508352.3549380
Neural Radiance Field (NeRF) based rendering has attracted growing attention thanks to its state-of-the-art (SOTA) rendering quality and wide applications in Augmented and Virtual Reality (AR/VR). However, interactions enabled by immersive, real-time (>30 FPS) NeRF-based rendering are still limited due to the low throughput achievable on AR/VR devices. To this end, we first profile SOTA efficient NeRF algorithms on commercial devices and identify two primary causes of the aforementioned inefficiency: (1) uniform point sampling and (2) the dense accesses and computations of the required embeddings in NeRF. We then propose RT-NeRF, which to the best of our knowledge is the first algorithm-hardware co-design for accelerating NeRF. Specifically, on the algorithm level, RT-NeRF integrates an efficient rendering pipeline that largely alleviates the inefficiency of the commonly adopted uniform point sampling in NeRF by directly computing the geometry of pre-existing points. Additionally, RT-NeRF leverages a coarse-grained, view-dependent computation-ordering scheme to eliminate the (unnecessary) processing of invisible points. On the hardware level, our proposed RT-NeRF accelerator (1) adopts a hybrid encoding scheme that adaptively switches between a bitmap-based and a coordinate-based sparsity encoding format for NeRF's sparse embeddings, aiming to maximize storage savings and thus reduce the required DRAM accesses while supporting efficient NeRF decoding; and (2) integrates both a high-density sparse search unit and a dual-purpose bi-directional adder & search tree to coordinate the two aforementioned encoding formats. Extensive experiments on eight datasets consistently validate the effectiveness of RT-NeRF, achieving a large throughput improvement (e.g., 9.7×–3,201×) while maintaining rendering quality comparable to SOTA efficient NeRF solutions.
{"title":"RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering","authors":"Chaojian Li, Sixu Li, Yang Zhao, Wenbo Zhu, Yingyan Lin","doi":"10.1145/3508352.3549380","DOIUrl":"https://doi.org/10.1145/3508352.3549380","url":null,"abstract":"Neural Radiance Field (NeRF) based rendering has attracted growing attention thanks to its state-of-the-art (SOTA) rendering quality and wide applications in Augmented and Virtual Reality (AR/VR). However, immersive real-time (> 30 FPS) NeRF based rendering enabled interactions are still limited due to the low achievable throughput on AR/VR devices. To this end, we first profile SOTA efficient NeRF algorithms on commercial devices and identify two primary causes of the aforementioned inefficiency: (1) the uniform point sampling and (2) the dense accesses and computations of the required embeddings in NeRF. Furthermore, we propose RT-NeRF, which to the best of our knowledge is the first algorithm-hardware co-design acceleration of NeRF. Specifically, on the algorithm level, RT-NeRF integrates an efficient rendering pipeline for largely alleviating the inefficiency due to the commonly adopted uniform point sampling method in NeRF by directly computing the geometry of pre-existing points. Additionally, RT-NeRF leverages a coarse-grained view-dependent computing ordering scheme for eliminating the (unnecessary) processing of invisible points. On the hardware level, our proposed RT-NeRF accelerator (1) adopts a hybrid encoding scheme to adaptively switch between a bitmap- or coordinate-based sparsity encoding format for NeRF’s sparse embeddings, aiming to maximize the storage savings and thus reduce the required DRAM accesses while supporting efficient NeRF decoding; and (2) integrates both a high-density sparse search unit and a dual-purpose bi-direction adder & search tree to coordinate the two aforementioned encoding formats. Extensive experiments on eight datasets consistently validate the effectiveness of RT-NeRF, achieving a large throughput improvement (e.g., 9.7×∼3,201×) while maintaining the rendering quality as compared with SOTA efficient NeRF solutions.","PeriodicalId":270592,"journal":{"name":"2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114191296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}