
2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD): Latest Publications

Tunable Precision Control for Approximate Image Filtering in an In-Memory Architecture with Embedded Neurons
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549385
Ayushi Dube, Ankit Wagle, G. Singh, S. Vrudhula
This paper presents a novel hardware-software co-design consisting of a Processing in-Memory (PiM) architecture with embedded neural processing elements (NPE) that are highly reconfigurable. The PiM platform and proposed approximation strategies are employed for various image filtering applications while providing the user with fine-grain dynamic control over energy efficiency, precision, and throughput (EPT). The proposed co-design can change the Peak Signal to Noise Ratio (PSNR, output quality metric for image filtering applications) from 25dB to 50dB (acceptable PSNR range for image filtering applications) without incurring any extra cost in terms of energy or latency. While switching from accurate to approximate mode of computation in the proposed co-design, the maximum improvement in energy efficiency and throughput is 2X. However, the gains in energy efficiency against a MAC-based PE array with the proposed memory platform are 3X-6X. The corresponding improvements in throughput are 2.26X-4.52X, respectively.
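The abstract uses PSNR as its output quality metric. As a reminder of what the 25 dB to 50 dB range measures, here is a minimal sketch of the standard PSNR definition, assuming 8-bit pixels (`max_value=255`) and images passed as flat sequences. The function name and the input layout are illustrative choices, not taken from the paper.

```python
import math


def psnr(reference, filtered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are flat sequences of pixel intensities (an assumption
    made here for simplicity).
    """
    if len(reference) != len(filtered) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error between the reference and the filtered output.
    mse = sum((r - f) ** 2 for r, f in zip(reference, filtered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the approximate output is closer to the exact one, so a tunable design trades PSNR against energy and throughput.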
Citations: 0
Towards High-Quality CGRA Mapping with Graph Neural Networks and Reinforcement Learning
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549458
Yan Zhuang, Zhihao Zhang, Dajiang Liu
Coarse-Grained Reconfigurable Architecture (CGRA) is a promising solution for accelerating domain applications due to its good combination of energy efficiency and flexibility. Loops, as the computation-intensive parts of applications, are often mapped onto CGRAs, and modulo scheduling is commonly used to improve execution performance. However, the actual performance using modulo scheduling is highly dependent on the mapping ability of the Data Dependency Graph (DDG) extracted from a loop. As existing approaches usually separate routing exploration of multi-cycle dependences from mapping for fast compilation, they may easily suffer from poor mapping quality. In this paper, we integrate routing exploration into the mapping process, giving it more opportunities to find a globally optimized solution. Meanwhile, with a reduced resource graph defined, the search space of the new mapping problem is not greatly increased. To solve the problem efficiently, we introduce graph-neural-network-based reinforcement learning to predict a placement distribution over different resource nodes for all operations in a DDG. Using routing connectivity as the reward signal, we optimize the parameters of the neural network with a policy gradient method to find a valid mapping solution. Without much engineering and heuristic design, our approach achieves 1.57× the mapping quality of the state-of-the-art heuristic.
Citations: 3
Automatic Test Configuration and Pattern Generation (ATCPG) for Neuromorphic Chips
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549422
I. Chiu, Xin-Ping Chen, Jennifer Shueh-Inn Hu, C. Li
The demand for low-power, high-performance neuromorphic chips is increasing. However, conventional testing is not applicable to neuromorphic chips for three reasons: (1) lack of scan DfT, (2) stochastic characteristics, and (3) configurable functionality. In this paper, we present an automatic test configuration and pattern generation (ATCPG) method for testing a configurable stochastic neuromorphic chip without using scan DfT. We use machine learning to generate test configurations. Then, we apply a modified fast gradient sign method to generate test patterns. Finally, we determine test repetitions using the statistical power of the test. We conduct experiments on one neuromorphic architecture, the spiking neural network, to evaluate the effectiveness of our ATCPG. The experimental results show that our ATCPG can achieve 100% fault coverage for the five fault models we use. For testing a 3-layer model at a 0.05 significance level, we produce 5 test configurations and 67 test patterns. The average test repetitions for neuron faults and synapse faults are 2,124 and 4,557, respectively. In addition, our simulation results show that the overkill matched our significance level perfectly.
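The abstract does not describe how the authors modify the fast gradient sign method, so the sketch below shows only the standard, unmodified FGSM step for orientation: each input is moved by a fixed amount in the direction of the sign of the loss gradient. The function names, the `[lo, hi]` clipping range, and the assumption that the gradient is already available are all illustrative.

```python
def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_step(x, grad, epsilon, lo=0.0, hi=1.0):
    """One standard FGSM step (not the paper's modified variant).

    x:       current input pattern as a list of floats
    grad:    gradient of the loss with respect to each input (assumed given)
    epsilon: step size applied along the sign of the gradient
    lo, hi:  valid input range; results are clipped to it
    """
    return [min(hi, max(lo, xi + epsilon * sign(gi))) for xi, gi in zip(x, grad)]
```

In a test-generation setting, the loss would presumably be chosen so that the perturbed pattern makes a faulty chip's output differ from a fault-free one, but that choice is an assumption here, not something the abstract states.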
Citations: 0
2022 ICCAD CAD Contest Problem C: Microarchitecture Design Space Exploration
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561109
Sicheng Li, Chen Bai, Xuechao Wei, Bizhao Shi, Yen-Kuang Chen, Yuan Xie
It is vital to select microarchitectures that achieve good trade-offs between performance, power, and area in the chip development cycle. Combining high-level hardware description languages with the optimization of electronic design automation tools empowers microarchitecture exploration at the circuit level. Due to the extremely large design space and the high runtime cost of evaluating a microarchitecture, ICCAD 2022 CAD Contest Problem C calls for an effective design space exploration algorithm to solve the problem. We formulate the research topic as a contest problem and provide benchmark suites, contest benchmark platforms, etc., for all contestants to develop and evaluate their algorithms.
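A common way to make "good trade-offs between performance, power, and area" concrete is Pareto dominance. The sketch below filters a set of evaluated designs down to its non-dominated subset. It assumes every objective is expressed so that smaller is better (for example, runtime rather than throughput), and the tuple layout is hypothetical; the contest's own scoring is not described in the abstract.

```python
def dominates(a, b):
    """True if design a is at least as good as b in every objective and
    strictly better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs):
    """Return the designs that no other design dominates, in input order."""
    return [d for d in designs if not any(dominates(other, d) for other in designs)]


# Hypothetical (runtime, power, area) measurements for three candidate designs.
candidates = [(1.0, 5.0, 3.0), (2.0, 4.0, 3.0), (2.0, 6.0, 4.0)]
front = pareto_front(candidates)
```

Because each evaluation is expensive, an exploration algorithm aims to approximate this front while evaluating as few designs as possible.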
Citations: 1
Language Equation Solving via Boolean Automata Manipulation
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549428
Wan-Hsuan Lin, Chia-Hsuan Su, J. H. Jiang
Language equations are a powerful tool for compositional synthesis, modeled as the unknown component problem. Given a (sequential) system specification S and a fixed component F, we are asked to synthesize an unknown component X whose composition with F fulfills S. The synthesis of X can be formulated as language equation solving. Although prior work exploits partitioned representation for effective finite automata manipulation, it remains challenging to solve language equations involving a large number of states. In this work, we propose variants of Boolean automata as the underlying succinct representation for regular languages. They admit logic circuit manipulation and extend the scalability of solving language equations. Experimental results demonstrate the superiority of our method over the state-of-the-art, solving nine more cases out of the 36 studied benchmarks and achieving an average speedup of 740×.
Citations: 0
Hardware Architecture of Graph Neural Network-enabled Motion Planner (Invited Paper)
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561113
Lingyi Huang, Xiao Zang, Yu Gong, Bo Yuan
Motion planning aims to find a collision-free trajectory from the start to the goal configuration of a robot. As a key cognition task for all autonomous machines, motion planning is fundamentally required in various real-world robotic applications, such as 2-D/3-D autonomous navigation of unmanned mobile and aerial vehicles and high degree-of-freedom (DoF) autonomous manipulation of industrial/medical robot arms and graspers. Motion planning can be performed using either non-learning-based classical algorithms or learning-based neural approaches. Most recently, the powerful capabilities of deep neural networks (DNNs) have made neural planners very attractive because of their superior planning performance over the classical methods. In particular, the graph neural network (GNN)-enabled motion planner has demonstrated state-of-the-art performance across a set of challenging high-dimensional planning tasks, motivating efficient hardware acceleration to fully unleash its potential and promote its widespread deployment in practical applications. To that end, in this paper we perform a preliminary study of efficient accelerator design for the GNN-based neural planner, especially for the neural explorer as the key component of the entire planning pipeline. By performing in-depth analysis of the different design choices, we identify that a hybrid architecture, instead of the uniform sparse matrix multiplication (SpMM)-based solution that is popularly adopted in existing GNN hardware, is more suitable for our target neural explorer. With a set of optimizations on microarchitecture and dataflow, several design challenges incurred by using the hybrid architecture, such as extensive memory access and imbalanced workload, can be efficiently mitigated. Evaluation results show that our proposed customized hardware architecture achieves order-of-magnitude performance improvement over the CPU/GPU-based implementation with respect to area and energy efficiency in various working environments.
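For readers unfamiliar with the SpMM baseline that the abstract contrasts against, the sketch below multiplies a sparse matrix in compressed sparse row (CSR) form by a dense matrix, which is the usual shape of GNN neighbor aggregation (adjacency times node features). It is a plain software illustration under that assumption; the hybrid architecture itself is not described in the abstract and is not modeled here.

```python
def spmm_csr(indptr, indices, data, dense):
    """Multiply a CSR sparse matrix A by a dense matrix B (list of rows).

    indptr[r]..indptr[r + 1] delimits the nonzeros of row r of A,
    indices holds their column numbers and data holds their values.
    """
    n_rows = len(indptr) - 1
    k = len(dense[0]) if dense else 0
    out = [[0.0] * k for _ in range(n_rows)]
    for row in range(n_rows):
        # Only the stored nonzeros of this row contribute to the output row.
        for p in range(indptr[row], indptr[row + 1]):
            col, val = indices[p], data[p]
            for j in range(k):
                out[row][j] += val * dense[col][j]
    return out


# A = [[1, 0, 1], [0, 0, 0], [0, 2, 0]] stored in CSR form.
result = spmm_csr([0, 2, 2, 3], [0, 2, 1], [1.0, 1.0, 2.0],
                  [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

The inner loops read rows of the dense matrix at data-dependent positions, which gives a feel for why memory access patterns matter so much in accelerator design for this kind of workload.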
Citations: 1
Obstacle-Avoiding Multiple Redistribution Layer Routing with Irregular Structures*
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549419
Yen-Ting Chen, Yao-Wen Chang
In advanced packages, redistribution layers (RDLs) are extra metal layers for high interconnections among the chips and printed circuit board (PCB). To better utilize the routing resources of RDLs, published works adopted flexible vias, which can be placed anywhere. Furthermore, some regions may be blocked for signal integrity protection or manually prerouted nets (such as power/ground nets or feeding lines of antennas) to achieve higher performance. These blocked regions are treated as obstacles in the routing process. Since the positions of pads, obstacles, and vias can be arbitrary, the structures of RDLs become irregular. The obstacles and irregular structures substantially increase the difficulty of the routing process. This paper proposes a three-stage algorithm: First, the layout is partitioned by a method based on constrained Delaunay triangulation (CDT). Then, we present a global routing graph model and generate routing guides for unified-assignment netlists. Finally, a novel tile routing method is developed to obtain detailed routes. Experimental results demonstrate the robustness and effectiveness of our proposed algorithm.
Citations: 0
Analyzing and Improving Resilience and Robustness of Autonomous Systems (Invited Paper)
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561111
Zishen Wan, Karthik Swaminathan, Pin-Yu Chen, Nandhini Chandramoorthy, A. Raychowdhury
Autonomous systems have reached a tipping point, with a myriad of self-driving cars, unmanned aerial vehicles (UAVs), and robots being widely applied and revolutionizing new applications. The continuous deployment of autonomous systems reveals the need for designs that facilitate increased resiliency and safety. The ability of an autonomous system to tolerate, or mitigate against errors, such as environmental conditions, sensor, hardware and software faults, and adversarial attacks, is essential to ensure its functional safety. Application-aware resilience metrics, holistic fault analysis frameworks, and lightweight fault mitigation techniques are being proposed for accurate and effective resilience and robustness assessment and improvement. This paper explores the origination of fault sources across the computing stack of autonomous systems, discusses the various fault impacts and fault mitigation techniques of different scales of autonomous systems, and concludes with challenges and opportunities for assessing and building next-generation resilient and robust autonomous systems.
Citations: 2
Graph Neural Networks for Idling Error Mitigation
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549444
Vedika Servanan, S. Saeed
Dynamical Decoupling (DD)-based protocols have been shown to reduce the idling errors encountered in quantum circuits. However, the current research in suppressing idling qubit errors suffers from scalability issues due to the large number of tuning quantum circuits that should be executed first to find the locations of the DD sequences in the target quantum circuit, which boost the output state fidelity. This process becomes tedious as the size of the quantum circuit increases. To address this challenge, we propose a Graph Neural Network (GNN) framework, which mitigates idling errors through an efficient insertion of DD sequences into quantum circuits by modeling their impact at different idle qubit windows. Our paper targets maximizing the benefit of DD sequences using a limited number of tuning circuits. We propose to classify the idle qubit windows into critical and non-critical (benign) windows using a data-driven reliability model. Our results obtained from IBM Lagos quantum computer show that our proposed GNN models, which determine the locations of DD sequences in the quantum circuits, significantly improve the output state fidelity by a factor of 1.4x on average and up to 2.6x compared to the adaptive DD approach, which searches for the best locations of DD sequences at run-time.
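Before any window can be classified as critical or benign, the idle windows themselves have to be enumerated. The sketch below does only that first step for a single qubit, given the time intervals in which it executes gates. The interval representation, the time units, and the `min_length` threshold are hypothetical; the GNN-based classification described in the abstract is not reproduced here.

```python
def idle_windows(busy, circuit_end, min_length):
    """Return (start, end) gaps on one qubit that last at least min_length.

    busy:        list of (start, end) intervals in which the qubit runs a gate
    circuit_end: time at which the whole circuit finishes
    min_length:  shortest gap worth considering for a DD sequence (assumed)
    """
    windows = []
    cursor = 0  # end of the latest gate seen so far
    for start, end in sorted(busy):
        if start - cursor >= min_length:
            windows.append((cursor, start))
        cursor = max(cursor, end)
    # Gap between the last gate and the end of the circuit.
    if circuit_end - cursor >= min_length:
        windows.append((cursor, circuit_end))
    return windows


# Hypothetical schedule: three gates on one qubit, circuit ends at t = 100.
gaps = idle_windows([(0, 10), (50, 60), (62, 70)], circuit_end=100, min_length=20)
```

Each returned window is a candidate location for a DD sequence; deciding which candidates actually improve fidelity is the part the paper assigns to its learned model.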
Citations: 0
Deep Learning Toolkit-Accelerated Analytical Co-optimization of CNN Hardware and Dataflow
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549402
Rongjian Liang, Jianfeng Song, Yuan Bo, Jiang Hu
The continuous growth of CNN complexity not only intensifies the need for hardware acceleration but also presents a huge challenge: the solution space for CNN hardware design and dataflow mapping becomes enormous, in addition to being discrete and lacking a well-behaved structure. Most previous works either use stochastic metaheuristics, such as genetic algorithms, which are typically very slow on large problems, or rely on expensive sampling, e.g., Gumbel-Softmax-based differentiable optimization and Bayesian optimization. We propose an analytical model for evaluating the power and performance of CNN hardware design and dataflow solutions. Based on this model, we introduce a co-optimization method consisting of nonlinear programming and parallel local search. A key innovation in this model is its matrix form, which enables the use of a deep learning toolkit for highly efficient computation of power/performance values and gradients in the optimization. In handling the power-performance tradeoff, our method can lead to better solutions than minimizing a weighted sum of power and latency. The average relative error of our model compared with Timeloop is as small as 1%. Compared to state-of-the-art methods, our approach achieves solutions with up to 1.7× shorter inference latency, 37.5% less power consumption, and 3× less area on ResNet-18. Moreover, it provides a 6.2× speedup of optimization runtime.
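The general pattern of "continuous optimization on an analytical model, followed by local search over discrete values" can be sketched on a toy problem. Everything here is an assumption made for illustration: the one-variable latency model `work / n + overhead * n`, the function names, and the hand-written gradient are not taken from the paper, whose model is in matrix form and whose gradients come from a deep learning toolkit.

```python
def latency(n: float, work: float, overhead: float) -> float:
    """Toy cost model: compute time shrinks with n units, overhead grows with n."""
    return work / n + overhead * n


def latency_grad(n: float, work: float, overhead: float) -> float:
    """Derivative of the toy model with respect to n (written by hand here;
    an autodiff toolkit would supply this for a realistic model)."""
    return -work / (n * n) + overhead


def relax_and_descend(
    work: float, overhead: float, start: float, lr: float = 0.5, steps: int = 500
) -> float:
    """Treat the unit count as continuous and run plain gradient descent."""
    n = start
    for _ in range(steps):
        n = max(1.0, n - lr * latency_grad(n, work, overhead))
    return n


def local_search(n_relaxed: float, work: float, overhead: float) -> int:
    """Round the relaxed solution and pick the best neighbouring integer."""
    center = max(1, round(n_relaxed))
    candidates = [c for c in (center - 1, center, center + 1) if c >= 1]
    return min(candidates, key=lambda c: latency(c, work, overhead))


def co_optimize(work: float, overhead: float, start: float) -> int:
    """Continuous descent first, then discrete refinement."""
    return local_search(relax_and_descend(work, overhead, start), work, overhead)


if __name__ == "__main__":
    print(co_optimize(work=1024.0, overhead=1.0, start=8.0))
```

For this toy model the continuous optimum is `sqrt(work / overhead)`, so the sketch can be checked by hand. A realistic design space has many coupled discrete variables, which is where a vectorized model and parallel local search would matter.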
Citations: 1