
2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD): Latest Publications

Deep Learning Toolkit-Accelerated Analytical Co-optimization of CNN Hardware and Dataflow
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549402
Rongjian Liang, Jianfeng Song, Yuan Bo, Jiang Hu
The continuous growth of CNN complexity not only intensifies the need for hardware acceleration but also presents a huge challenge: the solution space for CNN hardware design and dataflow mapping becomes enormously large, in addition to being discrete and lacking a well-behaved structure. Most previous works either are stochastic metaheuristics, such as genetic algorithms, which are typically very slow on large problems, or rely on expensive sampling, e.g., Gumbel Softmax-based differentiable optimization and Bayesian optimization. We propose an analytical model for evaluating the power and performance of CNN hardware design and dataflow solutions. Based on this model, we introduce a co-optimization method consisting of nonlinear programming and parallel local search. A key innovation in this model is its matrix form, which enables the use of deep learning toolkits for highly efficient computation of power/performance values and gradients during optimization. In handling the power-performance tradeoff, our method can lead to better solutions than minimizing a weighted sum of power and latency. The average relative error of our model compared with Timeloop is as small as 1%. Compared to state-of-the-art methods, our approach achieves solutions with up to 1.7× shorter inference latency, 37.5% less power consumption, and 3× less area on ResNet 18. Moreover, it provides a 6.2× speedup of optimization runtime.
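The remark about weighted sums can be illustrated with a general property of multi-objective optimization, which is not necessarily the mechanism used in the paper: a weighted sum can never select a Pareto-optimal point that lies on a non-convex part of the trade-off curve, while a constrained formulation can. The sketch below uses three made-up (latency, power) candidates; the names `candidates`, `weighted_sum_pick`, and `constrained_pick` are hypothetical and not taken from the paper.

```python
# Hypothetical (latency, power) candidates; the numbers are illustrative only.
candidates = {
    "A": (1.0, 10.0),
    "B": (6.0, 6.0),   # Pareto-optimal, but above the straight line joining A and C
    "C": (10.0, 1.0),
}

def weighted_sum_pick(w):
    """Return the candidate minimizing w*latency + (1-w)*power."""
    return min(
        candidates,
        key=lambda k: w * candidates[k][0] + (1 - w) * candidates[k][1],
    )

def constrained_pick(power_budget):
    """Return the lowest-latency candidate whose power fits the budget."""
    feasible = [k for k in candidates if candidates[k][1] <= power_budget]
    return min(feasible, key=lambda k: candidates[k][0])

# Sweep the weight from 0 to 1: B is never chosen by any weighted sum.
picked = {weighted_sum_pick(i / 100) for i in range(101)}
print(sorted(picked))          # ['A', 'C']

# A power budget of 7 makes B the best feasible choice.
print(constrained_pick(7.0))   # B
```

In this toy setting the balanced design B is reachable only through the constrained formulation, which is one plausible reason a method that handles the trade-off directly could find solutions a weighted sum misses.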
Citations: 1
Hardware Architecture of Graph Neural Network-enabled Motion Planner (Invited Paper)
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561113
Lingyi Huang, Xiao Zang, Yu Gong, Bo Yuan
Motion planning aims to find a collision-free trajectory from the start to the goal configuration of a robot. As a key cognition task for all autonomous machines, motion planning is fundamentally required in various real-world robotic applications, such as 2-D/3-D autonomous navigation of unmanned mobile and aerial vehicles and high degree-of-freedom (DoF) autonomous manipulation of industrial/medical robot arms and graspers. Motion planning can be performed using either non-learning-based classical algorithms or learning-based neural approaches. Most recently, the powerful capabilities of deep neural networks (DNNs) have made neural planners very attractive because of their superior planning performance over the classical methods. In particular, the graph neural network (GNN)-enabled motion planner has demonstrated state-of-the-art performance across a set of challenging high-dimensional planning tasks, motivating efficient hardware acceleration to fully unleash its potential and promote its widespread deployment in practical applications. To that end, in this paper we perform a preliminary study of efficient accelerator design for the GNN-based neural planner, especially for the neural explorer as the key component of the entire planning pipeline. By performing in-depth analysis of the different design choices, we identify that the hybrid architecture, instead of the uniform sparse matrix multiplication (SpMM)-based solution that is popularly adopted in existing GNN hardware, is more suitable for our target neural explorer. With a set of optimizations on microarchitecture and dataflow, several design challenges incurred by using the hybrid architecture, such as extensive memory access and imbalanced workload, can be efficiently mitigated. Evaluation results show that our proposed customized hardware architecture achieves order-of-magnitude performance improvement over the CPU/GPU-based implementation with respect to area and energy efficiency in various working environments.
Citations: 1
Language Equation Solving via Boolean Automata Manipulation
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549428
Wan-Hsuan Lin, Chia-Hsuan Su, J. H. Jiang
Language equations are a powerful tool for compositional synthesis, modeled as the unknown component problem. Given a (sequential) system specification S and a fixed component F, we are asked to synthesize an unknown component X whose composition with F fulfills S. The synthesis of X can be formulated as language equation solving. Although prior work exploits partitioned representation for effective finite automata manipulation, it remains challenging to solve language equations involving a large number of states. In this work, we propose variants of Boolean automata as the underlying succinct representation for regular languages. They admit logic circuit manipulation and extend the scalability of language equation solving. Experimental results demonstrate the superiority of our method over the state of the art: it solves nine more cases out of the 36 studied benchmarks and achieves an average speedup of 740×.
Citations: 0
2022 ICCAD CAD Contest Problem C: Microarchitecture Design Space Exploration
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561109
Sicheng Li, Chen Bai, Xuechao Wei, Bizhao Shi, Yen-Kuang Chen, Yuan Xie
It is vital to select microarchitectures that achieve good trade-offs between performance, power, and area in the chip development cycle. Combining high-level hardware description languages with the optimization of electronic design automation tools empowers microarchitecture exploration at the circuit level. Due to the extremely large design space and the high runtime cost of evaluating a microarchitecture, ICCAD 2022 CAD Contest Problem C calls for an effective design space exploration algorithm. We formulate the research topic as a contest problem and provide benchmark suites, contest benchmark platforms, etc., for all contestants to innovate and evaluate their algorithms.
Citations: 1
Stochastic Mixed-Signal Circuit Design for In-sensor Privacy: (Invited Paper)
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561099
N. Cao, Jianbo Liu, Boyang Cheng, Muya Chang
The ubiquitous data acquisition and extensive data exchange of sensors pose severe security and privacy concerns for end-users and the public. To enable real-time protection of raw data, privacy-preserving algorithms need to run at the point of data generation, i.e., in-sensor privacy. However, due to severe sensor resource constraints and intensive computation/security cost, how to enable data protection algorithms with efficient circuit techniques remains an open question. To answer this question, this paper discusses the potential of a stochastic mixed-signal (SMS) circuit for ultra-low-power, small-footprint data security. In particular, this paper discusses digitally controlled oscillators (DCOs) and their advantages in (1) seamless analog interface, (2) stochastic computation efficiency, and (3) unified entropy generation over conventional digital circuit baselines. With the DCO as an illustrative case, we (1) target an SMS privacy-preserving architecture definition and a systematic analysis of its performance gains across various hardware/software configurations, and (2) revisit analog/mixed-signal voltage/transistor scaling in the context of entropy-based data protection.
Citations: 0
Usage-Based RTL Subsetting for Hardware Accelerators
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549391
Qinhan Tan, Aarti Gupta, S. Malik
Recent years have witnessed increasing use of domain-specific accelerators in computing platforms to provide power-performance efficiency for emerging applications. To increase their applicability within the domain, these accelerators tend to support a large set of functions, e.g., Nvidia’s open-source Deep Learning Accelerator, NVDLA, supports five distinct groups of functions [17]. However, an individual use case of an accelerator may utilize only a subset of these functions. The unused functions lead to unnecessary overhead in silicon area, power, and hardware verification/hardware-software co-verification complexity. This motivates our research question: Given an RTL design for an accelerator and a subset of functions of interest, can we automatically extract a subset of the RTL that is sufficient for these functions and sequentially equivalent to the original RTL? We call this the Usage-based RTL Subsetting problem, or the RTL subsetting problem for short. We first formally define this problem and show that it can be formulated as a program synthesis problem, which can be solved by performing expensive hyperproperty checks. To overcome the high cost, we propose multiple levels of sound over-approximations to construct an effective algorithm based on relatively less expensive temporal property checking and taint analysis for information flow checking. We demonstrate the acceptable computation cost and the quality of the results of our algorithm through several case studies of accelerators from different domains. The applicability of our proposed algorithm can be seen in its ability to subset the large NVDLA accelerator (with over 50,000 registers and 1,600,000 gates) for the group of convolution functions, where the subset reduces the total number of registers by 18.6% and the total number of gates by 37.1%.
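As a rough illustration of the taint-analysis idea mentioned above, forward taint propagation can be viewed as reachability over a signal dependency graph. The sketch below shows only that generic notion, not the paper's algorithm: the graph `drives`, the signal names, and the function `tainted_from` are all hypothetical, and whether an unreached signal can actually be dropped would still have to be established by the equivalence checks the abstract describes.

```python
from collections import deque

# Hypothetical dependency graph: each signal maps to the signals it drives.
drives = {
    "conv_en": ["conv_ctrl"],
    "conv_ctrl": ["mac_array"],
    "pool_en": ["pool_ctrl"],
    "pool_ctrl": ["pool_unit"],
    "mac_array": ["out_buf"],
    "pool_unit": ["out_buf"],
    "out_buf": [],
}

def tainted_from(sources):
    """Breadth-first forward propagation of taint from the source signals."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        sig = queue.popleft()
        for nxt in drives.get(sig, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Signals that the (hypothetical) convolution enable can influence.
reached = tainted_from(["conv_en"])

# Signals never influenced by it are only candidates for removal.
untouched = sorted(set(drives) - reached)
print(untouched)  # ['pool_ctrl', 'pool_en', 'pool_unit']
```

Reachability of this kind over-approximates real information flow, which is why it is cheap to compute but has to be paired with a soundness argument before anything is removed.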
Citations: 0
Hardware IP Protection against Confidentiality Attacks and Evolving Role of CAD Tool (Invited Paper)
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561103
S. Bhunia, Amitabh Das, Saverio Fazzari, V. Kammler, David Kehlet, J. Rajendran, Ankur Srivastava
With the growing use of hardware intellectual property (IP)-based integrated circuit (IC) design and increasing reliance on a globalized supply chain, threats to the confidentiality of hardware IPs have emerged as major security concerns for IP producers and owners. These threats are diverse, including reverse engineering (RE), piracy, cloning, and extraction of design secrets, and span different phases of the electronics life cycle. The academic research community and the semiconductor industry have made significant efforts over the past decade to develop effective methodologies and CAD tools that protect hardware IPs against these threats. These solutions include watermarking, logic locking, obfuscation, camouflaging, split manufacturing, and hardware redaction. This paper focuses on key topics in the confidentiality of hardware IPs, encompassing the major threats, protection approaches, security analysis, and metrics. It discusses the strengths and limitations of the major solutions in protecting hardware IPs against confidentiality attacks, and future directions to address the limitations in the modern supply chain ecosystem.
Citations: 0
Dynamic Frequency Boosting beyond Critical Path Delay
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549433
N. Zompakis, S. Xydis
This paper introduces an innovative post-implementation Dynamic Frequency Boosting (DFB) technique to release "hidden" performance margins of digital circuit designs currently suppressed by typical critical-path-constrained design flows, thus defining higher limits of operation speed. The proposed technique goes beyond the state of the art and exploits data-driven path delay variability, incorporating an innovative hardware clocking mechanism that detects the paths’ activation in real time. In contrast to timing speculation, the operating speed is adjusted according to the nominal delay of the activated paths, achieving error-free acceleration. The proposed technique has been evaluated on three FPGA-based use cases carefully selected to exhibit differing domain characteristics, i.e., i) a third-party DNN inference accelerator IP for CIFAR-10 images, achieving an average speedup of 18%, ii) a highly designer-optimized Optical Digital Equalizer design, in which DFB delivered a speedup of 50%, and iii) a set of 5 synthetic designs examining high-frequency (beyond 400 MHz) applications in FPGAs, achieving accelerations of 20-60% depending on the underlying path variability.
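A back-of-the-envelope model can show why tracking activated paths leaves room for speedup. The sketch below assumes, purely for illustration, that the clock period in each cycle could match the delay of the slowest path activated in that cycle and that switching the clock costs nothing; the delays and activation shares in `path_groups` are made up and do not come from the paper.

```python
# Hypothetical path groups: (delay in ns, share of cycles in which this group
# contains the slowest activated path). All values are illustrative.
path_groups = [(10.0, 0.2), (8.0, 0.3), (6.0, 0.5)]

def static_period(groups):
    """Conventional flow: the clock period is fixed by the critical path."""
    return max(delay for delay, _ in groups)

def average_boosted_period(groups):
    """Idealized boosting: the period follows the activated path each cycle."""
    return sum(delay * share for delay, share in groups)

speedup = static_period(path_groups) / average_boosted_period(path_groups)
print(f"{speedup:.2f}x")  # about 1.35x for these made-up numbers
```

Under this simplification the gain depends entirely on how rarely the critical path is exercised, which matches the abstract's observation that the achieved acceleration varies with the underlying path variability.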
Citations: 0
GPU-Accelerated Rectilinear Steiner Tree Generation
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549434
Zizheng Guo, Feng Gu, Yibo Lin
Rectilinear Steiner minimum tree (RSMT) generation is a fundamental component in the VLSI design automation flow. Because it is used extensively in circuit design iterations at early design stages such as synthesis, placement, and routing, the performance of RSMT generation is critical for a reasonable design turnaround time. State-of-the-art RSMT generation algorithms, like fast look-up table estimation (FLUTE), are constrained by CPU-based parallelism with limited runtime improvements. Accelerating RSMT generation on GPUs is an important yet difficult task, due to the complex and non-trivial divide-and-conquer computation patterns with recursion. In this paper, we present the first GPU-accelerated RSMT generation algorithm based on FLUTE. By designing GPU-efficient data structures and levelized decomposition, table look-up, and merging operations, we incorporate large-scale data parallelism into the generation of Steiner trees. Compared with FLUTE running on 40 CPU cores, the algorithm achieves up to a 10.47× runtime speed-up, filling in a critical missing component in today's GPU-accelerated design automation frameworks.
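As a small illustration of the problem itself (not of FLUTE's look-up tables or the GPU algorithm described above), the sketch below handles only nets with two or three pins, where the optimum is known in closed form: the tree length equals the half-perimeter of the pins' bounding box, and for three pins the single Steiner point sits at the coordinate-wise median. The function names are hypothetical.

```python
from statistics import median


def hpwl(pins):
    """Half-perimeter wirelength of the bounding box of the pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def rsmt_small_net(pins):
    """Return (length, steiner_point) of an RSMT for a 2- or 3-pin net.

    For two pins no Steiner point is needed, so steiner_point is None.
    For three pins the Steiner point is the coordinate-wise median, and
    the tree connects every pin to it along rectilinear paths.
    """
    if not 2 <= len(pins) <= 3:
        raise ValueError("this sketch only covers nets with 2 or 3 pins")
    if len(pins) == 2:
        return hpwl(pins), None

    sx = median(x for x, _ in pins)
    sy = median(y for _, y in pins)
    # Sum of Manhattan distances from each pin to the Steiner point.
    length = sum(abs(x - sx) + abs(y - sy) for x, y in pins)
    return length, (sx, sy)


if __name__ == "__main__":
    net = [(0, 0), (4, 1), (2, 5)]
    print(rsmt_small_net(net))  # the length matches hpwl(net)
```

Larger nets have no such closed form, which is why table-based methods decompose high-degree nets into small sub-nets and merge the results.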
Citations: 1
A Novel Blockage-avoiding Macro Placement Approach for 3D ICs based on POCS
Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549352
Jai-Ming Lin, Po-Chen Lu, Heng-Yu Lin, Jia-Ting Tsai
Although the 3D integrated circuit (IC) placement problem has been studied for many years, few publications are devoted to macro legalization. Because macros are large, the macro placement problem is harder than cell placement, especially when preplaced macros exist in a multi-tier structure. In order to have a more global view, this paper proposes a partitioning-last, macro-first flow to handle 3D placement for mixed-size designs, which performs tier partitioning after placement prototyping and then legalizes macros before cell placement. A novel two-step approach is proposed to handle 3D macro placement. The first step determines the locations of macros in a projection plane based on a new representation, named K-tier Partially Occupied Corner Stitching. It not only preserves the prototyping result but also guarantees a legal placement after tier assignment of macros. Next, macros are assigned to their respective tiers by an Integer Linear Programming (ILP) algorithm. Experimental results show that our design flow obtains better solutions than other flows, especially in cases with more preplaced macros.
Citations: 0