2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC): Latest Papers

Accurate Processor-level Wirelength Distribution Model for Technology Pathfinding Using a Modernized Interpretation of Rent’s Rule
Pub Date : 2018-06-24 DOI: 10.1145/3195970.3195980
D. Prasad, Saurabh Sinha, B. Cline, S. Moore, A. Naeemi
Faithful system-level modeling is vital to design and technology pathfinding, and requires accurate representation of interconnects. In this study, Rent’s rule is modernized to cater to advanced technology and design, and applied to derive a priori wirelength distribution models. Furthermore, a priori interconnect branching models are proposed to capture design constraints and their handling by electronic design automation (EDA) tools. These interconnect branching models are embedded into the wirelength distribution models and validated against a suite of state-of-the-art commercial designs across technology nodes. Novel design-specific critical-path models are presented that capture trends in technology and microarchitecture, providing a reliable framework for future technology and design benchmarking.
Citations: 2
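For readers unfamiliar with the rule this abstract builds on, the classical form of Rent's rule relates the number of external terminals T of a logic block to the number of gates N it contains as T = k * N^p. The sketch below evaluates and fits only that textbook relation; it is not the modernized interpretation or the wirelength model from the paper. The function names and the default values of k and p are illustrative assumptions.

```python
import math


def rent_terminals(num_gates, k=4.0, p=0.6):
    """Classical Rent's rule: T = k * N**p.

    k (average terminals per gate) and p (Rent exponent) are
    illustrative assumptions, not values taken from the paper.
    """
    if num_gates < 1:
        raise ValueError("num_gates must be at least 1")
    return k * num_gates ** p


def fit_rent_parameters(samples):
    """Fit (k, p) from (gate_count, terminal_count) pairs.

    Uses ordinary least squares on log T = log k + p * log N.
    """
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(t) for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    k = math.exp(mean_y - p * mean_x)
    return k, p


if __name__ == "__main__":
    # Synthetic block sizes, generated from the rule itself for illustration.
    data = [(n, rent_terminals(n)) for n in (16, 64, 256, 1024)]
    print(fit_rent_parameters(data))
```

Fitting in log-log space is the usual way to extract a Rent exponent from partitioning data, because the power law becomes a straight line whose slope is p.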
Obstacle-Avoiding Open-Net Connector with Precise Shortest Distance Estimation*
Pub Date : 2018-06-24 DOI: 10.1145/3195970.3196081
Guan-Qi Fang, Yong Zhong, Yi-Hao Cheng, Shao-Yun Fang
At the end of the digital integrated circuit (IC) design flow, some nets may still be left open due to engineering change orders (ECOs). Resolving these opens can be quite challenging for huge nets such as power/ground nets because of the large number of obstacles and widely distributed net components. Existing studies on multilayer obstacle-avoiding rectilinear Steiner trees may not be applicable to this problem because they assume the pins of an input net are a set of points, while the discrete net components in this problem can be regarded as a set of rectilinear pins. In this paper, we develop an efficient open-net connector that can deal with rectilinear pins. The proposed algorithm flow minimizes the total connection cost based on precise estimation of the shortest distance between each pair of rectilinear net components in the presence of complex obstacles. Experimental results show that the proposed flow can outperform the top three teams of the 2017 CAD Contest at ICCAD in terms of total connection cost or runtime efficiency.
Citations: 5
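The quantity at the center of this abstract is the shortest obstacle-avoiding distance between two rectilinear net components. As a rough illustration of that idea only, the sketch below runs a multi-source breadth-first search on a single-layer unit grid, treating each component as a set of grid cells. This is a deliberately simplified stand-in, not the authors' multilayer estimation method; the function name and the grid model are assumptions made for the example.

```python
from collections import deque


def shortest_component_distance(width, height, obstacles, comp_a, comp_b):
    """Fewest unit steps from any cell of comp_a to any cell of comp_b.

    Cells are (x, y) tuples on a width x height grid; moves are rectilinear
    (up, down, left, right) and may not enter obstacle cells.
    Returns None when no obstacle-free path exists.
    """
    blocked = set(obstacles)
    targets = set(comp_b)
    # Multi-source BFS: every cell of comp_a starts at distance 0.
    dist = {cell: 0 for cell in comp_a}
    queue = deque(dist)
    while queue:
        x, y = queue.popleft()
        if (x, y) in targets:
            return dist[(x, y)]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in blocked and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return None


if __name__ == "__main__":
    wall = {(2, 0), (2, 1), (2, 2)}  # partial wall forcing a detour
    print(shortest_component_distance(5, 5, wall, {(0, 0), (0, 1)}, {(4, 0)}))
```

Starting the search from every cell of one component at once is what makes the result a component-to-component distance rather than a point-to-point one, which is the distinction the abstract draws against point-pin Steiner tree formulations.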
Modelling Multicore Contention on the AURIX™ TC27x
Pub Date : 2018-06-24 DOI: 10.1145/3195970.3196077
Enrique Díaz, E. Mezzetti, Leonidas Kosmidis, J. Abella, F. Cazorla
Multicores are becoming ubiquitous in the automotive domain. Yet, the expected benefits of integration are challenged by multicore contention concerns on timing verification and validation (V&V). Worst-case execution time (WCET) estimates are required as early as possible in software development to enable prompt detection of timing misbehavior. Factoring in multicore contention necessarily builds on conservative assumptions on interference, independent of the co-runners’ load on shared hardware. We propose a contention model for automotive multicores that balances time-composability with tightness by exploiting available information on contenders. We tailor the model to the AURIX TC27x and provide tight WCET estimates using information from performance monitors and software configurations.
Citations: 12
Invited: Protecting the Supply Chain for Automotives and IoTs
Pub Date : 2018-06-24 DOI: 10.1145/3195970.3199851
S. Ray, Wen Chen, Rosario Cammarota
Modern automotive systems and IoT devices are designed through a highly complex, globalized, and potentially untrustworthy supply chain. Each player in this supply chain may (1) introduce sensitive information and data (collectively termed “assets”) that must be protected from other players in the supply chain, and (2) have controlled access to assets introduced by other players. Furthermore, some players in the supply chain may be malicious. It is imperative to protect the device and any sensitive assets in it from being compromised or unknowingly disclosed by such entities. A key – and sometimes overlooked – component of security architecture of modern electronic systems entails managing security in the face of supply chain challenges. In this paper we discuss some security challenges in automotive and IoT systems arising from supply chain complexity, and the state of the practice in this area.
Citations: 2
Packet Pump: Overcoming Network Bottleneck in On-Chip Interconnects for GPGPUs*
Pub Date : 2018-06-24 DOI: 10.1145/3195970.3196087
Xianwei Cheng, Yang Zhao, Hui Zhao, Yuan Xie
To fully exploit the parallel processing power of GPGPUs, on-chip interconnects need to provide bandwidth-efficient data communication. GPGPUs exhibit a many-to-few-to-many traffic pattern, which makes the routers connected to memory controllers the network bottleneck. The inefficient design of conventional routers causes long queues of packets to be blocked at memory controllers and thus greatly constrains the network bandwidth. In this work, we employ heterogeneous design techniques and propose a novel decoupled architecture for routers connected to memory controllers. To further improve performance, we propose techniques called Injection Virtual Circuit and Memory-aware Adaptive Routing. We show that our scheme can effectively eliminate the NoC bottleneck and improve performance by 78% on average.
Citations: 8
Sign-Magnitude SC: Getting 10X Accuracy for Free in Stochastic Computing for Deep Neural Networks*
Pub Date : 2018-06-24 DOI: 10.1145/3195970.3196113
Aidyn Zhakatayev, Sugil Lee, H. Sim, Jongeun Lee
Stochastic computing (SC) is a promising computing paradigm for applications with low precision requirements and stringent cost and power restrictions. One known problem with SC, however, is its low accuracy, especially for multiplication. In this paper we propose a simple yet very effective solution to the low-accuracy SC-multiplication problem, which is critical in many applications such as deep neural networks (DNNs). Our solution is based on the old concept of sign-magnitude, which, when applied to SC, has unique advantages. Our experimental results using multiple DNN applications demonstrate that our technique can improve the efficiency of SC-based DNNs by about 32X in terms of latency over using bipolar SC, with very little area overhead (about 1%).
Citations: 24
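To make the contrast between bipolar and sign-magnitude SC multiplication concrete, here is a small software simulation. It assumes the standard bipolar encoding (a value x in [-1, 1] becomes a bitstream with P(1) = (x + 1) / 2, multiplied with XNOR) and a simplified sign-magnitude scheme (a separate sign plus a unipolar stream encoding |x|, multiplied with AND on the magnitudes and XOR on the signs). The sign-magnitude part is one plausible reading of the concept named in the abstract, not the paper's hardware design, and all function names are hypothetical.

```python
import random


def bitstream(prob, length, rng):
    """Random bitstream whose bits are 1 with probability prob."""
    return [1 if rng.random() < prob else 0 for _ in range(length)]


def bipolar_multiply(x, y, length, rng):
    """Estimate x * y for x, y in [-1, 1] using bipolar SC (XNOR)."""
    sx = bitstream((x + 1) / 2, length, rng)
    sy = bitstream((y + 1) / 2, length, rng)
    ones = sum(1 for a, b in zip(sx, sy) if a == b)  # XNOR of the streams
    return 2 * ones / length - 1  # decode bipolar value


def sign_magnitude_multiply(x, y, length, rng):
    """Estimate x * y using a sign bit plus unipolar magnitude streams."""
    sx = bitstream(abs(x), length, rng)
    sy = bitstream(abs(y), length, rng)
    ones = sum(a & b for a, b in zip(sx, sy))  # AND of the magnitudes
    sign = -1 if (x < 0) != (y < 0) else 1  # XOR of the sign bits
    return sign * ones / length


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for x, y in [(0.0, 0.0), (0.5, -0.5), (0.1, 0.2)]:
        print(x, y,
              bipolar_multiply(x, y, 1024, rng),
              sign_magnitude_multiply(x, y, 1024, rng))
```

In this simulation, operands near zero are where the two encodings differ most: the sign-magnitude stream for 0 contains no ones, so the product is exactly 0, while the bipolar stream for 0 is half ones and its decoded product fluctuates around 0.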
GAN-OPC: Mask Optimization with Lithography-guided Generative Adversarial Nets
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196056
Haoyu Yang, Shuhe Li, Yuzhe Ma, Bei Yu, Evangeline F. Y. Young
Mask optimization has been a critical problem in the VLSI design flow due to the mismatch between the lithography system and the continuously shrinking feature sizes. Optical proximity correction (OPC) is one of the prevailing resolution enhancement techniques (RETs) that can significantly improve mask printability. However, in advanced technology nodes, the mask optimization process consumes more and more computational resources. In this paper, we develop a generative adversarial network (GAN) model to achieve better mask optimization performance. We first develop an OPC-oriented GAN flow that can learn target-mask mapping from the improved architecture and objectives, which leads to satisfactory mask optimization results. To facilitate the training process and ensure better convergence, we also propose a pre-training procedure that jointly trains the neural network with the inverse lithography technique (ILT). At convergence, the generative network is able to create quasi-optimal masks for given target circuit patterns, and fewer normal OPC steps are required to generate high-quality masks. Experimental results show that our flow can facilitate the mask optimization process as well as ensure better printability.
Citations: 104
RAMP: Resource-Aware Mapping for CGRAs
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196101
Shail Dave, M. Balasubramanian, Aviral Shrivastava
Coarse-grained reconfigurable arrays (CGRAs) are a promising solution that can accelerate even non-parallel loops. The acceleration achieved through CGRAs critically depends on the quality of the mapping (of loop operations onto the processing elements (PEs) of the CGRA) and, in particular, on the compiler’s ability to route the dependencies among operations. Previous works have explored several mechanisms to route data dependencies, including routing through other PEs, registers, memory, and even re-computation. All these routing options change the graph to be mapped onto PEs (often by adding new operations), and without re-scheduling, it may be impossible to map the new graph. However, existing techniques explore these routing options inside the Place and Route (P&R) phase of the compilation process, which is performed after the scheduling step. As a result, they may either fail to achieve a mapping or obtain poor results. Our method, RAMP, explicitly and intelligently explores the various routing options before the scheduling step, and improves mappability and mapping quality. Evaluating top performance-critical loops of MiBench benchmarks over 12 architectural configurations, we find that RAMP is able to accelerate loops by 23× over sequential execution, achieving a geomean speedup of 2.13× over the state of the art.
Citations: 43
A Security Vulnerability Analysis of SoCFPGA Architectures
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3195979
S. Chaudhuri
SoCFPGAs, or FPGAs integrated on the same die with chip multiprocessors, have made it to the market in the past years. In this article we analyse various security loopholes, existing precautions and countermeasures in these architectures. We consider Intel Cyclone/Arria devices and Xilinx Zynq/Ultrascale devices. We present an attacker model and highlight three different types of attacks, namely direct memory attacks, cache timing attacks, and rowhammer attacks, that can be used on inadequately protected systems. We present and compare existing security mechanisms in these architectures and their shortfalls. We present real-life examples of these attacks and further countermeasures to secure systems based on SoCFPGAs.
Citations: 8
Routability-Driven and Fence-Aware Legalization for Mixed-Cell-Height Circuits
Pub Date : 2018-06-01 DOI: 10.1145/3195970.3196107
Haocheng Li, Wing-Kai Chow, Gengjie Chen, Evangeline F. Y. Young, Bei Yu
Placement is one of the most critical stages in the physical synthesis flow. Circuits with increasing numbers of cells of multi-row height have brought challenges to traditional placers on efficiency and effectiveness. Furthermore, constraints on fence region and routability (e.g., edge spacing, pin access/short) should be considered, besides providing an overlap-free solution close to the global placement (GP) solution and fulfilling the power and ground (P/G) alignments. In this paper, we propose a legalization method for mixed-cell-height circuits by a window-based cell insertion technique and two post-processing network-flow-based optimizations. Compared with the champion of the IC/CAD 2017 Contest, our algorithm achieves 18% and 12% less average and maximum displacement respectively as well as significantly fewer routability violations. Comparing our algorithm with the state-of-the-art algorithms on this problem, there is a 9% improvement in total displacement with 20% less running time.
Citations: 27