D. Prasad, Saurabh Sinha, B. Cline, S. Moore, A. Naeemi
Faithful system-level modeling is vital to design and technology pathfinding and requires an accurate representation of interconnects. In this study, Rent’s rule is modernized to cater to advanced technologies and designs, and is applied to derive a priori wirelength distribution models. Furthermore, a priori interconnect branching models are proposed to capture design constraints and how electronic design automation (EDA) tools handle them. These interconnect branching models are embedded into the wirelength distribution models and validated against a suite of state-of-the-art commercial designs across technology nodes. Novel design-specific critical-path models are also presented; they capture trends in technology and microarchitecture and provide a reliable framework for future technology and design benchmarking.
{"title":"Accurate Processor-level Wirelength Distribution Model for Technology Pathfinding Using a Modernized Interpretation of Rent’s Rule","authors":"D. Prasad, Saurabh Sinha, B. Cline, S. Moore, A. Naeemi","doi":"10.1145/3195970.3195980","DOIUrl":"https://doi.org/10.1145/3195970.3195980","url":null,"abstract":"Faithful system-level modeling is vital to design and technology pathfinding, and requires accurate representation of interconnects. In this study, Rent’s rule is modernized to cater to advanced technology and design, and applied to derive a priori wirelength distribution models. Furthermore, a priori interconnect branching models are proposed to capture design constraints and their handling by the Electronic-Design-Automation tools. These interconnect branching models are embedded into the wire-length distribution models and validated against a suite of state-of-the-art commercial designs across technology nodes. Novel design-specific critical-path models are presented which capture trends in technology and microarchitecture, providing a reliable framework for future technology and design benchmarking.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"45 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83398478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
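For reference, the classical form of Rent's rule is T = t·g^p, where T is the number of external terminals of a block of g gates, t is the average number of terminals per gate, and p is the Rent exponent. The sketch below evaluates the rule and recovers (t, p) with a least-squares fit in log-log space. It illustrates only the textbook rule, not the modernized interpretation proposed in the paper, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def rent_terminals(t: float, g: float, p: float) -> float:
    """Classical Rent's rule: external terminals T = t * g**p for a block of g gates."""
    return t * g ** p


def fit_rent_exponent(gates: Sequence[float],
                      terminals: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log T = log t + p * log g; returns (t, p)."""
    if len(set(gates)) < 2:
        raise ValueError("need at least two distinct block sizes")
    xs = [math.log(g) for g in gates]
    ys = [math.log(T) for T in terminals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                       # slope in log-log space is the Rent exponent
    t = math.exp(mean_y - p * mean_x)   # intercept gives the Rent coefficient
    return t, p
```

In practice the (g, T) pairs would come from recursively partitioning a netlist; the paper's point is that how those pairs are obtained and interpreted must be updated for modern technologies and designs.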
At the end of the digital integrated circuit (IC) design flow, some nets may still be left open due to engineering change orders (ECOs). Resolving these opens can be quite challenging for very large nets, such as power/ground nets, because of the large number of obstacles and the widely distributed net components. Existing studies on multilayer obstacle-avoiding rectilinear Steiner trees may not be applicable to this problem, because they assume that the pins of an input net are a set of points, whereas the discrete net components in this problem must be regarded as a set of rectilinear pins. In this paper, we develop an efficient open-net connector that can handle rectilinear pins. The proposed algorithm flow minimizes the total connection cost based on a precise estimation of the shortest distance between each pair of rectilinear net components in the presence of complex obstacles. Experimental results show that the proposed flow can outperform the top three teams of the 2017 CAD Contest at ICCAD in terms of total connection cost or runtime efficiency.
{"title":"Obstacle-Avoiding Open-Net Connector with Precise Shortest Distance Estimation*","authors":"Guan-Qi Fang, Yong Zhong, Yi-Hao Cheng, Shao-Yun Fang","doi":"10.1145/3195970.3196081","DOIUrl":"https://doi.org/10.1145/3195970.3196081","url":null,"abstract":"At the end of digital integrated circuit (IC) design flow, some nets may still be left open due to engineering change order (ECO). Resolving these opens could be quite challenging for some huge nets such as power ground nets because of a large number of obstacles and greatly distributed net components. Existing studies on multilayer obstacle-avoiding rectilinear Steiner trees may not be applicable to solve this problem because they assume the pins of an input net is a set of points, while the discrete net components in this problem can be regarded as a set of rectilinear pins. In this paper, we develop an efficient open-net connector that can deal with rectilinear pins. The proposed algorithm flow minimizes the total connection cost based on precise estimation of the shortest distance between each pair of rectilinear net components with the presence of complex obstacles. Experimental results show that the proposed flow can outperform the top three teams of 2017 CAD Contest at ICCAD in terms of total connection cost or runtime efficiency.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"6 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83739190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
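To make the distinction between point pins and rectilinear pins concrete, the sketch below computes the shortest obstacle-avoiding rectilinear distance between two whole components on a single-layer unit grid, using a multi-source breadth-first search seeded from every cell of the source component. The single layer, unit grid, and blocked-cell representation of obstacles are simplifying assumptions, the function names are hypothetical, and this is not the paper's estimation algorithm.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Set, Tuple

Cell = Tuple[int, int]


def rect_cells(x0: int, y0: int, x1: int, y1: int) -> Set[Cell]:
    """All grid cells covered by the inclusive rectangle [x0, x1] x [y0, y1]."""
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


def component_distance(width: int, height: int, blocked: Set[Cell],
                       src: Iterable[Cell], dst: Iterable[Cell]) -> Optional[int]:
    """Shortest rectilinear path length (grid steps) from any cell of src to any
    cell of dst, avoiding blocked cells. Returns None if dst is unreachable."""
    targets = set(dst) - blocked
    dist: Dict[Cell, int] = {}
    queue: Deque[Cell] = deque()
    # Multi-source BFS: every free cell of the source component starts at distance 0.
    for cell in src:
        if cell not in blocked and cell not in dist:
            dist[cell] = 0
            queue.append(cell)
    while queue:
        x, y = queue.popleft()
        if (x, y) in targets:
            return dist[(x, y)]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in blocked and nxt not in dist):
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return None
```

For example, on a 5×5 grid with a wall `rect_cells(2, 0, 2, 3)`, `component_distance(5, 5, wall, {(0, 0)}, {(4, 0)})` must detour around the wall through row 4, giving 12 steps instead of the obstacle-free 4. Because every cell of both components is a search endpoint, the result is a distance between components rather than between single points.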
Enrique Díaz, E. Mezzetti, Leonidas Kosmidis, J. Abella, F. Cazorla
Multicores are becoming ubiquitous in automotive systems. Yet the expected integration benefits are challenged by concerns about multicore contention during timing verification and validation (V&V). Worst-case execution time (WCET) estimates are required as early as possible in software development to enable prompt detection of timing misbehavior. Factoring in multicore contention necessarily relies on conservative assumptions about interference that are independent of the co-runners' load on shared hardware. We propose a contention model for automotive multicores that balances time composability with tightness by exploiting the information available on contenders. We tailor the model to the AURIX TC27x and provide tight WCET estimates using information from performance monitors and software configurations.
{"title":"Modelling Multicore Contention on the AURIX™ TC27x","authors":"Enrique Díaz, E. Mezzetti, Leonidas Kosmidis, J. Abella, F. Cazorla","doi":"10.1145/3195970.3196077","DOIUrl":"https://doi.org/10.1145/3195970.3196077","url":null,"abstract":"Multicores are becoming ubiquitous in automotive. Yet, the expected benefits on integration are challenged by multicore contention concerns on timing V&V. Worst-case execution time (WCET) estimates are required as early as possible in the software development, to enable prompt detection of timing misbehavior. Factoring in multicore contention necessarily builds on conservative assumptions on interference, independent of co-runners load on shared hardware. We propose a contention model for automotive multi-cores that balances time-composability with tightness by exploiting available information on contenders. We tailor the model to the AURIX TC27x and provide tight WCET estimates using information from performance monitors and software configurations.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"32 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73949854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
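The trade-off between time composability and tightness can be illustrated with a deliberately simplified model. A fully time-composable bound assumes that every shared-resource request of the task under analysis waits for one maximum-latency request from every other core. If the number of requests each contender issues is known (for example, from performance monitors), the interference from that contender can be capped at min(own requests, contender requests). The sketch assumes round-robin-like arbitration in which each request waits for at most one request per contender; it is a hypothetical model with hypothetical names, not the paper's TC27x model.

```python
from typing import Sequence


def composable_delay(task_requests: int, num_contenders: int, latency: int) -> int:
    """Fully time-composable bound: every request may wait for every contender."""
    return task_requests * num_contenders * latency


def contender_aware_delay(task_requests: int, contender_requests: Sequence[int],
                          latency: int) -> int:
    """Tighter bound: a contender issuing n requests can delay at most n of ours."""
    return sum(min(task_requests, n) * latency for n in contender_requests)


def wcet_estimate(wcet_isolation: int, task_requests: int,
                  contender_requests: Sequence[int], latency: int) -> int:
    """Isolation WCET plus the contender-aware contention delay (in cycles)."""
    return wcet_isolation + contender_aware_delay(task_requests, contender_requests, latency)
```

With 100 requests from the task, two contenders issuing 20 and 500 requests, and a 5-cycle per-request latency, the composable bound adds 1000 cycles, whereas the contender-aware bound adds 600, because the lightly loaded contender can delay at most 20 requests.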
Modern automotive systems and IoT devices are designed through a highly complex, globalized, and potentially untrustworthy supply chain. Each player in this supply chain may (1) introduce sensitive information and data (collectively termed “assets”) that must be protected from other players in the supply chain, and (2) have controlled access to assets introduced by other players. Furthermore, some players in the supply chain may be malicious. It is imperative to protect the device and any sensitive assets in it from being compromised or unknowingly disclosed by such entities. A key – and sometimes overlooked – component of security architecture of modern electronic systems entails managing security in the face of supply chain challenges. In this paper we discuss some security challenges in automotive and IoT systems arising from supply chain complexity, and the state of the practice in this area.
{"title":"Invited: Protecting the Supply Chain for Automotives and IoTs","authors":"S. Ray, Wen Chen, Rosario Cammarota","doi":"10.1145/3195970.3199851","DOIUrl":"https://doi.org/10.1145/3195970.3199851","url":null,"abstract":"Modern automotive systems and IoT devices are designed through a highly complex, globalized, and potentially untrustworthy supply chain. Each player in this supply chain may (1) introduce sensitive information and data (collectively termed “assets”) that must be protected from other players in the supply chain, and (2) have controlled access to assets introduced by other players. Furthermore, some players in the supply chain may be malicious. It is imperative to protect the device and any sensitive assets in it from being compromised or unknowingly disclosed by such entities. A key – and sometimes overlooked – component of security architecture of modern electronic systems entails managing security in the face of supply chain challenges. In this paper we discuss some security challenges in automotive and IoT systems arising from supply chain complexity, and the state of the practice in this area.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"42 1","pages":"1-4"},"PeriodicalIF":0.0,"publicationDate":"2018-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85062264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To fully exploit the parallel processing power of GPGPUs, on-chip interconnects need to provide bandwidth-efficient data communication. GPGPUs exhibit a many-to-few-to-many traffic pattern, which makes the routers connected to memory controllers the network bottleneck. The inefficient design of conventional routers causes long queues of packets blocked at the memory controllers, which greatly constrains network bandwidth. In this work, we employ heterogeneous design techniques and propose a novel decoupled architecture for routers connected to memory controllers. To further improve performance, we propose two techniques called Injection Virtual Circuit and Memory-aware Adaptive Routing. We show that our scheme can effectively eliminate the NoC bottleneck and improve performance by 78% on average.
{"title":"Packet Pump: Overcoming Network Bottleneck in On-Chip Interconnects for GPGPUs*","authors":"Xianwei Cheng, Yang Zhao, Hui Zhao, Yuan Xie","doi":"10.1145/3195970.3196087","DOIUrl":"https://doi.org/10.1145/3195970.3196087","url":null,"abstract":"In order to fully exploit GPGPU's parallel processing power, on-chip interconnects need to provide bandwidth efficient data communication. GPGPUs exhibit a many-to-few-to-many traffic pattern which makes the memory controller connected routers the network bottleneck. Inefficient design of conventional routers causes long queues of packets blocked at memory controllers and thus greatly constrained the network bandwidth. In this work, we employ heterogeneous design techniques and propose a novel decoupled architecture for routers connected with memory controllers. To further improve performance, we propose techniques called Injection Virtual Circuit and Memory-aware Adaptive Routing. We show that our scheme can effectively eliminate NoC bottleneck and improve performance by 78% on average.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"14 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83909662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stochastic computing (SC) is a promising computing paradigm for applications with low precision requirements and stringent cost and power constraints. One known problem with SC, however, is its low accuracy, especially for multiplication. In this paper, we propose a simple yet very effective solution to the low-accuracy SC multiplication problem, which is critical in many applications such as deep neural networks (DNNs). Our solution is based on the old concept of sign-magnitude representation, which has unique advantages when applied to SC. Our experimental results on multiple DNN applications demonstrate that our technique can improve the efficiency of SC-based DNNs by about 32X in terms of latency compared with bipolar SC, with very little area overhead (about 1%).
{"title":"Sign-Magnitude SC: Getting 10X Accuracy for Free in Stochastic Computing for Deep Neural Networks*","authors":"Aidyn Zhakatayev, Sugil Lee, H. Sim, Jongeun Lee","doi":"10.1145/3195970.3196113","DOIUrl":"https://doi.org/10.1145/3195970.3196113","url":null,"abstract":"Stochastic computing (SC) is a promising computing paradigm for applications with low precision requirement, stringent cost and power restriction. One known problem with SC, however, is the low accuracy especially with multiplication. In this paper we propose a simple, yet very effective solution to the low-accuracy SC-multiplication problem, which is critical in many applications such as deep neural networks (DNNs). Our solution is based on an old concept of sign-magnitude, which, when applied to SC, has unique advantages. Our experimental results using multiple DNN applications demonstrate that our technique can improve the efficiency of SC-based DNNs by about 32X in terms of latency over using bipolar SC, with very little area overhead (about 1%).","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"68 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74323743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
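As background for the sign-magnitude idea: in unipolar SC a value in [0, 1] is the fraction of 1s in a bitstream and multiplication is a single AND gate, but negative values cannot be represented. Bipolar SC encodes v in [-1, 1] with P(1) = (v + 1)/2 and multiplies with an XNOR gate, at the cost of a noisier estimate for the same stream length, particularly near zero. Keeping the sign as a separate bit lets the magnitudes use unipolar AND multiplication, with the product sign obtained by XOR. The sketch below simulates both with pseudo-random streams; the stream length, seeding, and function names are illustrative assumptions, and it is not the authors' hardware design.

```python
import random
from typing import List


def unipolar_stream(p: float, n: int, rng: random.Random) -> List[int]:
    """Bitstream of length n whose fraction of 1s approximates p in [0, 1]."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def bipolar_multiply(a: float, b: float, n: int, rng: random.Random) -> float:
    """Bipolar SC: v in [-1, 1] is encoded with P(1) = (v + 1) / 2; XNOR multiplies."""
    xs = unipolar_stream((a + 1) / 2, n, rng)
    ys = unipolar_stream((b + 1) / 2, n, rng)
    agree = sum(1 for x, y in zip(xs, ys) if x == y)  # count of XNOR outputs equal to 1
    return 2 * agree / n - 1


def sign_magnitude_multiply(a: float, b: float, n: int, rng: random.Random) -> float:
    """Sign-magnitude SC: AND of unipolar magnitude streams; sign from XOR of signs."""
    xs = unipolar_stream(abs(a), n, rng)
    ys = unipolar_stream(abs(b), n, rng)
    magnitude = sum(x & y for x, y in zip(xs, ys)) / n
    negative = (a < 0) != (b < 0)  # XOR of the two sign bits
    return -magnitude if negative else magnitude
```

Averaging the absolute error of both functions over many random operand pairs at a fixed stream length is a simple way to observe the accuracy gap that motivates the paper; the 10X and 32X figures quoted for this work come from the authors' experiments, not from this sketch.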
Haoyu Yang, Shuhe Li, Yuzhe Ma, Bei Yu, Evangeline F. Y. Young
Mask optimization has been a critical problem in the VLSI design flow due to the mismatch between the lithography system and continuously shrinking feature sizes. Optical proximity correction (OPC) is one of the prevailing resolution enhancement techniques (RETs) and can significantly improve mask printability. However, at advanced technology nodes, the mask optimization process consumes ever more computational resources. In this paper, we develop a generative adversarial network (GAN) model to achieve better mask optimization performance. We first develop an OPC-oriented GAN flow that learns the target-to-mask mapping through an improved architecture and objectives, which leads to satisfactory mask optimization results. To facilitate training and ensure better convergence, we also propose a pre-training procedure that jointly trains the neural network with the inverse lithography technique (ILT). At convergence, the generative network can create quasi-optimal masks for given target circuit patterns, so fewer conventional OPC steps are required to generate high-quality masks. Experimental results show that our flow can facilitate the mask optimization process while also ensuring better printability.
{"title":"GAN-OPC: Mask Optimization with Lithography-guided Generative Adversarial Nets","authors":"Haoyu Yang, Shuhe Li, Yuzhe Ma, Bei Yu, Evangeline F. Y. Young","doi":"10.1145/3195970.3196056","DOIUrl":"https://doi.org/10.1145/3195970.3196056","url":null,"abstract":"Mask optimization has been a critical problem in the VLSI design flow due to the mismatch between the lithography system and the continuously shrinking feature sizes. Optical proximity correction (OPC) is one of the prevailing resolution enhancement techniques (RETs) that can significantly improve mask printability. However, in advanced technology nodes, the mask optimization process consumes more and more computational resources. In this paper, we develop a generative adversarial network (GAN) model to achieve better mask optimization performance. We first develop an OPC-oriented GAN flow that can learn target-mask mapping from the improved architecture and objectives, which leads to satisfactory mask optimization results. To facilitate the training process and ensure better convergence, we also propose a pre-training procedure that jointly trains the neural network with inverse lithography technique (ILT). At convergence, the generative network is able to create quasi-optimal masks for given target circuit patterns and fewer normal OPC steps are required to generate high quality masks. Experimental results show that our flow can facilitate the mask optimization process as well as ensure a better printability.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"58 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73864129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shail Dave, M. Balasubramanian, Aviral Shrivastava
A coarse-grained reconfigurable array (CGRA) is a promising solution that can accelerate even non-parallel loops. The acceleration achieved through CGRAs critically depends on the quality of the mapping (of loop operations onto the PEs of the CGRA) and, in particular, on the compiler’s ability to route the dependencies among operations. Previous works have explored several mechanisms to route data dependencies, including routing through other PEs, registers, memory, and even re-computation. All these routing options change the graph to be mapped onto the PEs (often by adding new operations), and without re-scheduling, it may be impossible to map the new graph. However, existing techniques explore these routing options inside the place-and-route (P&R) phase of compilation, which is performed after the scheduling step. As a result, they may either fail to find a mapping or obtain poor results. Our method, RAMP, explicitly and intelligently explores the various routing options before the scheduling step, thereby improving both mappability and mapping quality. Evaluating the top performance-critical loops of the MiBench benchmarks over 12 architectural configurations, we find that RAMP accelerates loops by 23× over sequential execution, achieving a geomean speedup of 2.13× over the state of the art.
{"title":"RAMP: Resource-Aware Mapping for CGRAs","authors":"Shail Dave, M. Balasubramanian, Aviral Shrivastava","doi":"10.1145/3195970.3196101","DOIUrl":"https://doi.org/10.1145/3195970.3196101","url":null,"abstract":"Coarse-grained reconfigurable array (CGRA) is a promising solution that can accelerate even non-parallel loops. Acceleration achieved through CGRAs critically depends on the goodness of mapping (of loop operations onto the PEs of CGRA), and in particular, the compiler’s ability to route the dependencies among operations. Previous works have explored several mechanisms to route data dependencies, including, routing through other PEs, registers, memory, and even re-computation. All these routing options change the graph to be mapped onto PEs (often by adding new operations), and without re-scheduling, it may be impossible to map the new graph. However, existing techniques explore these routing options inside the Place and Route (P&R) phase of the compilation process, which is performed after the scheduling step. As a result, they either may not achieve the mapping or obtain poor results. Our method RAMP, explicitly and intelligently explores the various routing options, before the scheduling step, and makes improve the mapping-ability and mapping quality. Evaluating top performance-critical loops of MiBench benchmarks over 12 architectural configurations, we find that RAMP is able to accelerate loops by 23× over sequential execution, achieving a geomean speedup of 2.13× over state-of-the-art.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"57 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77755436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
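The abstract's observation that routing operations added after scheduling can make the graph unmappable can be made concrete with the textbook lower bound on the initiation interval (II) in modulo scheduling: MII = max(ResMII, RecMII), where ResMII = ⌈operations / PEs⌉ and RecMII is the largest ⌈latency / iteration distance⌉ over all dependence cycles. The sketch below computes these bounds; the names are hypothetical and it does not reproduce RAMP's algorithm.

```python
import math
from typing import Iterable, Tuple


def res_mii(num_ops: int, num_pes: int) -> int:
    """Resource-constrained lower bound: operations per iteration spread over all PEs."""
    return math.ceil(num_ops / num_pes)


def rec_mii(cycles: Iterable[Tuple[int, int]]) -> int:
    """Recurrence-constrained lower bound; each cycle is (total latency, total distance)."""
    return max((math.ceil(lat / dist) for lat, dist in cycles), default=1)


def min_ii(num_ops: int, num_pes: int, cycles: Iterable[Tuple[int, int]]) -> int:
    """Minimum initiation interval that any modulo schedule must respect."""
    return max(res_mii(num_ops, num_pes), rec_mii(cycles))
```

On a 4×4 CGRA (16 PEs), a 16-operation loop body could in principle run at II = 1, but adding three routing operations during P&R raises ResMII to 2. The schedule computed for II = 1 therefore cannot simply be patched, which is why exploring routing options before scheduling matters.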
SoCFPGAs, i.e., FPGAs integrated on the same die with chip multiprocessors, have reached the market in recent years. In this article, we analyse various security loopholes, existing precautions, and countermeasures in these architectures. We consider Intel Cyclone/Arria devices and Xilinx Zynq/UltraScale devices. We present an attacker model and highlight three types of attacks, namely direct memory attacks, cache timing attacks, and rowhammer attacks, that can be used against inadequately protected systems. We present and compare the existing security mechanisms in these architectures, along with their shortfalls. We present real-life examples of these attacks and further countermeasures to secure SoCFPGA-based systems.
{"title":"A Security Vulnerability Analysis of SoCFPGA Architectures","authors":"S. Chaudhuri","doi":"10.1145/3195970.3195979","DOIUrl":"https://doi.org/10.1145/3195970.3195979","url":null,"abstract":"SoCFPGAs or FPGAs integrated on the same die with chip multi processors have made it to the market in the past years. In this article we analyse various security loopholes, existing precautions and countermeasures in these architectures. We consider Intel Cyclone/Arria devices and Xilinx Zynq/Ultrascale devices. We present an attacker model and we highlight three different types of attacks namely direct memory attacks, cache timing attacks, and rowhammer attacks that can be used on inadequately protected systems. We present and compare existing security mechanisms in this architectures, and their shortfalls. We present real life example of these attacks and further countermeasures to secure systems based on SoCFPGAs.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"25 3 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81266752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haocheng Li, Wing-Kai Chow, Gengjie Chen, Evangeline F. Y. Young, Bei Yu
Placement is one of the most critical stages in the physical synthesis flow. The increasing number of multi-row-height cells in circuits challenges the efficiency and effectiveness of traditional placers. Furthermore, besides providing an overlap-free solution close to the global placement (GP) solution and satisfying the power and ground (P/G) alignment, a legalizer must also consider fence-region and routability constraints (e.g., edge spacing, pin access/short). In this paper, we propose a legalization method for mixed-cell-height circuits based on a window-based cell insertion technique and two post-processing network-flow-based optimizations. Compared with the champion of the ICCAD 2017 Contest, our algorithm achieves 18% and 12% less average and maximum displacement, respectively, as well as significantly fewer routability violations. Compared with the state-of-the-art algorithms for this problem, our algorithm improves total displacement by 9% with 20% less running time.
{"title":"Routability-Driven and Fence-Aware Legalization for Mixed-Cell-Height Circuits","authors":"Haocheng Li, Wing-Kai Chow, Gengjie Chen, Evangeline F. Y. Young, Bei Yu","doi":"10.1145/3195970.3196107","DOIUrl":"https://doi.org/10.1145/3195970.3196107","url":null,"abstract":"Placement is one of the most critical stages in the physical synthesis flow. Circuits with increasing numbers of cells of multi-row height have brought challenges to traditional placers on efficiency and effectiveness. Furthermore, constraints on fence region and routability (e.g., edge spacing, pin access/short) should be considered, besides providing an overlap-free solution close to the global placement (GP) solution and fulfilling the power and ground (P/G) alignments. In this paper, we propose a legalization method for mixed-cell-height circuits by a window-based cell insertion technique and two post-processing network-flow-based optimizations. Compared with the champion of the IC/CAD 2017 Contest, our algorithm achieves 18% and 12% less average and maximum displacement respectively as well as significantly fewer routability violations. Comparing our algorithm with the state-of-the-art algorithms on this problem, there is a 9% improvement in total displacement with 20% less running time.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"42 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84864808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
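The comparison above is stated in terms of average and maximum cell displacement from the GP solution. The sketch below computes those two metrics, assuming Manhattan displacement, and adds the basic overlap-free check a legalizer must satisfy within a single row, assuming cells are given as (left x, width) pairs; a multi-row-height cell would be checked in every row it spans. The names are hypothetical, and any official contest scoring formula is not reproduced here.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def displacement_stats(gp: Dict[str, Point],
                       legal: Dict[str, Point]) -> Tuple[float, float]:
    """Average and maximum Manhattan displacement of each cell from its GP location."""
    disps = [abs(legal[name][0] - x) + abs(legal[name][1] - y)
             for name, (x, y) in gp.items()]
    return sum(disps) / len(disps), max(disps)


def row_has_overlap(cells: Sequence[Tuple[float, float]]) -> bool:
    """cells: (left x, width) of the cells occupying one row; True if any two overlap."""
    ordered = sorted(cells)
    return any(left + width > ordered[i + 1][0]
               for i, (left, width) in enumerate(ordered[:-1]))
```

For instance, moving cell `a` from (0, 0) to (1, 0) and cell `b` from (10, 5) to (10, 8) gives an average displacement of 2.0 and a maximum of 3.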