
2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD): Latest Publications

Hierarchical Layout Synthesis and Optimization Framework for High-Density Power Module Design Automation
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643545
Imam Al Razi, Quang Le, H. Mantooth, Yarui Peng
Multi-chip power module (MCPM) layout design automation has become an emerging research field in the power electronics community. MCPM physical design is currently a trial-and-error procedure that relies heavily on the designers' experience to produce a reliable solution. To push the boundaries of energy efficiency and power density, novel packaging technologies are emerging, with increasing design complexity. As this manual design process becomes the bottleneck in design productivity, the power electronics industry is calling for more intelligent CAD tools, especially for advanced packaging solutions with stacked substrates. This paper presents a physical design, synthesis, and optimization framework for 2D, 2.5D, and 3D power modules. Generic, scalable, and efficient physical design algorithms are implemented with optimization metaheuristics to solve the hierarchical layout synthesis problem. The corner stitching data structure and hierarchical constraint graph evaluation have been customized to better align with power electronics design considerations. A complete layout synthesis process is demonstrated for both 2D and 3D power module examples. Further, electro-thermal design optimization is carried out on a sample 3D MCPM layout using both exhaustive and evolutionary search methods. Our algorithm can generate 937 3D layouts in 56 s, 10 of which lie on the Pareto front. In addition, our optimized 3D layouts achieve 1.3 nH loop inductance, 38 °C temperature rise, and 836 mm² footprint area, compared with 8.5 nH, 99 °C, and 2000 mm² for 2D layouts.
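The abstract reports that 10 of the 937 generated layouts lie on the Pareto front over loop inductance, temperature rise, and footprint area. A minimal sketch of the non-dominance test behind such a front follows, assuming all three objectives are minimized; the `Layout` record and its field names are hypothetical and are not taken from the authors' tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Layout:
    """Hypothetical record of one candidate layout; every objective is minimized."""
    name: str
    inductance_nh: float
    temp_rise_c: float
    area_mm2: float

    def objectives(self) -> Tuple[float, float, float]:
        return (self.inductance_nh, self.temp_rise_c, self.area_mm2)


def dominates(a: Layout, b: Layout) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    pairs = list(zip(a.objectives(), b.objectives()))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(layouts: List[Layout]) -> List[Layout]:
    """Return the layouts not dominated by any other candidate (O(n^2) comparisons)."""
    return [a for a in layouts if not any(dominates(b, a) for b in layouts if b is not a)]


if __name__ == "__main__":
    # Objective values quoted in the abstract; the 3D layout dominates the 2D one.
    layouts = [Layout("3D", 1.3, 38.0, 836.0), Layout("2D", 8.5, 99.0, 2000.0)]
    print([lay.name for lay in pareto_front(layouts)])
```

The quadratic scan is adequate for a few thousand candidates; a larger search would typically use a sort-based non-dominated sorting routine instead.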
Citations: 2
Starfish: An Efficient P&R Co-Optimization Engine with A*-based Partial Rerouting
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643517
Fangzhou Wang, Lixin Liu, Jingsong Chen, Jinwei Liu, Xinshi Zang, Martin D. F. Wong
Placement and routing (P&R) are two important stages in the physical design flow. After a placer assigns locations to circuit components, routing makes the connections. Because they are defined as two separate problems, placement and routing optimize different objectives. For instance, placement usually focuses on optimizing the half-perimeter wire length (HPWL) and estimated congestion, while routing tries to minimize the routed wire length and the number of overflows. This misalignment between objectives inevitably leads to significant degradation in solution quality. Therefore, in this paper, we present Starfish, an efficient P&R co-optimization engine that bridges the gap between placement and routing. To incrementally optimize the routed wire length, Starfish performs cell movements and reconnects broken nets by A*-based partial rerouting. Experimental results on the ICCAD 2020 contest benchmark suites [1] show that our co-optimizer outperforms all the contestants, with better solution quality and much shorter runtime.
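Starfish reconnects nets broken by cell moves using A*-based partial rerouting. The sketch below shows only the textbook A* search on a 2D routing grid with unit edge costs and a Manhattan-distance heuristic; the grid model, the blocked-cell set, and the function name are assumptions, and it ignores the layers, vias, and congestion costs a real router would model.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def astar_route(width: int, height: int, blocked: Set[Cell],
                src: Cell, dst: Cell) -> Optional[List[Cell]]:
    """Shortest 4-connected path from src to dst avoiding blocked cells, or None."""

    def h(c: Cell) -> int:
        # Manhattan distance: admissible and consistent for unit-cost 4-connected moves.
        return abs(c[0] - dst[0]) + abs(c[1] - dst[1])

    open_heap: List[Tuple[int, int, Cell]] = [(h(src), 0, src)]  # (f = g + h, g, cell)
    g: Dict[Cell, int] = {src: 0}
    parent: Dict[Cell, Cell] = {}

    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == dst:
            # Walk parent links back to the source to recover the route.
            path = [cur]
            while cur in parent:
                cur = parent[cur]
                path.append(cur)
            return path[::-1]
        if cost > g[cur]:
            continue  # stale heap entry superseded by a cheaper one
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if not (0 <= nx < width and 0 <= ny < height) or nxt in blocked:
                continue
            new_cost = cost + 1
            if new_cost < g.get(nxt, float("inf")):
                g[nxt] = new_cost
                parent[nxt] = cur
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None
```

For example, `astar_route(5, 5, {(2, y) for y in range(4)}, (0, 0), (4, 0))` detours through the single open cell `(2, 4)` in the blocked column. In a partial-rerouting setting, only the disconnected segments of a net would be searched this way rather than the whole net.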
Citations: 4
Quantum Machine Learning for Finance ICCAD Special Session Paper
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643469
Marco Pistoia, Syed Farhan Ahmad, Akshay Ajagekar, Alexander Buts, Shouvanik Chakrabarti, Dylan Herman, Shaohan Hu, Andrew Jena, Pierre Minssen, Pradeep Niroula, Arthur G. Rattew, Yue Sun, Romina Yalovetzky
Quantum computers are expected to surpass the computational capabilities of classical computers during this decade and to have a disruptive impact on numerous industry sectors, particularly finance. In fact, finance is estimated to be the first industry sector to benefit from quantum computing, not only in the medium and long terms but even in the short term. This review paper presents the state of the art of quantum algorithms for financial applications, with particular focus on those use cases that can be solved via machine learning.
Citations: 16
Online and Offline Machine Learning for Industrial Design Flow Tuning: (Invited - ICCAD Special Session Paper)
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643577
M. Ziegler, Jihye Kwon, Hung-Yi Liu, L. Carloni
Modern logic and physical synthesis tools provide numerous options and parameters that can drastically affect design quality; however, the large number of options creates a complex design space that is difficult for human designers to navigate. Fortunately, machine learning approaches and cloud computing environments are well suited to tackling complex parameter-tuning problems like those in VLSI design flows. This paper proposes a holistic approach in which online and offline machine learning work together to tune industrial design flows. We describe a system called SynTunSys (STS) that has been used to optimize multiple industrial high-performance processors. STS consists of an online system that optimizes designs and generates data for a recommender system that performs offline training and recommendation. Experimental results show that the collaboration between the STS online and offline machine learning systems, together with insight from human designers, provides best-of-breed results. Finally, we discuss potential new directions for research on design flow tuning.
Citations: 2
Heterogeneous Manycore Architectures Enabled by Processing-in-Memory for Deep Learning: From CNNs to GNNs: (ICCAD Special Session Paper)
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643559
Biresh Kumar Joardar, Aqeeb Iqbal Arka, J. Doppa, P. Pande, Hai Helen Li, K. Chakrabarty
Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architectures have recently become a popular architectural choice for deep-learning applications. ReRAM-based architectures can accelerate inference and training of deep learning algorithms and are more energy-efficient than traditional GPUs. However, these architectures have various limitations that affect model accuracy and performance. Moreover, the choice of deep-learning application also imposes new design challenges that must be addressed to achieve high performance. In this paper, we present the advantages and challenges associated with ReRAM-based PIM architectures, considering Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs) as important application domains. We also outline methods that can be used to address these challenges.
Citations: 2
CORLD: In-Stream Correlation Manipulation for Low-Discrepancy Stochastic Computing
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643450
Sina Asadi, M. Najafi, M. Imani
Stochastic computing (SC) is a re-emerging computing paradigm that provides low-cost and noise-tolerant designs for a wide range of arithmetic operations. SC circuits operate on uniform bit-streams whose value is determined by the probability of observing 1s in the bit-stream. The accuracy of SC operations depends heavily on the correlation between input bit-streams. While some operations, such as minimum and maximum value functions, require highly correlated inputs, others, such as multiplication, need uncorrelated (independent) inputs for accurate computation. Developing low-cost and accurate correlation manipulation circuits is an important research direction in SC, as these circuits can manage the correlation between bit-streams without expensive bit-stream regeneration. This work proposes a novel in-stream correlator and decorrelator circuit that manages 1) the correlation between stochastic bit-streams and 2) the distribution of 1s in the output bit-streams. Compared to state-of-the-art solutions, our designs achieve lower hardware cost and higher accuracy. The output bit-streams have a low-discrepancy distribution of bits, which leads to higher-quality results. The effectiveness of the proposed circuits is shown with two case studies: SC designs for sorting and median filtering.
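Why correlation matters can be seen with unipolar bit-streams: a bitwise AND of two uncorrelated streams approximates the product of their values, while an AND of two maximally correlated streams (generated from the same random source) yields their minimum. The sketch below uses Van der Corput low-discrepancy sequences as comparator sources; the stream length, the bases 2 and 3, and all function names are illustrative assumptions, and this is not the CORLD correlator/decorrelator circuit.

```python
from typing import List


def van_der_corput(n: int, base: int) -> List[float]:
    """First n terms of the base-`base` Van der Corput low-discrepancy sequence."""
    seq = []
    for k in range(n):
        value, denom, i = 0.0, 1.0, k
        while i > 0:
            i, digit = divmod(i, base)  # least significant digit of k first
            denom *= base
            value += digit / denom      # ...becomes the most significant fraction digit
        seq.append(value)
    return seq


def to_stream(p: float, source: List[float]) -> List[int]:
    """Unipolar SC encoding: emit 1 whenever the source number is below p."""
    return [1 if r < p else 0 for r in source]


def value(stream: List[int]) -> float:
    """Decode a unipolar stream as its fraction of 1s."""
    return sum(stream) / len(stream)


def sc_and(a: List[int], b: List[int]) -> List[int]:
    """A single AND gate applied bit by bit."""
    return [x & y for x, y in zip(a, b)]


if __name__ == "__main__":
    n = 256
    r2, r3 = van_der_corput(n, 2), van_der_corput(n, 3)
    x, y = 0.5, 0.75
    # Uncorrelated inputs (different sources): AND approximates x * y.
    prod = value(sc_and(to_stream(x, r2), to_stream(y, r3)))
    # Correlated inputs (shared source): AND computes min(x, y).
    mn = value(sc_and(to_stream(x, r2), to_stream(y, r2)))
    print(f"AND, uncorrelated: {prod:.3f} (x*y = {x * y}); AND, correlated: {mn} (min = {min(x, y)})")
```

The same AND gate therefore computes two different functions depending only on input correlation, which is why circuits that adjust correlation in-stream, without regenerating bit-streams, are useful.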
Citations: 2
AMF-Placer: High-Performance Analytical Mixed-size Placer for FPGA
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643574
Tingyuan Liang, Gengjie Chen, Jieru Zhao, Sharad Sinha, Wei Zhang
To enable performance optimization of application mapping on modern field-programmable gate arrays (FPGAs), certain critical-path portions of a design might be prearranged into multi-cell macros during synthesis. These movable macros, with their shape and resource constraints, make mixed-size placement of FPGA designs challenging, and previous analytical placers cannot address this problem. In this work, we propose AMF-Placer, an open-source Analytical Mixed-size FPGA placer that supports mixed-size placement on FPGAs and provides an interface to Xilinx Vivado. To speed up convergence and improve placement quality, AMF-Placer is equipped with a series of new techniques for wirelength optimization, cell spreading, packing, and legalization. On a set of the latest large open-source benchmarks from various domains for Xilinx UltraScale FPGAs, experimental results indicate that AMF-Placer can improve HPWL by 20.4%-89.3% and reduce runtime by 8.0%-84.2% compared to the baseline. Furthermore, by exploiting the parallelism of the proposed algorithms, the placement procedure can be accelerated by 2.41× on average with 8 threads.
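HPWL, the wirelength metric AMF-Placer reports improving, is the sum over all nets of the half-perimeter of the bounding box enclosing each net's pins. A minimal sketch, assuming each net is given as a list of (x, y) pin coordinates (the dictionary layout and function names are hypothetical):

```python
from typing import Dict, List, Tuple

Pin = Tuple[float, float]


def net_hpwl(pins: List[Pin]) -> float:
    """Half-perimeter of the bounding box enclosing one net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: Dict[str, List[Pin]]) -> float:
    """Sum of per-net HPWL; empty nets are skipped and single-pin nets contribute 0."""
    return sum(net_hpwl(pins) for pins in nets.values() if pins)


if __name__ == "__main__":
    nets = {
        "n1": [(0, 0), (3, 4)],          # bounding box 3 x 4 -> 7
        "n2": [(1, 1), (2, 5), (4, 2)],  # bounding box 3 x 4 -> 7
    }
    print(total_hpwl(nets))
```

Analytical placers usually optimize a smooth approximation of this max/min expression, since HPWL itself is not differentiable; the exact form above is what is typically reported.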
Citations: 7
Exploring Physical Synthesis for Circuits based on Emerging Reconfigurable Nanotechnologies
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643439
Andreas Krinke, Shubham Rai, Akash Kumar, J. Lienig
Recently proposed ambipolar nanotechnologies allow the development of reconfigurable circuits with low area and power overheads compared to conventional CMOS technology. However, using a conventional physical synthesis flow for circuits that include gates based on reconfigurable FETs (RFETs) leads to sub-optimal results, because the physical synthesis flow for RFET-based circuits has to account for the additional gate terminal of each RFET. In the present work, we explore three key directions that lead to an optimized physical synthesis flow for RFET-based circuits with circuit-level reconfigurability: (1) designing optimized layouts of reconfigurable gates, (2) utilizing special driver cells to drive the reconfigurable portions of a circuit, and (3) optimizing the placement of these reconfigurable parts in separate power domains. Experimental evaluations on the EPFL benchmarks using our proposed approach show a reduction in chip area of up to 17.5% compared to conventional flows.
Citations: 2
Sampling-Based Approximate Logic Synthesis: An Explainable Machine Learning Approach
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643484
Wei Zeng, A. Davoodi, R. Topaloglu
Recent years have seen promising studies on machine learning (ML) techniques applied to approximate logic synthesis (ALS), especially those based on logic reconstruction from samples of input-output pairs. This “sampling-based ALS” supports integration with conventional logic synthesis and optimization techniques, as well as synthesis for a constrained input space (e.g., when primary input values are restricted using Boolean relations). To achieve effective sampling-based ALS, this paper proposes, for the first time, the use of adaptive decision trees (ADTs), and in particular variations guided by explainable ML. We adopt SHAP importance, a feature-importance metric derived from a recent advance in explainable ML, to guide the training of ADTs. We also include approximation techniques for ADTs that are specifically designed for ALS, including don't-care bit assertion and instantiation. Comprehensive experiments on 15 logic functions from the IWLS'20 benchmark suite show that we can achieve 39%-42% area reduction with a 0.20%-0.22% error rate on average.
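Sampling-based ALS learns a circuit from input-output samples instead of from a full logic description. As a rough illustration of that idea only, the sketch below fits a depth-limited decision tree to sampled patterns of a Boolean function and measures the error rate of the learned approximation over all inputs. It uses plain information-gain splitting (ID3-style), not the SHAP-guided ADTs the paper proposes; the function names, the depth limit, and the 3-input majority target are assumptions.

```python
import math
from collections import Counter
from itertools import product
from typing import Callable, List, Sequence, Tuple, Union

Pattern = Tuple[int, ...]
# A tree is either a leaf output bit or (input index, subtree if 0, subtree if 1).
Tree = Union[int, Tuple[int, "Tree", "Tree"]]


def entropy(labels: Sequence[int]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def fit_tree(samples: List[Tuple[Pattern, int]], inputs: List[int], depth: int) -> Tree:
    """Greedy information-gain tree on Boolean inputs; leaves output the majority label."""
    labels = [y for _, y in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or not inputs or len(set(labels)) == 1:
        return majority

    def gain(i: int) -> float:
        lo = [y for x, y in samples if x[i] == 0]
        hi = [y for x, y in samples if x[i] == 1]
        rem = sum(len(part) / len(labels) * entropy(part) for part in (lo, hi) if part)
        return entropy(labels) - rem

    best = max(inputs, key=gain)
    lo = [s for s in samples if s[0][best] == 0]
    hi = [s for s in samples if s[0][best] == 1]
    if not lo or not hi:
        return majority  # the samples never exercise both values of this input
    rest = [i for i in inputs if i != best]
    return (best, fit_tree(lo, rest, depth - 1), fit_tree(hi, rest, depth - 1))


def evaluate(tree: Tree, x: Pattern) -> int:
    while isinstance(tree, tuple):
        i, lo, hi = tree
        tree = hi if x[i] else lo
    return tree


def error_rate(tree: Tree, f: Callable[[Pattern], int], n_inputs: int) -> float:
    """Fraction of all 2^n input patterns on which the learned tree disagrees with f."""
    patterns = list(product((0, 1), repeat=n_inputs))
    return sum(evaluate(tree, x) != f(x) for x in patterns) / len(patterns)


if __name__ == "__main__":
    def maj3(x: Pattern) -> int:
        return int(sum(x) >= 2)

    # For brevity every pattern is used as a sample; a sampling-based flow would pass a subset.
    samples = [(x, maj3(x)) for x in product((0, 1), repeat=3)]
    exact = fit_tree(samples, [0, 1, 2], depth=3)
    approx = fit_tree(samples, [0, 1, 2], depth=1)  # one split: smaller, approximate logic
    print(error_rate(exact, maj3, 3), error_rate(approx, maj3, 3))
```

Limiting tree depth is one simple way to trade error rate for logic size, since each tree maps to a multiplexer-style circuit whose size grows with the number of internal nodes.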
Citations: 2
Crossbar based Processing in Memory Accelerator Architecture for Graph Convolutional Networks
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643465
Nagadastagiri Challapalle, Karthik Swaminathan, Nandhini Chandramoorthy, V. Narayanan
Graph data structures are central to many applications, such as social networks, citation networks, molecular interactions, and navigation systems. Graph Convolutional Networks (GCNs) are used to process graph data and learn insights from it for tasks such as link prediction, node classification, and node embedding learning. The compute and memory access characteristics of GCNs differ from those of both conventional graph analytics algorithms and convolutional neural networks, which renders existing accelerators for graph analytics and deep learning inefficient. In this work, we propose PIM-GCN, a crossbar-based processing-in-memory (PIM) accelerator architecture for GCNs. PIM-GCN incorporates a node-stationary dataflow that supports both Compressed Sparse Row (CSR) and Compressed Sparse Column (CSC) graph data representations. We propose techniques for graph traversal in the compressed sparse domain and for mapping the feature aggregation and feature transformation operations of GCNs to the in-situ analog compute functions of crossbar memory. We also present the performance, energy, and scalability trade-offs of the PIM-GCN architecture for the CSR and CSC graph data representations. Compared to existing accelerator architectures, PIM-GCN shows average speedups of 3-16× and average energy reductions of 4-12×.
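The abstract contrasts CSR and CSC representations for GCN feature aggregation. Below is a minimal Python sketch of how both can serve the same computation. It is not the PIM-GCN node-stationary dataflow or its crossbar mapping. It computes one GCN layer, ReLU(Â·H·W), assuming the common symmetric normalization Â = D^-1/2 (A + I) D^-1/2. Aggregation either pulls neighbour features row by row from CSR or pushes each node's features column by column from CSC. The dense product with W is the feature transformation. All function names are hypothetical.

```python
import math


def csr_from_coo(n, rows, cols, vals):
    """Compressed Sparse Row arrays (ptr, idx, data) for an n-row matrix.

    Passing (cols, rows) instead of (rows, cols) yields the CSC form, since
    the CSC arrays of A are the CSR arrays of A transposed.
    """
    ptr = [0] * (n + 1)
    for r in rows:
        ptr[r + 1] += 1
    for i in range(n):
        ptr[i + 1] += ptr[i]  # prefix sum: row i spans ptr[i]..ptr[i+1]
    idx = [0] * len(rows)
    data = [0.0] * len(rows)
    nxt = ptr[:n]  # next free slot in each row (a copy of the row starts)
    for r, c, v in zip(rows, cols, vals):
        idx[nxt[r]] = c
        data[nxt[r]] = v
        nxt[r] += 1
    return ptr, idx, data


def aggregate_csr(ptr, idx, data, h):
    """Pull: each destination row i gathers A[i][j] * h[j] from its neighbours j."""
    out = [[0.0] * len(h[0]) for _ in range(len(ptr) - 1)]
    for i in range(len(ptr) - 1):
        for k in range(ptr[i], ptr[i + 1]):
            j, a = idx[k], data[k]
            for d in range(len(h[0])):
                out[i][d] += a * h[j][d]
    return out


def aggregate_csc(ptr, idx, data, h):
    """Push: each source column j scatters A[i][j] * h[j] into rows i."""
    out = [[0.0] * len(h[0]) for _ in range(len(ptr) - 1)]
    for j in range(len(ptr) - 1):
        for k in range(ptr[j], ptr[j + 1]):
            i, a = idx[k], data[k]
            for d in range(len(h[0])):
                out[i][d] += a * h[j][d]
    return out


def transform(h, w):
    """Dense feature transformation H @ W."""
    return [[sum(row[k] * w[k][c] for k in range(len(w))) for c in range(len(w[0]))]
            for row in h]


def normalized_adjacency(n, edges):
    """COO entries of A_hat = D^-1/2 (A + I) D^-1/2 for an undirected graph."""
    nbrs = [{i} for i in range(n)]  # start with self-loops
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    deg = [len(s) for s in nbrs]
    rows, cols, vals = [], [], []
    for i in range(n):
        for j in sorted(nbrs[i]):
            rows.append(i)
            cols.append(j)
            vals.append(1.0 / math.sqrt(deg[i] * deg[j]))
    return rows, cols, vals


def gcn_layer(n, rows, cols, vals, h, w, use_csc=False):
    """ReLU(A_hat @ H @ W): aggregate over the graph, then transform."""
    if use_csc:
        agg = aggregate_csc(*csr_from_coo(n, cols, rows, vals), h)
    else:
        agg = aggregate_csr(*csr_from_coo(n, rows, cols, vals), h)
    return [[max(0.0, x) for x in row] for row in transform(agg, w)]


if __name__ == "__main__":
    rows, cols, vals = normalized_adjacency(3, [(0, 1), (1, 2)])  # path 0-1-2
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w = [[1.0, -1.0], [0.5, 1.0]]
    print(gcn_layer(3, rows, cols, vals, h, w))
    print(gcn_layer(3, rows, cols, vals, h, w, use_csc=True))
```

Both variants compute the same Â·H. The CSR pull writes each output row once but gathers scattered inputs. The CSC push reads each input row once but scatters its writes. Which access pattern is cheaper depends on the hardware, and this is the kind of trade-off the abstract says PIM-GCN evaluates for the two representations.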
Citations: 13