
2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD): Latest Publications

Hierarchical Layout Synthesis and Optimization Framework for High-Density Power Module Design Automation
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643545
Imam Al Razi, Quang Le, H. Mantooth, Yarui Peng
Multi-chip power module (MCPM) layout design automation has become an emerging research field in the power electronics society. MCPM physical design is currently a trial-and-error procedure that heavily relies on the designers' experience to produce a reliable solution. To push the boundary of energy efficiency and power density, novel packaging technologies are emerging with increasing design complexity. As this manual design process becomes the bottleneck in design productivity, the power electronics industry is calling for more intelligence in design CAD tools, especially for advanced packaging solutions with stacked substrates. This paper presents a physical design, synthesis, and optimization framework for 2D, 2.5D, and 3D power modules. Generic, scalable, and efficient physical design algorithms are implemented with optimization metaheuristics to solve the hierarchical layout synthesis problem. The corner stitching data structure and hierarchical constraint graph evaluation have been customized to better align with power electronics design considerations. A complete layout synthesis process is demonstrated for both 2D and 3D power module examples. Further, electro-thermal design optimization is carried out on a sample 3D MCPM layout using both exhaustive and evolutionary search methods. Our algorithm can generate 937 3D layouts in 56 s, resulting in 10 layouts on the Pareto front. In addition, our optimized 3D layouts can achieve 1.3 nH loop inductance with 38 °C temperature rise and 836 mm² footprint area, compared to 2D layouts with 8.5 nH, 99 °C, and 2000 mm².
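The abstract reports that only a few of the generated layouts end up on the Pareto front. A minimal sketch of how such a front could be filtered is given below, assuming every objective (loop inductance, temperature rise, footprint area) is minimized. The function names are illustrative and not taken from the paper; the first two candidates reuse the figures quoted in the abstract, and the other two are hypothetical values added only to exercise the filter.

```python
from typing import List, Tuple

# (loop inductance in nH, temperature rise in °C, footprint area in mm²)
Layout = Tuple[float, float, float]


def dominates(a: Layout, b: Layout) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(layouts: List[Layout]) -> List[Layout]:
    """Keep the layouts that no other layout dominates (all objectives minimized)."""
    return [p for p in layouts if not any(dominates(q, p) for q in layouts)]


candidates: List[Layout] = [
    (1.3, 38.0, 836.0),    # 3D figures quoted in the abstract
    (8.5, 99.0, 2000.0),   # 2D figures quoted in the abstract
    (1.0, 45.0, 900.0),    # hypothetical: lower inductance, but hotter and larger
    (2.0, 50.0, 1000.0),   # hypothetical: worse than the first entry everywhere
]
front = pareto_front(candidates)
print(front)
```

This quadratic filter is adequate for roughly a thousand layouts; it says nothing about how the paper's own exhaustive or evolutionary search is implemented.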
Citations: 2
Starfish: An Efficient P&R Co-Optimization Engine with A*-based Partial Rerouting
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643517
Fangzhou Wang, Lixin Liu, Jingsong Chen, Jinwei Liu, Xinshi Zang, Martin D. F. Wong
Placement and routing (P&R) are two important stages in the physical design flow. After circuit components are assigned locations by a placer, routing will take place to make the connections. Defined as two separate problems, placement and routing aim to optimize different objectives. For instance, placement usually focuses on optimizing the half-perimeter wire length (HPWL) and estimated congestion while routing will try to minimize the routed wire length and the number of overflows. The misalignment between the objectives will inevitably lead to a significant degradation in solution quality. Therefore, in this paper, we present Starfish, an efficient P&R co-optimization engine that bridges the gap between placement and routing. To incrementally optimize the routed wire length, Starfish conducts cell movements and reconnects broken nets by A*-based partial rerouting. Experimental results on the ICCAD 2020 contest benchmark suites [1] show that our co-optimizer outperforms all the contestants with better solution quality and much shorter runtime.
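Two quantities named in the abstract can be illustrated on a toy grid: the half-perimeter wire length (HPWL) that placers usually optimize, and an A* search that reconnects two points of a broken net around blockages. The sketch below assumes a uniform-cost 2D grid with a Manhattan-distance heuristic; the grid model and all function names are assumptions for illustration and do not describe Starfish's actual router.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[int, int]


def hpwl(pins: List[Point]) -> int:
    """Half-perimeter of the bounding box enclosing all pins of a net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def astar(start: Point, goal: Point, blocked: Set[Point],
          width: int, height: int) -> Optional[List[Point]]:
    """Shortest 4-connected path on a unit-cost grid, or None if unreachable."""
    def h(p: Point) -> int:
        # Manhattan distance is admissible on a unit-cost grid.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from: Dict[Point, Point] = {}
    best_cost: Dict[Point, int] = {start: 0}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        if cost > best_cost[node]:
            continue  # stale heap entry
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if not inside or nxt in blocked:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = node
                heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None


# A wall at x == 2 with a single opening at (2, 4) forces a detour.
wall = {(2, 0), (2, 1), (2, 2), (2, 3)}
path = astar((0, 0), (4, 0), wall, width=5, height=5)
print(hpwl([(0, 0), (4, 0), (2, 3)]), len(path) - 1)
```

The gap between the HPWL estimate and the routed length of the detour is a small instance of the misalignment between placement and routing objectives that the abstract describes.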
Citations: 4
Quantum Machine Learning for Finance ICCAD Special Session Paper
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643469
Marco Pistoia, Syed Farhan Ahmad, Akshay Ajagekar, Alexander Buts, Shouvanik Chakrabarti, Dylan Herman, Shaohan Hu, Andrew Jena, Pierre Minssen, Pradeep Niroula, Arthur G. Rattew, Yue Sun, Romina Yalovetzky
Quantum computers are expected to surpass the computational capabilities of classical computers during this decade and to have a disruptive impact on numerous industry sectors, particularly finance. In fact, finance is estimated to be the first industry sector to benefit from quantum computing, not only in the medium and long term but even in the short term. This review paper presents the state of the art of quantum algorithms for financial applications, with a particular focus on those use cases that can be solved via machine learning.
Citations: 16
Algorithm and Hardware Co-design for Deep Learning-powered Channel Decoder: A Case Study
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643510
Boyang Zhang, Yang Sui, Lingyi Huang, Siyu Liao, Chunhua Deng, Bo Yuan
The channel decoder is a key component in many communication systems. Recently, neural network-based channel decoders have been actively investigated because of the great potential of their data-driven decoding procedure. However, at the intersection of machine learning, information theory, and hardware design, the efficient algorithm and hardware co-design of deep learning-powered channel decoders has not been well studied. This paper is a first step towards exploring efficient DNN-enabled channel decoders from a joint perspective of algorithm and hardware. We first revisit our recently proposed doubly residual neural (DRN) decoder. By introducing an advanced architectural topology into the decoder design, the overall error-correcting performance can be significantly improved. Based on this algorithm, we further develop the corresponding systolic array-based hardware architecture for the DRN decoder. The corresponding FPGA implementation of our DRN decoder on short LDPC codes is also developed.
Citations: 2
Sampling-Based Approximate Logic Synthesis: An Explainable Machine Learning Approach
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643484
Wei Zeng, A. Davoodi, R. Topaloglu
Recent years have seen promising studies on machine learning (ML) techniques applied to approximate logic synthesis (ALS), especially based on logic reconstruction from samples of input-output pairs. This “sampling-based ALS” supports integration with conventional logic synthesis and optimization techniques, as well as synthesis for a constrained input space (e.g., when primary input values are restricted using Boolean relations). To achieve an effective sampling-based ALS, for the first time, this paper proposes the use of adaptive decision trees (ADTs), and in particular variations guided by explainable ML. We adopt SHAP importance, which is a feature importance metric derived from a recent advance in explainable ML to guide the training of ADTs. We also include approximation techniques for ADT which are specifically designed for ALS, including don't-care bit assertion and instantiation. Comprehensive experiments show that we can achieve 39%-42% area reduction with 0.20%-0.22% error rate on average, based on 15 logic functions in the IWLS'20 benchmark suite.
Citations: 2
Exploring Physical Synthesis for Circuits based on Emerging Reconfigurable Nanotechnologies
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643439
Andreas Krinke, Shubham Rai, Akash Kumar, J. Lienig
Recently proposed ambipolar nanotechnologies allow the development of reconfigurable circuits with low area and power overheads compared to conventional CMOS technology. However, using a conventional physical synthesis flow for circuits that include gates based on reconfigurable FETs (RFETs) leads to sub-optimal results, because the physical synthesis flow for RFET-based circuits has to cater to the additional gate terminal per RFET. In the present work, we explore three important verticals that lead to an optimized physical synthesis flow for RFET-based circuits with circuit-level reconfigurability: (1) designing optimized layouts of reconfigurable gates, (2) utilizing special driver cells to drive the reconfigurable portions of a circuit, and (3) optimizing the placement of these reconfigurable parts in separate power domains. Experimental evaluations over EPFL benchmarks using our proposed approach show a reduction in chip area of up to 17.5% when compared to conventional flows.
Citations: 2
CORLD: In-Stream Correlation Manipulation for Low-Discrepancy Stochastic Computing
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643450
Sina Asadi, M. Najafi, M. Imani
Stochastic computing (SC) is a re-emerging computing paradigm providing low-cost and noise-tolerant designs for a wide range of arithmetic operations. SC circuits operate on uniform bit-streams whose value is determined by the probability of observing 1's in the bit-stream. The accuracy of SC operations depends heavily on the correlation between input bit-streams. While some operations, such as the minimum and maximum value functions, require highly correlated inputs, others, such as multiplication, need uncorrelated or independent inputs for accurate computation. Developing low-cost and accurate correlation manipulation circuits is an important research topic in SC, as these circuits can manage the correlation between bit-streams without expensive bit-stream regeneration. This work proposes a novel in-stream correlator and decorrelator circuit that manages 1) the correlation between stochastic bit-streams, and 2) the distribution of 1's in the output bit-streams. Compared to state-of-the-art solutions, our designs achieve lower hardware cost and higher accuracy. The output bit-streams enjoy a low-discrepancy distribution of bits, which leads to a higher quality of results. The effectiveness of the proposed circuits is shown with two case studies: SC design of sorting and median filtering.
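The dependence on input correlation described above can be seen in a few lines of software. The sketch below is a simulation under stated assumptions, not the CORLD circuit: bit-streams are generated by comparing each value against a deterministic number source, two differently paced counters stand in for independent sources, and one shared counter stands in for a maximally correlated pair. All names are illustrative.

```python
from typing import List


def stream(value: float, source: List[float]) -> List[int]:
    """Encode value in [0, 1] as a bit-stream: emit 1 whenever source < value."""
    return [1 if r < value else 0 for r in source]


def decode(bits: List[int]) -> float:
    """The value of a stochastic bit-stream is its fraction of 1's."""
    return sum(bits) / len(bits)


N = 256
# Two deterministic sources that together visit every (fast, slow) pair once,
# so streams built from different sources behave as independent inputs.
fast = [(i % 16) / 16 for i in range(N)]
slow = [(i // 16) / 16 for i in range(N)]

x, y = 0.25, 0.75

# Independent inputs: a bitwise AND computes the product x * y.
product = decode([a & b for a, b in zip(stream(x, fast), stream(y, slow))])

# Maximally correlated inputs (same source): AND gives min, OR gives max.
minimum = decode([a & b for a, b in zip(stream(x, fast), stream(y, fast))])
maximum = decode([a | b for a, b in zip(stream(x, fast), stream(y, fast))])

print(product, minimum, maximum)
```

The same AND gate therefore computes two different functions depending only on how its inputs are correlated, which is why circuits that adjust correlation without regenerating the streams are of interest.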
Citations: 2
Fast and Accurate PPA Modeling with Transfer Learning
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643533
W. R. Davis, P. Franzon, Luis Francisco, Bill Huggins, Rajeev Jain
The power, performance, and area (PPA) of digital blocks can vary 10:1 based on their synthesis, place, and route tool recipes. With the rapid increase in the number of PVT corners and the complexity of logic functions approaching 10M gates, industry has an acute need to minimize the human resources, compute servers, and EDA licenses needed to achieve a Pareto-optimal recipe. We first present models for fast, accurate PPA prediction that can reduce the manual optimization iterations with EDA tools. Second, we investigate techniques to automate the PPA optimization using evolutionary algorithms. For PPA prediction, a baseline model is trained on a known design using Latin hypercube sample runs of the EDA tool, and transfer learning is then used to train the model for an unseen design. For a known design, the baseline needed 150 training runs to achieve 95% accuracy. With transfer learning, the same accuracy was achieved on a different (unseen) design in only 15 runs, indicating the viability of transfer learning to generalize PPA models. The PPA optimization technique, based on evolutionary algorithms, effectively combines PPA modeling and optimization. Our approach reached the same PPA solution as human designers in the same or fewer runs for a CORTEX-M0 system design. This shows potential for automating the recipe optimization without needing more runs than a human designer would need.
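The abstract mentions Latin hypercube sampling of tool runs. A minimal sketch of that sampling scheme follows, using only the standard library: each dimension is split into as many equal strata as there are samples, and every stratum is used exactly once per dimension. The function name, the three dimensions, and the seed are assumptions for illustration; the actual recipe parameters used by the authors are not given in the abstract.

```python
import random
from typing import List


def latin_hypercube(n_samples: int, n_dims: int,
                    rng: random.Random) -> List[List[float]]:
    """Return n_samples points in [0, 1)^n_dims, one per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent stratum order per dimension
        # Place one point uniformly at random inside each stratum.
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [list(point) for point in zip(*columns)]


# 15 runs over 3 hypothetical, normalized recipe knobs.
samples = latin_hypercube(15, 3, random.Random(0))
for point in samples[:3]:
    print([round(v, 3) for v in point])
```

Each normalized coordinate would then be mapped to a concrete tool setting before launching a run; that mapping is design-specific and is left out here.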
Citations: 0
Hotspot Detection via Multi-task Learning and Transformer Encoder
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643590
Binwu Zhu, Ran Chen, Xinyun Zhang, Fan Yang, Xuan Zeng, Bei Yu, Martin D. F. Wong
With the rapid development of semiconductors and the continuous scaling-down of circuit feature size, hotspot detection has become much more challenging and crucial as a critical step in the physical verification flow. In recent years, advanced deep learning techniques have spawned many frameworks for hotspot detection. However, most existing hotspot detectors can only detect defects arising in the central region of small clips, making the whole detection process time-consuming on large layouts. Some advanced hotspot detectors can detect multiple hotspots in a large area but need to propose potential defect regions, and a refinement step is required to locate the hotspot precisely. To simplify the procedure of multi-stage detectors, an end-to-end single-stage hotspot detector is proposed to identify hotspots on large scales without refining potential regions. In addition, multiple tasks are developed to learn various pattern topological features. A feature aggregation module based on a Transformer encoder is also designed to globally capture the relationship between different features, further enhancing the feature representation ability. Experimental results show that our proposed framework achieves higher accuracy than prior methods with faster inference speed.
Citations: 7
Crossbar based Processing in Memory Accelerator Architecture for Graph Convolutional Networks
Pub Date : 2021-11-01 DOI: 10.1109/ICCAD51958.2021.9643465
Nagadastagiri Challapalle, Karthik Swaminathan, Nandhini Chandramoorthy, V. Narayanan
Graph data structures are central to many applications such as social networks, citation networks, molecular interactions, and navigation systems. Graph Convolutional Networks (GCNs) are used to process and learn insights from the graph data for tasks such as link prediction, node classification, and learning node embeddings. The compute and memory access characteristics of GCNs differ both from conventional graph analytics algorithms and from convolutional neural networks, rendering the existing accelerators for graph analytics as well as deep learning inefficient. In this work, we propose PIM-GCN, a crossbar-based processing-in-memory (PIM) accelerator architecture for GCNs. PIM-GCN incorporates a node-stationary dataflow with support for both Compressed Sparse Row (CSR) and Compressed Sparse Column (CSC) graph data representations. We propose techniques for graph traversal in the compressed sparse domain, feature aggregation, and feature transformation operations in GCNs mapped to in-situ analog compute functions of crossbar memory, and present the trade-offs in performance, energy, and scalability aspects of the PIM-GCN architecture for CSR and CSC graph data representations. PIM-GCN shows an average speedup of over $3$-$16\times$ and an average energy reduction of $4$-$12\times$ compared to the existing accelerator architectures.
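The abstract above names two compressed sparse formats (CSR and CSC) and two GCN operations (feature aggregation and feature transformation). The following is a minimal software sketch of those ideas only, not the PIM-GCN hardware dataflow: it assumes an unweighted adjacency matrix with no self-loops or normalization, and every function name (`to_csr`, `to_csc`, `aggregate_csr`, `aggregate_csc`, `transform`) is hypothetical and chosen for illustration.

```python
def to_csr(num_nodes, edges):
    """Build CSR arrays (row_ptr, col_idx) from (src, dst) edges, A[src][dst] = 1."""
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1
    row_ptr = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        row_ptr[i + 1] = row_ptr[i] + counts[i]
    col_idx = [0] * len(edges)
    cursor = row_ptr[:-1].copy()  # next free slot in each row
    for src, dst in edges:
        col_idx[cursor[src]] = dst
        cursor[src] += 1
    return row_ptr, col_idx


def to_csc(num_nodes, edges):
    """CSC of A is the CSR of A transposed; returns (col_ptr, row_idx)."""
    return to_csr(num_nodes, [(dst, src) for src, dst in edges])


def aggregate_csr(row_ptr, col_idx, feats):
    """Row-wise gather: out[i] = sum of feats[j] over nonzeros A[i][j]."""
    dim = len(feats[0])
    out = [[0.0] * dim for _ in feats]
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            for d in range(dim):
                out[i][d] += feats[j][d]
    return out


def aggregate_csc(col_ptr, row_idx, feats):
    """Column-wise scatter: feats[j] is added into every row i with A[i][j] != 0."""
    dim = len(feats[0])
    out = [[0.0] * dim for _ in feats]
    for j in range(len(col_ptr) - 1):
        for k in range(col_ptr[j], col_ptr[j + 1]):
            i = row_idx[k]
            for d in range(dim):
                out[i][d] += feats[j][d]
    return out


def transform(feats, weight):
    """Feature transformation: dense product feats @ weight."""
    return [
        [sum(row[k] * weight[k][c] for k in range(len(weight)))
         for c in range(len(weight[0]))]
        for row in feats
    ]


# Toy graph with 3 nodes and 2-dimensional node features (illustrative data).
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
feats = [[1, 0], [0, 1], [1, 1]]
weight = [[1, 0], [0, 2]]

agg_csr = aggregate_csr(*to_csr(3, edges), feats)
agg_csc = aggregate_csc(*to_csc(3, edges), feats)
layer_out = transform(agg_csr, weight)
```

Both traversals produce the same aggregated features; they differ in access pattern. The CSR loop reads many source features to finish one output row, while the CSC loop reads one source feature and updates many output rows, which is the kind of trade-off the abstract says is evaluated for the two representations.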
Citations: 13
Journal
2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD)