2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD): Latest Papers

Batch Sequential Black-box Optimization with Embedding Alignment Cells for Logic Synthesis

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549363

Chang Feng, Wenlong Lyu, Zhitang Chen, Junjie Ye, M. Yuan, Jianye Hao

During the logic synthesis flow of EDA, a sequence of graph transformation operators is applied to a circuit, and the Quality of Results (QoR) of the circuit depends heavily on the chosen operators and their specific parameters in the sequence, which makes the search space operator-dependent and exponentially large. In this paper, we formulate logic synthesis design space exploration as a conditional sequence optimization problem, where at each transformation step an optimization operator is selected and its corresponding parameters are decided. To solve this problem, we propose a novel sequential black-box optimization approach that requires no human intervention: 1) Because operator sequences are conditional, sequential, and of variable length, we build a recurrent neural network based on embedding alignment cells as a surrogate model that estimates the QoR of the logic synthesis flow from historical data. 2) With the surrogate model, we construct an acquisition function for each QoR metric to balance exploration and exploitation. 3) We use a multi-objective optimization algorithm to find the Pareto front of the acquisition functions, along which a batch of sequences of parameterized operators is (randomly) selected and passed to users for evaluation within the computing resource budget. We repeat these three steps until convergence or until the time limit is reached. Experimental results on the public EPFL benchmarks demonstrate the superiority of our approach over expert-crafted optimization flows and other machine-learning-based methods. Compared to resyn2, we achieve an 11.8% improvement in LUT-6 count reduction without sacrificing level values.
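Step 3 of the abstract (find the Pareto front of the acquisition values, then randomly pick a batch from it) can be illustrated with a minimal Python sketch. The function names, the brute-force dominance check, and the example acquisition values are all assumptions made for illustration; the abstract does not say which multi-objective algorithm the authors use.

```python
import random
from typing import List, Sequence, Tuple


def pareto_front(points: Sequence[Tuple[float, ...]]) -> List[int]:
    """Return indices of non-dominated points, assuming every objective is minimized."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(q[k] <= p[k] for k in range(len(p)))
            and any(q[k] < p[k] for k in range(len(p)))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


def select_batch(points: Sequence[Tuple[float, ...]], batch_size: int, seed: int = 0) -> List[int]:
    """Randomly choose up to batch_size candidates from the Pareto front."""
    rng = random.Random(seed)
    front = pareto_front(points)
    return sorted(rng.sample(front, min(batch_size, len(front))))


# Hypothetical acquisition values (one per QoR metric) for five candidate sequences.
candidates = [(3.0, 5.0), (2.0, 6.0), (4.0, 4.0), (3.5, 5.5), (2.0, 7.0)]
front = pareto_front(candidates)      # candidates 3 and 4 are dominated
batch = select_batch(candidates, 2)   # two sequences to evaluate next
```

The brute-force check is quadratic in the number of candidates, which is acceptable for a small pool; a larger pool would call for a dedicated multi-objective optimizer.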

Citations: 5

A High-precision Stochastic Solver for Steady-state Thermal Analysis with Fourier Heat Transfer Robin Boundary Conditions

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549457

L. Yang, Cuiyang Ding, Changhao Yan, Dian Zhou, Xuan Zeng

In this work, we propose a path integral random walk (PIRW) solver, the first accurate stochastic method for steady-state thermal analysis with mixed boundary conditions, especially those involving Fourier heat transfer Robin boundary conditions. We adopt the strictly correct calculation of the local time and the Feynman-Kac functional e_ĉ(t) to handle Neumann and Robin boundary conditions with high precision. Experimental results show that, compared with ANSYS, PIRW achieves over 121× speedup and over 83× storage space reduction with a negligible error within 0.8°C at a single point. We also present an application that combines PIRW with low-accuracy ANSYS for temperature calculation at hot-spots, which is more accurate and faster than using ANSYS alone.
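For readers unfamiliar with stochastic thermal solvers, the following Python sketch shows the classic random-walk estimate of a steady-state temperature at a single point. It is not the PIRW method: it assumes a uniform square grid with fixed-temperature (Dirichlet) boundaries only and does not model the Neumann or Robin conditions the paper addresses. The function name and parameters are hypothetical.

```python
import random
from typing import Callable


def estimate_temperature(
    x: int,
    y: int,
    n: int,
    boundary_temp: Callable[[int, int], float],
    walks: int = 4000,
    seed: int = 0,
) -> float:
    """Estimate the steady-state temperature at interior node (x, y) of an n-by-n grid.

    Each walk moves to a random neighbor until it hits the boundary; the
    average boundary temperature over all walks approximates the solution
    of the discrete Laplace equation at the starting node.
    """
    rng = random.Random(seed)
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    total = 0.0
    for _ in range(walks):
        cx, cy = x, y
        while 0 < cx < n and 0 < cy < n:   # still strictly inside the domain
            dx, dy = rng.choice(steps)
            cx, cy = cx + dx, cy + dy
        total += boundary_temp(cx, cy)
    return total / walks


# Boundary temperature equal to the x coordinate: the exact solution is T(x, y) = x.
estimate = estimate_temperature(3, 5, 10, lambda bx, by: float(bx))
```

Because only the point of interest is sampled, no global mesh solution has to be stored, which is the general reason random-walk solvers are attractive for single-point queries.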

Citations: 0

GIA: A Reusable General Interposer Architecture for Agile Chiplet Integration

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549464

Fuping Li, Ying Wang, Yuanqing Cheng, Yujie Wang, Yinhe Han, Huawei Li, Xiaowei Li

2.5D chiplet technology is gaining popularity for its efficiency in integrating multiple heterogeneous dies or chiplets on interposers, and it is also considered an ideal option for agile silicon system design because it mitigates the huge design, verification, and manufacturing overhead of monolithic SoCs. Although chiplet reuse significantly reduces development costs, the design and fabrication of interposers introduce additional high non-recurring engineering (NRE) costs and development cycles, which might be prohibitive for low-volume application-specific designs. To address this challenge, we propose a reusable general interposer architecture (GIA) that amortizes NRE costs and accelerates interposer integration flows across different chiplet-based systems. The proposed assembly-time configurable interposer architecture covers both active and passive interposers to accommodate the diverse applications of 2.5D systems. Agile interposer integration is further facilitated by a novel end-to-end design automation framework that generates optimal system assembly configurations specialized for the given target workload, including the selection of chiplets, inter-chiplet network configuration, placement of chiplets, and mapping on GIA. The experimental results show that our proposed active GIA and passive GIA achieve 3.15x and 60.92x performance boosts with 2.57x and 2.99x power savings over baselines, respectively.

Citations: 2

An Approach to Unlocking Cyclic Logic Locking - LOOPLock 2.0

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549461

Pei-Pei Chen, Xiang-Min Yang, Yi-Ting Li, Yung-Chih Chen, Chun-Yao Wang

Cyclic logic locking is a new type of SAT-resistant technique in hardware security. Recently, LOOPLock 2.0 was proposed, a cyclic logic locking method that deliberately creates cycles in the locked circuit to resist SAT Attack, CycSAT, BeSAT, and Removal Attack simultaneously. The key idea of LOOPLock 2.0 is that the resultant circuit remains cyclic regardless of whether the key vector is correct. This property thwarts attackers and has made the scheme successful in defending against these attacks. In this paper, we propose an approach to unlocking LOOPLock 2.0 based on structure analysis and SAT solvers. Specifically, we identify and remove non-combinational cycles in the locked circuit before running SAT solvers. The experimental results show that the proposed unlocking approach is promising.
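The structural step of locating cycles in a locked netlist can be sketched in Python as a depth-first search that reports feedback (back) edges. This is only an illustration of cycle identification on a directed gate graph; the abstract does not describe how the authors tell combinational cycles from non-combinational ones, and the netlist representation and names below are hypothetical.

```python
from typing import Dict, List, Tuple


def find_back_edges(fanout: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return the edges that close a cycle in a directed gate graph.

    fanout maps each gate (or input) name to the gates it drives.
    Recursive DFS; for very deep netlists an iterative version would be needed.
    """
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in fanout}
    back_edges: List[Tuple[str, str]] = []

    def visit(u: str) -> None:
        color[u] = GRAY
        for v in fanout.get(u, []):
            state = color.get(v, WHITE)
            if state == GRAY:             # v is an ancestor of u: the edge closes a cycle
                back_edges.append((u, v))
            elif state == WHITE:
                visit(v)
        color[u] = BLACK

    for node in fanout:
        if color[node] == WHITE:
            visit(node)
    return back_edges


# Hypothetical netlist: g1 -> g2 -> g3 -> g1 forms a loop.
netlist = {"a": ["g1"], "g1": ["g2"], "g2": ["g3", "out"], "g3": ["g1"], "out": []}
loops = find_back_edges(netlist)
```

Removing the reported edges leaves an acyclic graph, which is the form a standard SAT-based attack expects.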

Citations: 1

INDENT: Incremental Online Decision Tree Training for Domain-Specific Systems-on-Chip

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549436

A. Krishnakumar, R. Marculescu, Ümit Y. Ogras

The performance and energy efficiency potential of heterogeneous architectures has fueled domain-specific systems-on-chip (DSSoCs) that integrate general-purpose and domain-specialized hardware accelerators. Decision trees (DTs) perform high-quality, low-latency task scheduling to utilize the massive parallelism and heterogeneity in DSSoCs effectively. However, offline trained DT scheduling policies can quickly become ineffective when applications or hardware configurations change. There is a critical need for runtime techniques to train DTs incrementally without sacrificing accuracy since current training approaches have large memory and computational power requirements. To address this need, we propose INDENT, an incremental online DT framework to update the scheduling policy and adapt it to unseen scenarios. INDENT updates DT schedulers at runtime using only 1-8% of the original training data embedded during training. Thorough evaluations with hardware platforms and DSSoC simulators demonstrate that INDENT performs within 5% of a DT trained from scratch using the entire dataset and outperforms current state-of-the-art approaches.

Citations: 0

Reliable Computing of ReRAM Based Compute-in-Memory Circuits for AI Edge Devices

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561119

Meng-Fan Chang, Je-Ming Hung, Ping-Cheng Chen, Tai-Hao Wen

Compute-in-memory macros based on non-volatile memory (nvCIM) are a promising approach to breaking through the memory bottleneck of artificial intelligence (AI) edge devices; however, the development of these devices involves unavoidable tradeoffs between reliability, energy efficiency, computing latency, and readout accuracy. This paper outlines the background of ReRAM-based nvCIM as well as the major challenges in its further development, including process variation in ReRAM devices and transistors and the small signal margins associated with variation in input-weight patterns. This paper also investigates the error model of an nvCIM macro and the corresponding degradation of inference accuracy as a function of the error model when using nvCIM macros. Finally, we summarize recent trends and advances in the development of reliable ReRAM-based nvCIM macros.

Citations: 1

Securing Hardware through Reconfigurable Nano-structures (Invited Paper)

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561116

N. Kavand, A. Darjani, Shubham Rai, Akash Kumar

Hardware security has been an ever-growing concern for integrated circuit (IC) designers. At different stages of the IC design and life cycle, an adversary can exploit logical, physical, and structural weaknesses to extract sensitive design information and private data stored in the circuit. In addition, ML-based attacks have recently become the new de facto standard in the hardware security community. Contemporary defense strategies often face unforeseen challenges in coping with these attack schemes. Moreover, the high overhead of CMOS-based secure add-on circuitry and the intrinsic limitations of these devices indicate the need for new nano-electronics. Emerging reconfigurable devices such as Reconfigurable Field Effect Transistors (RFETs) provide unique features to fortify the design against various threats at different stages of the IC design and life cycle. In this manuscript, we investigate the applications of RFETs for securing the design against traditional and machine learning (ML)-based intellectual property (IP) piracy techniques and side-channel attacks (SCAs).

Citations: 1

DCIM-GCN: Digital Computing-in-Memory to Efficiently Accelerate Graph Convolutional Networks

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549465

Yikan Qiu, Yufei Ma, Wentao Zhao, Meng Wu, Le Ye, Ru Huang

Computing-in-memory (CIM) is emerging as a promising architecture for accelerating graph convolutional networks (GCNs), which are normally bounded by redundant and irregular memory transactions. Current analog-based CIM requires frequent analog-to-digital and digital-to-analog (AD/DA) conversions that dominate the overall area and power consumption. Furthermore, analog non-ideality degrades the accuracy and reliability of CIM. In this work, we propose DCIM-GCN, an SRAM-based digital CIM system that accelerates memory-intensive GCNs. Its innovations range from the CIM circuit level, where costly AD/DA converters are eliminated, to the architecture level, where the irregularity and sparsity of graph data are addressed. DCIM-GCN achieves 2.07×, 1.76×, and 1.89× speedup and 29.98×, 1.29×, and 3.73× energy efficiency improvement on average over CIM based PIMGCN, TARe, and PIM-GCN, respectively.

Citations: 1

An MLIR-based Compiler Flow for System-Level Design and Hardware Acceleration

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3549424

Nicolas Bohm Agostini, S. Curzel, Vinay C. Amatya, Cheng Tan, Marco Minutoli, Vito Giovanni Castellana, J. Manzano, D. Kaeli, Antonino Tumeo

The generation of custom hardware accelerators for applications implemented within high-level productive programming frameworks requires considerable manual effort. To automate this process, we introduce SODA-OPT, a compiler tool that extends the MLIR infrastructure. SODA-OPT automatically searches, outlines, tiles, and pre-optimizes relevant code regions to generate high-quality accelerators through high-level synthesis. SODA-OPT can support any high-level programming framework and domain-specific language that interfaces with the MLIR infrastructure. By leveraging MLIR, SODA-OPT solves compiler optimization problems with specialized abstractions. Backend synthesis tools connect to SODA-OPT through progressive intermediate representation lowerings. SODA-OPT interfaces with a design space exploration engine to identify the combination of compiler optimization passes and options that yields high-performance generated designs for different backends and targets. We demonstrate the practical applicability of the compilation flow by exploring the automatic generation of accelerators for deep neural network operators outlined at arbitrary granularity and by combining outlining with tiling on large convolution layers. Experimental results with kernels from the PolyBench benchmark show that our high-level optimizations improve execution delays of synthesized accelerators by up to 60x. We also show that for the selected kernels, our solution outperforms the current state of the art in more than 70% of the benchmarks and provides better average speedup in 55% of them. SODA-OPT is an open source project available at https://gitlab.pnnl.gov/sodalite/soda-opt.
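Tiling, one of the transformations the abstract lists, can be illustrated with a small Python sketch of a tiled matrix multiplication. This is plain Python rather than MLIR and is not taken from SODA-OPT; the function name and the tile size are assumptions chosen for illustration. The point is only that the loop nest is split into blocks without changing the result.

```python
from typing import List

Matrix = List[List[int]]


def matmul_tiled(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """Multiply a (n x k) by b (k x m), iterating over tile-sized blocks."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):              # outer loops walk over blocks
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):    # inner loops stay inside one block
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
product = matmul_tiled(a, b)
```

In a hardware flow, each block is small enough to fit in on-chip buffers, which is why tiling is commonly paired with outlining when targeting accelerators.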

Citations: 8

2022 ICCAD CAD Contest Problem B: 3D Placement with D2D Vertical Connections

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)

Pub Date : 2022-10-29 DOI: 10.1145/3508352.3561108

Kai-Shun Hu, I-Jye Lin, Yu-Hui Huang, Hao-Yu Chi, Yi-Hsuan Wu, Cindy Chin-Fang Shen

In the chiplet era, splitting a large single die into multiple small dies brings benefits on several fronts. With multiple small dies joined by die-to-die (D2D) vertical connections, the benefits include: 1) better yield, 2) better timing/performance, and 3) better cost. How to partition the netlist, place cells in each of the small dies, and determine the locations of the D2D interconnection terminals has become a new research topic. To address this chiplet-era physical implementation problem, the ICCAD-2022 contest encourages research on techniques for multi-die netlist partitioning and placement with D2D vertical connections. We provided (i) a set of benchmarks and (ii) an evaluation metric that help contestants develop, test, and evaluate their new algorithms.
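As a small illustration of the partitioning side of the problem, the Python sketch below counts how many nets span more than one die for a given cell-to-die assignment. Each such net would need a D2D connection. The data layout and names are hypothetical, and this count is not the contest's evaluation metric, which the abstract does not define.

```python
from typing import Dict, List


def count_cut_nets(nets: Dict[str, List[str]], die_of: Dict[str, str]) -> int:
    """Count nets whose cells are assigned to more than one die."""
    return sum(
        1
        for cells in nets.values()
        if len({die_of[cell] for cell in cells}) > 1
    )


# Hypothetical four-cell netlist split across a top and a bottom die.
nets = {"n1": ["c1", "c2"], "n2": ["c2", "c3"], "n3": ["c3", "c4"]}
die_of = {"c1": "top", "c2": "top", "c3": "bottom", "c4": "bottom"}
cut = count_cut_nets(nets, die_of)   # only n2 crosses between dies
```

A partitioner would typically try to keep this count low while also balancing the area used on each die.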

Citations: 4
