ACM Transactions on Design Automation of Electronic Systems: Latest Articles

Scalable and Accelerated Self Healing Control Circuit using Evolvable Hardware
IF 1.4; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2023-11-24; DOI: 10.1145/3634682
Deepanjali.S, Noor Mahammad.Sk

Controllers are mission-critical components of any electronic design. By sending control signals, they decide which data path elements operate and when. Faults in these components, especially Single Event Upsets (SEUs), can lead to functional or mission failure when the system is deployed in harsh environments. Hence, the ability to self-heal from SEUs is highly desirable in the control path of a digital system. Reconfiguration is critical for recovering from a faulty state to a non-faulty state. Compared to native reconfiguration, the Virtual Reconfigurable Circuit (VRC) is an FPGA-generic reconfiguration mechanism. The non-partial reconfiguration in VRC and its extensive architecture are considered hindrances to extending VRC-based Evolvable Hardware (EHW) to real-time fault mitigation. To confront this challenge, we propose an intrinsic constrained evolution that improves scalability and accelerates the evolution process for VRC-based fault mitigation in mission-critical applications. Experiments are conducted on complex ACM/SIGDA benchmark circuits and on real-time circuits used in space missions, which are not covered in related works. In addition, existing and proposed methodologies are compared on brushless DC motor control circuits. Hardware utilization in the multiplexer is significantly reduced, resulting in up to a 77% reduction relative to the existing VRC architecture. The proposed methodology employs a fault localization approach to narrow the search space effectively. This approach yields an 87% average improvement in convergence speed, measured by evolution time, compared to the existing work.

Citations: 0
Dynamic Adaptation Using Deep Reinforcement Learning for Digital Microfluidic Biochips
IF 1.4; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2023-11-23; DOI: 10.1145/3633458
Tung-Che Liang, Yi-Chen Chang, Zhanwei Zhong, Yaas Bigdeli, Tsung-Yi Ho, Krishnendu Chakrabarty, Richard Fair

We describe an exciting new application domain for deep reinforcement learning (RL): droplet routing on digital microfluidic biochips (DMFBs). A DMFB consists of a two-dimensional electrode array, and it manipulates droplets of liquid to automatically execute biochemical protocols for clinical chemistry. However, a major problem with DMFBs is that electrodes can degrade over time. Droplet transportation over these degraded electrodes can fail, thereby adversely impacting the integrity of the bioassay outcome. We demonstrate that the formulation of droplet transportation as an RL problem enables the training of deep neural network policies that can adapt to the underlying health conditions of electrodes and ensure reliable fluidic operations. We describe an RL-based droplet-routing solution that can be used for various sizes of DMFBs. We highlight the reliable execution of an epigenetic bioassay with the RL droplet router on a fabricated DMFB. We show that the use of the RL approach on a simple micro-computer (Raspberry Pi 4) leads to acceptable performance for time-critical bioassays. We present a simulation environment based on the OpenAI Gym Interface for RL-guided droplet routing problems on DMFBs. We present results of our study of electrode degradation using fabricated DMFBs. The study supports the degradation model used in the simulator.

Citations: 0
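The droplet-routing abstract above frames routing over degraded electrodes as an RL problem with a Gym-style simulator. A minimal sketch of what such an environment could look like follows. It is not the authors' simulator: the class name `DropletRoutingEnv`, the per-electrode `health` grid (read here as the probability that a move onto that electrode succeeds), and the step reward of -1 are all assumptions made for illustration. The code only mirrors the classic `reset`/`step` shape of the Gym interface and does not import any RL library.

```python
import random


class DropletRoutingEnv:
    """Toy droplet-routing grid (hypothetical, for illustration only)."""

    # Action index -> (row delta, column delta): up, down, left, right.
    MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, health, start, goal, seed=0):
        # health[r][c] is the assumed probability that moving onto
        # electrode (r, c) succeeds; 1.0 is healthy, 0.0 is fully degraded.
        self.health = health
        self.rows, self.cols = len(health), len(health[0])
        self.start, self.goal = start, goal
        self.rng = random.Random(seed)
        self.pos = start

    def reset(self):
        """Place the droplet back at the start and return the observation."""
        self.pos = self.start
        return self.pos

    def step(self, action):
        """Attempt one move; return (observation, reward, done, info)."""
        dr, dc = self.MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        inside = 0 <= r < self.rows and 0 <= c < self.cols
        # The move succeeds only if the target electrode actuates.
        if inside and self.rng.random() < self.health[r][c]:
            self.pos = (r, c)
        done = self.pos == self.goal
        reward = 0.0 if done else -1.0  # each extra step costs one unit
        return self.pos, reward, done, {}


# Usage: the electrode at (0, 1) is dead, so the droplet must go around it.
env = DropletRoutingEnv([[1.0, 0.0], [1.0, 1.0]], start=(0, 0), goal=(1, 1))
env.reset()
for action in (3, 1, 3):  # right (fails), down, right
    observation, reward, done, info = env.step(action)
```

A policy trained against an environment of this shape would have to learn to avoid low-health electrodes, which is the adaptation behaviour the abstract describes.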
On-chip ESD Protection Design Methodologies by CAD Simulation
IF 1.4; Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2023-11-15; DOI: 10.1145/3593808
Zijin Pan, Xunyu Li, Weiquan Hao, Runyu Miao, Albert Wang

Electrostatic discharge (ESD) can cause malfunction or failure of integrated circuits (ICs). On-chip ESD protection design is a major IC design-for-reliability (DfR) challenge, particularly for complex chips made in advanced technology nodes. Traditional trial-and-error approaches have become unacceptable for practical ESD protection design in advanced ICs. Full-chip ESD protection circuit design optimization, prediction, and verification have become essential to advanced chip designs and depend heavily on CAD algorithms and simulation, which have been a constant research topic for decades. This paper reviews recent advances in CAD-enabled on-chip ESD protection circuit simulation design technologies and ESD-IC co-design methodologies. Key challenges of ESD CAD design practices are outlined. Practical ESD protection simulation design examples are discussed.

Citations: 1
Energy-Constrained Scheduling for Weakly Hard Real-Time Systems Using Standby-Sparing
Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2023-11-14; DOI: 10.1145/3631587
Linwei Niu, Danda B. Rawat, Jonathan Musselwhite, Zonghua Gu, Qingxu Deng
For real-time embedded systems, QoS (Quality of Service), fault tolerance, and energy budget constraints are among the primary design concerns. In this research, we investigate the problem of energy-constrained standby-sparing for both periodic and aperiodic tasks in a weakly hard real-time environment. Standby-sparing systems adopt a primary processor and a spare processor to provide fault tolerance for both permanent and transient faults. For such systems, we first propose several novel standby-sparing schemes for periodic tasks that can ensure system feasibility under tighter energy budget constraints than traditional ones. Based on these schemes, integrated approaches for both periodic and aperiodic tasks are then proposed to minimize the aperiodic response time while achieving better energy and QoS performance under the given energy budget constraint. The evaluation results demonstrate that the proposed techniques significantly outperform existing state-of-the-art approaches in terms of feasibility and system performance while ensuring QoS and fault tolerance under the given energy budget constraint.
Citations: 0
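The standby-sparing abstract above targets weakly hard real-time systems, where some deadline misses are tolerated as long as they stay bounded. One common way to state such a bound is an (m, k) constraint: at least m of any k consecutive jobs must meet their deadlines. The abstract does not say which formulation the paper uses, so the (m, k) form here is an assumption, and the helper name `satisfies_mk` is hypothetical. The sketch only checks a recorded sequence of outcomes; it is not a scheduler.

```python
def satisfies_mk(outcomes, m, k):
    """Check an (m, k) constraint over a sequence of job outcomes.

    outcomes: sequence of 1/0 (or True/False), 1 meaning the deadline was met.
    Returns True if every window of k consecutive jobs has at least m
    met deadlines. Sequences shorter than k have no full window and pass.
    """
    if len(outcomes) < k:
        return True
    met = sum(outcomes[:k])  # met deadlines in the first window
    if met < m:
        return False
    for i in range(k, len(outcomes)):
        # Slide the window one job to the right.
        met += outcomes[i] - outcomes[i - k]
        if met < m:
            return False
    return True


# Usage: a (2, 3) constraint tolerates one miss in any three consecutive jobs.
history = [1, 0, 1, 1, 0, 1]
ok = satisfies_mk(history, m=2, k=3)
```

A check of this kind is what lets a scheduler skip or demote selected jobs to save energy while still honouring the QoS requirement.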
An Efficient Reinforcement Learning Based Framework for Exploring Logic Synthesis
Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2023-11-10; DOI: 10.1145/3632174
Yu Qian, Xuegong Zhou, Hao Zhou, Lingli Wang
Logic synthesis is a crucial step in electronic design automation tools. The rapid development of reinforcement learning (RL) has enabled the automated exploration of logic synthesis. Existing RL-based methods may lead to data inefficiency, and the exploration approaches for FPGA and ASIC technology mapping in recent works lack flexibility in the learning process. This work proposes ESE, a reinforcement-learning-based framework to efficiently learn the logic synthesis process. The framework supports the modeling of logic optimization and technology mapping for FPGA and ASIC. Optimization of the execution time of the synthesis script is also considered. For the modeling of FPGA mapping, logic optimization and technology mapping are combined and learned in a flexible way. For the modeling of ASIC mapping, standard-cell-based optimization and LUT optimization operations are incorporated into the ASIC synthesis flow. To improve sample utilization, the Proximal Policy Optimization model is adopted. Furthermore, the framework is enhanced by supporting MIG-based synthesis exploration. Experiments show that for FPGA technology mapping on the VTR benchmark, the average LUT-Level-Product and script runtime improve by more than 18.3% and 12.4%, respectively, over previous works. For ASIC mapping on the EPFL benchmark, the average Area-Delay-Product improves by 14.5%.
Citations: 0
Flip: Data-Centric Edge CGRA Accelerator
Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2023-11-03; DOI: 10.1145/3631118
Dan Wu, Peng Chen, Thilini Kaushalya Bandara, Zhaoying Li, Tulika Mitra
Coarse-Grained Reconfigurable Arrays (CGRAs) are promising edge accelerators due to their outstanding balance of flexibility, performance, and energy efficiency. Classic CGRAs statically map compute operations onto the processing elements (PEs) and route the data dependencies among the operations through the Network-on-Chip. However, CGRAs are designed for fine-grained static instruction-level parallelism and struggle to accelerate applications with dynamic and irregular data-level parallelism, such as graph processing. To address this limitation, we present Flip, a novel accelerator that enhances traditional CGRA architectures to boost the performance of graph applications. Flip retains the classic CGRA execution model while introducing a special data-centric mode for efficient graph processing. Specifically, it leverages the inherent data parallelism of graph algorithms by mapping graph vertices, rather than operations, onto PEs and by supporting dynamic routing of temporary data according to the runtime evolution of the graph frontier. Experimental results demonstrate that Flip achieves up to 36× speedup with merely 19% more area compared to classic CGRAs. Compared to state-of-the-art large-scale graph processors, Flip has similar energy efficiency and 2.2× better area efficiency at a much-reduced power/area budget.
Citations: 0
Mathematical Framework for Optimizing Crossbar Allocation for ReRAM-based CNN Accelerators
Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2023-11-02; DOI: 10.1145/3631523
Wanqian Li, Yinhe Han, Xiaoming Chen
Resistive random-access memory (ReRAM) has been widely used to accelerate convolutional neural networks (CNNs) thanks to its analog in-memory computing capability. ReRAM crossbars not only store layers’ weights, but also perform in-situ matrix-vector multiplications, which are the core operations of CNNs. To boost the performance of ReRAM-based CNN accelerators, crossbars can be duplicated to exploit more intra-layer parallelism. The crossbar allocation scheme can significantly influence both the computing throughput and the bandwidth requirements of ReRAM-based CNN accelerators. Under resource constraints (i.e., crossbars and memory bandwidth), how to find the optimal number of crossbars for each layer to maximize the inference performance of an entire CNN is an unsolved problem. In this work, we find the optimal crossbar allocation scheme by mathematically modeling the problem as a constrained optimization problem and solving it with a dynamic-programming-based solver. Experiments demonstrate that our model for CNN inference time is nearly exact, and that the proposed framework can obtain solutions with near-optimal inference time. We also emphasize that communication (i.e., data access) is an important factor and must also be considered when determining the optimal crossbar allocation scheme.
Citations: 0
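The crossbar-allocation abstract above casts per-layer crossbar duplication as a constrained optimization solved by dynamic programming. The sketch below shows the general shape of such a solver under strong simplifying assumptions that are not taken from the paper: each layer `i` needs `base[i]` crossbars per copy, its latency is `work[i] / copies`, the objective is the sum of layer latencies, and the only constraint is a total crossbar `budget`. The memory-bandwidth constraint that the abstract stresses is deliberately left out here. The function name `allocate_crossbars` is hypothetical.

```python
def allocate_crossbars(base, work, budget):
    """Choose the number of copies per layer with a knapsack-style DP.

    base[i]:  crossbars needed for one copy of layer i (assumed model).
    work[i]:  latency of layer i with a single copy (assumed model).
    budget:   total number of crossbars available.
    Returns (total_latency, copies) or None if even one copy per layer
    does not fit in the budget.
    """
    n = len(base)
    inf = float("inf")
    # dp[i][c]: best latency for the first i layers using at most c crossbars.
    dp = [[inf] * (budget + 1) for _ in range(n + 1)]
    choice = [[0] * (budget + 1) for _ in range(n + 1)]
    for c in range(budget + 1):
        dp[0][c] = 0.0
    for i in range(1, n + 1):
        b, w = base[i - 1], work[i - 1]
        for c in range(budget + 1):
            copies = 1
            while copies * b <= c:
                candidate = dp[i - 1][c - copies * b] + w / copies
                if candidate < dp[i][c]:
                    dp[i][c] = candidate
                    choice[i][c] = copies
                copies += 1
    if dp[n][budget] == inf:
        return None
    # Walk back through the recorded choices to recover the allocation.
    allocation, c = [], budget
    for i in range(n, 0, -1):
        copies = choice[i][c]
        allocation.append(copies)
        c -= copies * base[i - 1]
    allocation.reverse()
    return dp[n][budget], allocation


# Usage: two layers, six crossbars in total.
result = allocate_crossbars(base=[2, 1], work=[8.0, 4.0], budget=6)
```

In this toy instance, duplicating both layers twice beats giving all spare crossbars to one layer, which illustrates why the allocation has to be decided jointly across layers rather than greedily per layer.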
Optimal Model Partitioning with Low-Overhead Profiling on the PIM-based Platform for Deep Learning Inference
Zone 4 (Computer Science); Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Pub Date: 2023-11-01; DOI: 10.1145/3628599
Seok Young Kim, Jaewook Lee, Yoonah Paik, Chang Hyun Kim, Won Jun Lee, Seon Wook Kim
Recently, Processing-in-Memory (PIM) has become a promising solution for energy-efficient computation in data-intensive applications by placing computation near or inside the memory. In most Deep Learning (DL) frameworks, a user manually partitions a model’s computational graph (CG) onto the computing devices by considering the devices’ capabilities and the data transfer. Deep Neural Network (DNN) models are becoming increasingly complex to improve accuracy; thus, it is exceptionally challenging to partition the execution to achieve the best performance, especially on a PIM-based platform that requires frequent offloading of large amounts of data. This paper proposes two novel algorithms for DL inference to resolve the challenge: low-overhead profiling and optimal model partitioning. First, we reconstruct the CG by considering the devices’ capabilities to represent all the possible scheduling paths. Second, we develop a profiling algorithm that finds the minimum set of profiling paths required to measure all the node and edge costs of the reconstructed CG. Finally, we devise a model partitioning algorithm that obtains the minimum execution time by applying dynamic programming to the profiled data. We evaluated our work by executing the BERT, RoBERTa, and GPT-2 models with various sequence lengths on ARM multicores with a PIM-modeled FPGA platform. For the three computing devices in the platform, i.e., CPU serial/parallel and PIM executions, we could find all the costs in only four profile runs, three for node costs and one for edge costs. In addition, our model partitioning algorithm achieved the highest performance in all the experiments, outperforming both execution with manually assigned device priority and the state-of-the-art greedy approach.
Citations: 0
Security of Electrical, Optical and Wireless On-Chip Interconnects: A Survey
Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2023-10-30 DOI: 10.1145/3631117
Hansika Weerasena, Prabhat Mishra
The advancement of manufacturing technologies has enabled the integration of more intellectual property (IP) cores on the same system-on-chip (SoC). Scalable, high-throughput on-chip communication architectures have become a vital component of today's SoCs. Diverse technologies such as electrical, wireless, optical, and hybrid are available for on-chip communication, with different architectures supporting them. The on-chip communication subsystem is shared across all the IPs and is used continuously throughout the lifetime of the SoC. Therefore, the security of on-chip communication is crucial, because exploiting any vulnerability would be a goldmine for an attacker. In this survey, we provide a comprehensive review of threat models, attacks, and countermeasures across diverse on-chip communication technologies and sophisticated architectures.
Citations: 1
BOOM-Explorer: RISC-V BOOM Microarchitecture Design Space Exploration
Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2023-10-26 DOI: 10.1145/3630013
Chen Bai, Qi Sun, Jianwang Zhai, Yuzhe Ma, Bei Yu, Martin D.F. Wong
Microarchitecture parameter tuning is critical in the microprocessor design cycle. It is a non-trivial design space exploration (DSE) problem due to the large solution space, the modeling inaccuracy of cycle-accurate simulators, and the high simulation runtime of performance evaluations. Previous methods require either massive expert effort to construct interpretable equations or substantial computing resources to train black-box prediction models. This paper follows the black-box approach, which generally yields better solution quality than analytical methods. We summarize two lessons learned and propose BOOM-Explorer accordingly. First, embedding microarchitecture domain knowledge in the DSE improves the solution quality. Second, BOOM-Explorer makes microarchitecture DSE for register-transfer-level designs feasible within a limited time budget. We enhance BOOM-Explorer with diversity guidance, further improving the algorithm's performance. Experimental results with the RISC-V Berkeley Out-of-Order Machine under 7-nm technology show that our proposed methodology achieves, on average, 18.75% higher Pareto hypervolume, 35.47% less average distance to the reference set, and 65.38% less overall running time compared to previous approaches.
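For readers unfamiliar with the Pareto hypervolume metric quoted in this abstract, the following is a minimal sketch of how it can be computed. It assumes only two objectives, both minimized, and a reference point chosen by the user. The function names `pareto_front` and `hypervolume_2d` are hypothetical and are not taken from BOOM-Explorer, whose design space may involve more objectives.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def pareto_front(points: Iterable[Point]) -> List[Point]:
    """Return the non-dominated points when both objectives are minimized."""
    front: List[Point] = []
    best_y = float("inf")
    # After sorting by x (then y), a point is non-dominated only if it
    # strictly improves on the smallest y seen so far.
    for x, y in sorted(set(points)):
        if y < best_y:
            front.append((x, y))
            best_y = y
    return front


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by the Pareto front and bounded by the reference point."""
    rx, ry = ref
    # Only points strictly better than the reference contribute area
    front = [(x, y) for x, y in pareto_front(points) if x < rx and y < ry]
    area = 0.0
    for i, (x, y) in enumerate(front):
        # Each front point contributes a strip up to the next point's x
        next_x = front[i + 1][0] if i + 1 < len(front) else rx
        area += (next_x - x) * (ry - y)
    return area


if __name__ == "__main__":
    designs = [(1, 4), (2, 2), (3, 3), (4, 1)]  # hypothetical (objective 1, objective 2) pairs
    print(pareto_front(designs))
    print(hypervolume_2d(designs, ref=(5, 5)))
```

A larger hypervolume means the explored designs dominate more of the objective space relative to the reference point, so it works as a single-number summary of front quality.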
Citations: 0