Tianming Ni, Xiaoqing Wen, Hussam Amrouch, Cheng Zhuo, Peilin Song
Research on design for testability and reliability of security-aware hardware is important in both academia and industry. With ever-growing globalization, commercial hardware design, manufacturing, transportation, and supply now involve many different countries, which aggravates vulnerability from hardware design through manufacturing. Malicious hardware implanted during third-party manufacturing may control the operation of a circuit and tamper with its functions, causing serious security issues. However, hardware includes not only devices and circuits but also systems. Importantly, testability, reliability, and security technologies come from different design layers, but their impact is evaluated at the system level. In other words, the testability, reliability, and security design of different layers can be carried out holistically to optimize the whole system. In addition, the testability, reliability, and security design technologies of each design layer can be applied collaboratively to achieve better performance. The tradeoff among testability, reliability, and security has garnered attention from academia and industry, particularly in the Post-Moore Era, due to the complexities and opportunities arising from new architectures and technologies.
{"title":"Introduction to the Special Issue on Design for Testability and Reliability of Security-aware Hardware","authors":"Tianming Ni, Xiaoqing Wen, Hussam Amrouch, Cheng Zhuo, Peilin Song","doi":"10.1145/3631476","DOIUrl":"https://doi.org/10.1145/3631476","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2023-12-18","publicationTypes":"Journal Article"}
Cache timing channel attacks exploit inherent properties of cache memories, namely the difference between hit and miss times and the shared nature of the cache, to leak secret information. Side channels and covert channels are the two well-known kinds of cache timing channel attack. In this paper, we propose Restricted Static Pseudo-Partitioning (RSPP), an effective partition-based mitigation mechanism that restricts the cache access of only the adversaries involved in the attack. Its performance impact is insignificant, only 1%, because benign processes have access to the full cache and restrictions are limited to the suspicious processes and cache sets. It can be implemented with a maximum storage overhead of 1.45% of the total LLC size. This paper presents three variations of the proposed attack mitigation mechanism, RSPP, simplified-RSPP (S-RSPP), and core wise-RSPP (C-RSPP), with different hardware overheads. A full-system simulator is used to evaluate the performance impact of RSPP. A detailed experimental analysis with different LLC and attack parameters is also discussed. RSPP is also compared with existing defense mechanisms that are effective against cross-core covert channel attacks.
{"title":"RSPP: Restricted Static Pseudo-Partitioning for Mitigation of Cross-Core Covert Channel Attacks","authors":"Jaspinder Kaur, Shirshendu Das","doi":"10.1145/3637222","DOIUrl":"https://doi.org/10.1145/3637222","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2023-12-13","publicationTypes":"Journal Article"}
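The RSPP abstract says that restrictions apply only to suspicious processes and cache sets while benign processes keep the full cache. The abstract does not describe the hardware mechanism, so the following is only a generic way-restriction sketch, not the paper's design: `CacheSet`, `suspicious_ways`, and the LRU policy are all assumptions made for illustration.

```python
class CacheSet:
    """One LLC set with LRU replacement (hypothetical model, not RSPP itself)."""

    def __init__(self, num_ways, suspicious_ways):
        self.tags = [None] * num_ways
        self.last_used = [0] * num_ways
        self.suspicious_ways = suspicious_ways  # ways a flagged process may use
        self.clock = 0

    def access(self, tag, suspicious=False):
        """Return True on a hit, False on a miss (the line is filled on a miss)."""
        self.clock += 1
        # Benign processes see every way; flagged processes see only a prefix.
        limit = self.suspicious_ways if suspicious else len(self.tags)
        allowed = range(limit)
        for way in allowed:
            if self.tags[way] == tag:
                self.last_used[way] = self.clock
                return True
        # Miss: evict the least recently used line among the allowed ways only.
        victim = min(allowed, key=lambda w: self.last_used[w])
        self.tags[victim] = tag
        self.last_used[victim] = self.clock
        return False


cache_set = CacheSet(num_ways=4, suspicious_ways=1)
for tag in ["a", "b", "c", "d"]:       # a benign process fills the set
    cache_set.access(tag)
for tag in ["x", "y", "z"]:            # a flagged process thrashes its one way
    cache_set.access(tag, suspicious=True)
```

In this toy model the flagged process can evict only the line in way 0, so it cannot create or observe contention on the other three ways of the set.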
Recently, GPU-accelerated placers such as DREAMPlace and Xplace have demonstrated their superiority over traditional CPU-reliant placers by achieving orders-of-magnitude speedup in placement runtime. However, because they focus on a limited set of placement objectives (e.g., wirelength and density), the placement quality achieved by DREAMPlace or Xplace is not comparable to that of commercial tools. In this paper, to bridge the gap between open-source and commercial placers, we present a novel placement optimization framework named GAN-Place that employs generative adversarial learning to transfer the placement quality of the industry-leading commercial placer, Synopsys ICC2, to existing open-source GPU-accelerated placers (DREAMPlace and Xplace). Without knowledge of the underlying proprietary algorithms or constraints used by the commercial tools, our framework uses transfer learning to directly enhance the open-source placers by optimizing a proposed differentiable loss that denotes the “similarity” between DREAMPlace- or Xplace-generated placements and those in commercial databases. Experimental results on 7 industrial designs not only show that GAN-Place immediately improves the Power, Performance, and Area (PPA) metrics at the placement stage, but also demonstrate that these improvements persist to the post-route stage, where we observe improvements of up to 8.3% in wirelength, 7.4% in power, and 37.6% in Total Negative Slack (TNS) on a commercial CPU benchmark.
{"title":"GAN-Place: Advancing Open-Source Placers to Commercial-Quality using Generative Adversarial Networks and Transfer Learning","authors":"Yi-Chen Lu, Haoxing Ren, Hao-Hsiang Hsiao, Sung Kyu Lim","doi":"10.1145/3636461","DOIUrl":"https://doi.org/10.1145/3636461","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2023-12-06","publicationTypes":"Journal Article"}
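The GAN-Place abstract reports results in wirelength and Total Negative Slack. As a reminder of how such metrics are commonly computed, here is a small sketch using the standard half-perimeter wirelength (HPWL) estimate and the usual TNS definition. The abstract does not say which wirelength model the authors use, and the function names and toy data below are assumptions, not values from the paper.

```python
def hpwl(nets, positions):
    """Half-perimeter wirelength: sum of bounding-box width + height per net."""
    total = 0.0
    for pins in nets:
        xs = [positions[p][0] for p in pins]
        ys = [positions[p][1] for p in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def total_negative_slack(slacks):
    """TNS: sum of the slack values of all timing endpoints that violate timing."""
    return sum(s for s in slacks if s < 0)


def improvement_pct(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Toy placement: three cells and two nets (illustrative values only).
positions = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
nets = [["a", "b"], ["a", "b", "c"]]
wirelength = hpwl(nets, positions)
tns = total_negative_slack([0.2, -0.5, -1.5])
```

Because `improvement_pct` divides by the baseline, it also behaves sensibly for TNS, where both values are negative and a value closer to zero is better.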
Bo Wang, Sheng Ma, Shengbai Luo, Lizhou Wu, Jianmin Zhang, Chunyuan Zhang, Tiejun Li
Deep learning has become a highly popular research field, and previously deep learning algorithms ran primarily on CPUs and GPUs. However, with the rapid development of deep learning, it was discovered that existing processors could not meet the specific large-scale computing requirements of deep learning, and custom deep learning accelerators have become popular. The majority of the primary workloads in deep learning are general matrix-matrix multiplications (GEMM), and emerging GEMMs are highly sparse and irregular. The TPU and SIGMA are typical GEMM accelerators in recent years, but the TPU does not support sparsity, and both the TPU and SIGMA have insufficient utilization rates of the Processing Element (PE). We design and implement the SparGD, a sparse GEMM accelerator with dynamic dataflow. The SparGD has specific PE structures, flexible distribution networks and reduction networks, and a simple dataflow switching module. When running sparse and irregular GEMMs, the SparGD can maintain high PE utilization while utilizing sparsity, and can switch to the optimal dataflow according to the computing environment. For sparse, irregular GEMMs, our experimental results show that the SparGD outperforms systolic arrays by 30 times and SIGMA by 3.6 times.
{"title":"SparGD: A Sparse GEMM Accelerator with Dynamic Dataflow","authors":"Bo Wang, Sheng Ma, Shengbai Luo, Lizhou Wu, Jianmin Zhang, Chunyuan Zhang, Tiejun Li","doi":"10.1145/3634703","DOIUrl":"https://doi.org/10.1145/3634703","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2023-11-27","publicationTypes":"Journal Article"}
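The SparGD abstract argues that accelerators without sparsity support waste Processing Elements on sparse GEMMs. One rough way to see why is to count how many multiply-accumulate (MAC) operations have two nonzero operands, compared with the MACs a dense schedule would issue. This is a software sketch of that counting argument only. It does not model SparGD's PE structure or dataflow, and `sparse_gemm` is a hypothetical helper.

```python
def sparse_gemm(a, b):
    """Multiply list-of-lists matrices, skipping zeros; also count useful MACs."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    useful = 0
    for i in range(n):
        for p in range(k):
            if a[i][p] == 0:
                continue  # a zero operand contributes nothing to row i
            for j in range(m):
                if b[p][j] == 0:
                    continue
                c[i][j] += a[i][p] * b[p][j]
                useful += 1
    dense = n * k * m  # MACs issued by a schedule that ignores sparsity
    return c, useful, dense


a = [[1, 0], [0, 2]]
b = [[0, 3], [4, 0]]
product, useful_macs, dense_macs = sparse_gemm(a, b)
dense_utilization = useful_macs / dense_macs
```

For these toy matrices only 2 of the 8 dense MACs do useful work, which is the kind of gap a sparsity-aware accelerator tries to close.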
Controllers are mission-critical components of any electronic design. By sending control signals, they decide which data path elements operate and when. Faults in these components, especially Single Event Upsets (SEUs), can lead to functional or mission failure when the system is deployed in harsh environments. Hence, the ability to self-heal from SEUs is needed in the control path of a digital system. Reconfiguration is critical for recovering from a faulty state to a non-faulty state. Compared to native reconfiguration, the Virtual Reconfigurable Circuit (VRC) is an FPGA-generic reconfiguration mechanism. The non-partial reconfiguration in VRC and its extensive architecture are considered hindrances to extending VRC-based Evolvable Hardware (EHW) to real-time fault mitigation. To confront this challenge, we propose an intrinsic constrained evolution that improves scalability and accelerates the evolution process for VRC-based fault mitigation in mission-critical applications. Experiments are conducted on complex ACM/SIGDA benchmark circuits and on real-time circuits used in space missions, which are not covered in related work. In addition, the existing and proposed methodologies are compared on brushless DC motor control circuits. Hardware utilization in the multiplexer is significantly reduced, by up to 77% relative to the existing VRC architecture. The proposed methodology employs a fault localization approach to narrow the search space effectively. This approach yields an 87% average improvement in convergence speed, measured as evolution time, compared to existing work.
{"title":"Scalable and Accelerated Self Healing Control Circuit using Evolvable Hardware","authors":"Deepanjali.S, Noor Mahammad.Sk","doi":"10.1145/3634682","DOIUrl":"https://doi.org/10.1145/3634682","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2023-11-24","publicationTypes":"Journal Article"}
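The self-healing abstract describes using fault localization to narrow the evolutionary search space. The idea can be illustrated with a toy (1+1) evolution over a configuration bitstring in which only the genes inside the localized region may mutate. Everything here is an assumption for illustration: the bitstring encoding, the fitness function (agreement with a known-good configuration), and the `evolve` helper are not taken from the paper, whose fitness would be evaluated on circuit outputs.

```python
import random


def fitness(config, target):
    """Number of configuration bits that match the known-good reference."""
    return sum(x == t for x, t in zip(config, target))


def evolve(config, target, region, generations=200, seed=0):
    """(1+1) evolution that flips one bit per generation, only inside `region`."""
    rng = random.Random(seed)
    best = list(config)
    for _ in range(generations):
        if fitness(best, target) == len(target):
            break  # fully healed
        child = list(best)
        child[rng.choice(region)] ^= 1  # mutate a gene in the faulty region only
        if fitness(child, target) >= fitness(best, target):
            best = child
    return best


target = [1, 0, 1, 1, 0, 0, 1, 0]
faulty = [1, 0, 1, 0, 1, 1, 1, 0]       # upsets at positions 3, 4, 5
healed = evolve(faulty, target, region=[3, 4, 5])
```

Restricting mutation to three genes means the search explores at most 2^3 configurations instead of 2^8, which is the intuition behind the reported gain in convergence speed.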
We describe an exciting new application domain for deep reinforcement learning (RL): droplet routing on digital microfluidic biochips (DMFBs). A DMFB consists of a two-dimensional electrode array, and it manipulates droplets of liquid to automatically execute biochemical protocols for clinical chemistry. However, a major problem with DMFBs is that electrodes can degrade over time. The transportation of droplets over these degraded electrodes can fail, thereby adversely impacting the integrity of the bioassay outcome. We demonstrate that the formulation of droplet transportation as an RL problem enables the training of deep neural network policies that can adapt to the underlying health conditions of electrodes and ensure reliable fluidic operations. We describe an RL-based droplet-routing solution that can be used for various sizes of DMFBs. We highlight the reliable execution of an epigenetic bioassay with the RL droplet router on a fabricated DMFB. We show that the use of the RL approach on a simple micro-computer (Raspberry Pi 4) leads to acceptable performance for time-critical bioassays. We present a simulation environment based on the OpenAI Gym interface for RL-guided droplet routing problems on DMFBs. We present results from our study of electrode degradation using fabricated DMFBs. The study supports the degradation model used in the simulator.
{"title":"Dynamic Adaptation Using Deep Reinforcement Learning for Digital Microfluidic Biochips","authors":"Tung-Che Liang, Yi-Chen Chang, Zhanwei Zhong, Yaas Bigdeli, Tsung-Yi Ho, Krishnendu Chakrabarty, Richard Fair","doi":"10.1145/3633458","DOIUrl":"https://doi.org/10.1145/3633458","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2023-11-23","publicationTypes":"Journal Article"}
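The DMFB abstract frames droplet transportation as an RL problem on an electrode grid whose electrodes degrade. A toy environment with a Gym-style `reset`/`step` shape can make that formulation concrete. This sketch does not import Gym and is not the authors' simulator: the class `DropletEnv`, the reward of -1 per step, and the assumption that a move succeeds with probability equal to the target electrode's health are all illustrative choices.

```python
import random


class DropletEnv:
    """Toy droplet-routing environment on a grid of electrode health values."""

    MOVES = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}

    def __init__(self, health, start, goal, seed=0):
        self.health = health            # health[r][c] in [0, 1], 1 = pristine
        self.start, self.goal = start, goal
        self.rng = random.Random(seed)
        self.pos = start

    def reset(self):
        self.pos = self.start
        return self.pos

    def step(self, action):
        """Try to move; a degraded target electrode may fail to pull the droplet."""
        dr, dc = self.MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        in_bounds = 0 <= r < len(self.health) and 0 <= c < len(self.health[0])
        if in_bounds and self.rng.random() < self.health[r][c]:
            self.pos = (r, c)
        done = self.pos == self.goal
        reward = 0.0 if done else -1.0  # each step short of the goal costs 1
        return self.pos, reward, done


# Electrode (1, 0) is fully degraded, so the droplet must go around it.
env = DropletEnv([[1.0, 1.0], [0.0, 1.0]], start=(0, 0), goal=(1, 1))
env.reset()
blocked = env.step("S")
detour = env.step("E")
arrived = env.step("S")
```

A policy trained against such an environment is penalized for attempting moves onto unhealthy electrodes, which is the adaptation behavior the abstract describes.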
Zijin Pan, Xunyu Li, Weiquan Hao, Runyu Miao, Albert Wang
Electrostatic discharge (ESD) can cause malfunction or failure of integrated circuits (ICs). On-chip ESD protection design is a major IC design-for-reliability (DfR) challenge, particularly for complex chips made in advanced technology nodes. Traditional trial-and-error approaches have become unacceptable for practical ESD protection designs in advanced ICs. Full-chip ESD protection circuit design optimization, prediction, and verification have become essential to advanced chip designs; they depend heavily on CAD algorithms and simulation, which have been a constant research topic for decades. This paper reviews recent advances in CAD-enabled on-chip ESD protection circuit simulation design technologies and ESD-IC co-design methodologies. Key challenges of ESD CAD design practices are outlined. Practical ESD protection simulation design examples are discussed.
{"title":"On-chip ESD Protection Design Methodologies by CAD Simulation","authors":"Zijin Pan, Xunyu Li, Weiquan Hao, Runyu Miao, Albert Wang","doi":"10.1145/3593808","DOIUrl":"https://doi.org/10.1145/3593808","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2023-11-15","publicationTypes":"Journal Article"}
Linwei Niu, Danda B. Rawat, Jonathan Musselwhite, Zonghua Gu, Qingxu Deng
For real-time embedded systems, QoS (Quality of Service), fault tolerance, and energy budget constraints are among the primary design concerns. In this research, we investigate the problem of energy-constrained standby-sparing for both periodic and aperiodic tasks in a weakly hard real-time environment. Standby-sparing systems adopt a primary processor and a spare processor to provide fault tolerance for both permanent and transient faults. For such systems, we first propose several novel standby-sparing schemes for periodic tasks that can ensure system feasibility under a tighter energy budget constraint than traditional ones. Building on them, integrated approaches for both periodic and aperiodic tasks are then proposed to minimize the aperiodic response time while achieving better energy and QoS performance under the given energy budget constraint. The evaluation results demonstrate that the proposed techniques significantly outperform existing state-of-the-art approaches in terms of feasibility and system performance while ensuring QoS and fault tolerance under the given energy budget constraint.
{"title":"Energy-Constrained Scheduling for Weakly Hard Real-Time Systems Using Standby-Sparing","authors":"Linwei Niu, Danda B. Rawat, Jonathan Musselwhite, Zonghua Gu, Qingxu Deng","doi":"10.1145/3631587","DOIUrl":"https://doi.org/10.1145/3631587","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2023-11-14","publicationTypes":"Journal Article"}
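The standby-sparing abstract targets weakly hard real-time systems, where some deadline misses are tolerated as long as they are bounded. A common way to express this is the (m, k) constraint: at least m of any k consecutive jobs must meet their deadlines. The abstract does not state which weakly hard model the authors use, so the (m, k) form and the `satisfies_mk` helper below are assumptions for illustration.

```python
def satisfies_mk(outcomes, m, k):
    """True if every window of k consecutive jobs has at least m met deadlines.

    `outcomes` is a sequence of 1 (deadline met) or 0 (deadline missed).
    Sequences shorter than k have no complete window and are treated as valid.
    """
    if len(outcomes) < k:
        return True
    met = sum(outcomes[:k])            # deadlines met in the first window
    if met < m:
        return False
    for i in range(k, len(outcomes)):  # slide the window one job at a time
        met += outcomes[i] - outcomes[i - k]
        if met < m:
            return False
    return True


ok_pattern = satisfies_mk([1, 0, 1, 1, 0, 1], m=2, k=3)
bad_pattern = satisfies_mk([1, 0, 0, 1], m=2, k=3)
```

A scheduler working under an energy budget can use such a check to decide which job instances are optional and may be skipped.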
Logic synthesis is a crucial step in electronic design automation tools. The rapid development of reinforcement learning (RL) has enabled the automated exploration of logic synthesis. Existing RL-based methods can be data-inefficient, and the exploration approaches for FPGA and ASIC technology mapping in recent works lack flexibility in the learning process. This work proposes ESE, a reinforcement-learning-based framework to efficiently learn the logic synthesis process. The framework supports the modeling of logic optimization and technology mapping for FPGA and ASIC. The optimization of the execution time of the synthesis script is also considered. For the modeling of FPGA mapping, logic optimization and technology mapping are combined so that they can be learned in a flexible way. For the modeling of ASIC mapping, standard-cell-based optimization and LUT optimization operations are incorporated into the ASIC synthesis flow. To improve the utilization of samples, the Proximal Policy Optimization model is adopted. Furthermore, the framework is enhanced by supporting MIG-based synthesis exploration. Experiments show that for FPGA technology mapping on the VTR benchmark, the average LUT-Level-Product and script runtime are improved by more than 18.3% and 12.4%, respectively, compared with previous works. For ASIC mapping on the EPFL benchmark, the average Area-Delay-Product is improved by 14.5%.
{"title":"An Efficient Reinforcement Learning Based Framework for Exploring Logic Synthesis","authors":"Yu Qian, Xuegong Zhou, Hao Zhou, Lingli Wang","doi":"10.1145/3632174","DOIUrl":"https://doi.org/10.1145/3632174","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2023-11-10","publicationTypes":"Journal Article"}
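The logic synthesis abstract adopts Proximal Policy Optimization (PPO) to improve sample utilization. The core of PPO is its clipped surrogate objective, which limits how far a policy update can move from the policy that collected the samples, so the same batch can be reused for several gradient steps. The sketch below computes that objective in plain Python. How the framework defines states, actions, and rewards is not given in the abstract, so the inputs here are placeholder numbers.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    `ratios` are new-policy / old-policy action probabilities and
    `advantages` are the matching advantage estimates.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)


# Placeholder samples: one action made more likely, one made less likely.
objective = ppo_clip_objective(ratios=[1.5, 0.5], advantages=[1.0, -1.0])
```

In the first sample the ratio 1.5 is clipped to 1.2, so the update gets no extra credit for moving further. In the second, the unclipped term -0.5 is larger than the clipped term -0.8, so the minimum keeps the more pessimistic value.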
Coarse-Grained Reconfigurable Arrays (CGRAs) are promising edge accelerators due to their outstanding balance of flexibility, performance, and energy efficiency. Classic CGRAs statically map compute operations onto the processing elements (PEs) and route the data dependencies among the operations through the Network-on-Chip. However, CGRAs are designed for fine-grained static instruction-level parallelism and struggle to accelerate applications with dynamic and irregular data-level parallelism, such as graph processing. To address this limitation, we present Flip, a novel accelerator that enhances traditional CGRA architectures to boost the performance of graph applications. Flip retains the classic CGRA execution model while introducing a special data-centric mode for efficient graph processing. Specifically, it leverages the inherent data parallelism of graph algorithms by mapping graph vertices, rather than operations, onto PEs and by supporting dynamic routing of temporary data according to the runtime evolution of the graph frontier. Experimental results demonstrate that Flip achieves up to 36× speedup with merely 19% more area compared to classic CGRAs. Compared to state-of-the-art large-scale graph processors, Flip has similar energy efficiency and 2.2× better area efficiency at a much-reduced power/area budget.
{"title":"Flip: Data-Centric Edge CGRA Accelerator","authors":"Dan Wu, Peng Chen, Thilini Kaushalya Bandara, Zhaoying Li, Tulika Mitra","doi":"10.1145/3631118","DOIUrl":"https://doi.org/10.1145/3631118","journal":{"name":"ACM Transactions on Design Automation of Electronic Systems"},"publicationDate":"2023-11-03","publicationTypes":"Journal Article"}
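The Flip abstract describes a data-centric mode in which graph vertices, not operations, are mapped onto PEs and data moves as the frontier evolves. A software analogy is a frontier-based breadth-first search in which each vertex is owned by one PE and every edge that crosses PEs counts as a routed message. The modulo vertex-to-PE mapping and the `frontier_bfs` helper are assumptions for illustration. The abstract does not specify Flip's actual mapping or routing scheme.

```python
def frontier_bfs(adj, source, num_pes):
    """BFS over `adj`, counting edge traversals that cross PE boundaries."""

    def pe_of(vertex):
        return vertex % num_pes  # assumed static vertex-to-PE mapping

    dist = {source: 0}
    frontier = [source]
    cross_pe_messages = 0
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if pe_of(u) != pe_of(v):
                    cross_pe_messages += 1  # update must be routed to another PE
                if v not in dist:
                    dist[v] = dist[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier            # frontier evolves at runtime
    return dist, cross_pe_messages


graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
distances, messages = frontier_bfs(graph, source=0, num_pes=2)
```

Which PEs communicate in each step depends on the frontier and is unknown at compile time, which is why a statically routed CGRA handles this pattern poorly.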