
2020 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers

Statistical Model Checking of Approximate Circuits: Challenges and Opportunities
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116207
Josef Strnadel
Many works have shown that approximate circuits may play an important role in the development of resource-efficient electronic systems. This has motivated many researchers to propose new approaches for finding an optimal trade-off between approximation error and resource savings for predefined applications of approximate circuits. These works, however, focus mainly on design aspects concerning relaxed functional requirements while neglecting further aspects such as signal and parameter dynamics/stochasticity, relaxed/non-functional equivalence, testing, or formal verification. This paper aims to take a step forward by moving towards the formal verification of time-dependent properties of systems based on approximate circuits. First, it presents our approach to modeling such systems by means of stochastic timed automata; the approach goes beyond digital, combinational, and/or synchronous circuits and is also applicable to sequential, analog, and/or asynchronous circuits. Second, the paper shows the principle and advantages of verifying properties of the modeled approximate systems by statistical model checking. Finally, the paper evaluates our approach and outlines future research perspectives.
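The core idea of statistical model checking can be illustrated with a small Monte Carlo sketch. This is not the paper's stochastic-timed-automata model: the approximate adder (`approx_add`), the property checked in `run_satisfies`, and all parameter values are hypothetical stand-ins chosen only to show how a satisfaction probability is estimated from random runs, with the number of runs taken from the Chernoff-Hoeffding bound.

```python
import math
import random


def required_runs(eps, delta):
    """Chernoff-Hoeffding bound: number of runs so that
    P(|estimate - true probability| > eps) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def approx_add(a, b, k):
    """Hypothetical approximate adder: the k low bits are OR-ed
    instead of added, so carries out of the low part are lost."""
    mask = (1 << k) - 1
    high = ((a >> k) + (b >> k)) << k
    return high | ((a | b) & mask)


def run_satisfies(rng, steps=20, k=3, bound=40):
    """One random run of the toy system. The (assumed) property is:
    the accumulated absolute error stays within `bound` for `steps` additions."""
    err = 0
    for _ in range(steps):
        a, b = rng.randrange(256), rng.randrange(256)
        err += abs((a + b) - approx_add(a, b, k))
        if err > bound:
            return False
    return True


def estimate(sample, eps, delta, rng):
    """Estimate P(property holds) from independent random runs."""
    n = required_runs(eps, delta)
    hits = sum(1 for _ in range(n) if sample(rng))
    return hits / n


if __name__ == "__main__":
    rng = random.Random(0)
    print(estimate(run_satisfies, eps=0.05, delta=0.05, rng=rng))
```

Tightening `eps` or `delta` raises the number of runs but never requires exploring the full state space, which is the usual argument for the statistical approach over exhaustive model checking.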
Citations: 0
Rescuing Logic Encryption in Post-SAT Era by Locking & Obfuscation
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116500
Amin Rezaei, Yuanqi Shen, H. Zhou
The active participation of external entities in the manufacturing flow has produced numerous hardware security issues, of which piracy and overproduction are likely the most widespread and costly. The main approach to preventing unauthorized products from functioning is logic encryption, which inserts key-controlled gates into the original circuit so that the circuit behaves correctly only when the correct key is applied. The challenge for the security designer is to ensure that neither the correct key nor the original circuit can be revealed by different analyses of the encrypted circuit. However, in state-of-the-art logic encryption works, considerable performance is sacrificed to guarantee security against powerful logic and structural attacks. This contradicts the primary purpose of logic encryption, which is to protect a precious design from being pirated and overproduced. In this paper, we propose a bilateral logic encryption platform that maintains a high degree of security with small circuit modifications. The robustness against exact and approximate attacks is also demonstrated.
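The basic mechanism of key-controlled gates can be sketched on a toy circuit. This is generic XOR/XNOR logic locking, not the bilateral scheme the paper proposes; the circuit `original`, the gate placement in `locked`, and the key value are all hypothetical.

```python
from itertools import product


def original(a, b, c):
    """Toy design to protect: y = (a AND b) OR c."""
    return (a & b) | c


def locked(a, b, c, k0, k1):
    """The same design with two key gates inserted (illustrative placement)."""
    w = (a & b) ^ k0      # XOR key gate: transparent only when k0 == 0
    y = w | c
    return 1 - (y ^ k1)   # XNOR key gate: transparent only when k1 == 1


CORRECT_KEY = (0, 1)      # assumed secret key for this sketch


def key_is_functional(key):
    """True if the locked circuit matches the original on every input."""
    return all(
        locked(a, b, c, *key) == original(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )


if __name__ == "__main__":
    for key in product((0, 1), repeat=2):
        print(key, key_is_functional(key))
```

In this toy case only the correct key restores the original truth table. A two-bit key is trivially brute-forced, which is why practical schemes must also resist SAT-based and structural attacks, the concern the abstract raises.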
Citations: 11
SCRIMP: A General Stochastic Computing Architecture using ReRAM in-Memory Processing
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116338
Saransh Gupta, M. Imani, Joonseop Sim, Andrew Huang, Fan Wu, M. Najafi, T. Simunic
Stochastic computing (SC) reduces the complexity of computation by representing numbers with long independent bit-streams. However, increasing performance in SC comes with an increase in area and a loss in accuracy. Processing in memory (PIM) with non-volatile memories (NVMs) computes data in place, while offering high memory density and supporting bit-parallel operations with low energy. In this paper, we propose SCRIMP for stochastic computing acceleration with resistive RAM (ReRAM) in-memory processing, which enables SC in memory. SCRIMP can be used for a wide range of applications. It supports all SC encodings and operations in memory. It maximizes the performance and energy efficiency of implementing SC by introducing novel in-memory parallel stochastic number generation and efficient implication-based logic in memory. To show the efficiency of our stochastic architecture, we implement image processing on the proposed hardware.
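To make the bit-stream representation concrete, the sketch below shows unipolar stochastic multiplication in software. It assumes the standard unipolar encoding (a value p in [0, 1] becomes a stream whose bits are 1 with probability p) and uses Python's pseudo-random generator in place of the in-memory stochastic number generation described in the abstract; the function names are illustrative.

```python
import random


def encode(p, length, rng):
    """Unipolar encoding: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def decode(stream):
    """The encoded value is the fraction of 1s in the stream."""
    return sum(stream) / len(stream)


def sc_multiply(xs, ys):
    """For independent unipolar streams, a bitwise AND multiplies the values."""
    return [x & y for x, y in zip(xs, ys)]


if __name__ == "__main__":
    rng = random.Random(0)
    a = encode(0.5, 4096, rng)
    b = encode(0.4, 4096, rng)
    print(decode(sc_multiply(a, b)))  # close to 0.5 * 0.4, up to sampling noise
```

The multiplier is a single AND gate per bit, which is the complexity reduction the abstract refers to; the cost is that accuracy improves only slowly with stream length, and the two streams must be statistically independent.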
Citations: 10
AstroByte: Multi-FPGA Architecture for Accelerated Simulations of Spiking Astrocyte Neural Networks
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116312
Shvan Karim, J. Harkin, L. McDaid, B. Gardiner, Junxiu Liu
Spiking astrocyte neural networks (SANNs) are a new computational paradigm that exhibits enhanced self-adapting and reliability properties. The inclusion of astrocyte behaviour increases the computational load and, critically, the number of connections: each astrocyte typically communicates with up to 9 neurons (and their associated synapses), with feedback pathways from each neuron to the astrocyte. Each astrocyte cell also communicates with its neighbouring cell, resulting in a significant interconnect density. The substantial level of parallelism in SANNs lends itself to acceleration in hardware; however, the challenge in accelerating simulations of SANNs resides firmly in scalable interconnect and the ability to inject and retrieve data from the hardware. This paper presents a novel multi-FPGA acceleration architecture, AstroByte, for the speedup of SANNs. AstroByte explores Networks-on-Chip (NoC) routing mechanisms to address the challenge of communicating both spike events (neuron data) and numeric values (astrocyte data) across significant interconnect pathways between astrocytes and neurons. AstroByte also exploits the NoC interconnect to inject data and retrieve runtime data from the accelerated SANN simulations. Results show that AstroByte can simulate SANN applications with speedup factors of between 162× and 188× over equivalent Matlab simulations.
Citations: 4
Are Cloud FPGAs Really Vulnerable to Power Analysis Attacks?
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116481
Ognjen Glamočanin, Louis Coulon, F. Regazzoni, Mirjana Stojilović
Recent works have demonstrated the possibility of extracting secrets from a cryptographic core running on an FPGA by means of remote power analysis attacks. To mount these attacks, an adversary implements a voltage fluctuation sensor in the FPGA logic, records the power consumption of the target cryptographic core, and recovers the secret key by running a power analysis attack on the recorded traces. Despite showing that power analysis can be performed without physical access to the cryptographic core, these works were mostly carried out on dedicated FPGA boards in a controlled environment, leaving open the question of whether these attacks can be mounted successfully on a real system deployed in the cloud. In this paper, we demonstrate, for the first time, a successful key recovery attack on an AES cryptographic accelerator running on an Amazon EC2 F1 instance. We collect the power traces using a delay-line-based voltage drop sensor, adapted to the Xilinx Virtex UltraScale+ architecture used on Amazon EC2 F1, where CARRY8 blocks do not have a monotonic delay increase at their outputs. Our results demonstrate that the security concerns raised by multi-tenant FPGAs are indeed valid and that countermeasures should be put in place to mitigate them.
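The key-recovery step on recorded traces can be illustrated with a toy correlation power analysis on simulated data. Everything here is an assumption made for illustration: the traces are synthetic (Hamming weight of plaintext XOR key plus Gaussian noise), one sample per trace, and the leakage model is a bare XOR. A real AES attack would typically model the S-box output and use measured sensor traces, as the paper does on EC2 F1.

```python
import math
import random


def hamming_weight(x):
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def simulate_traces(key_byte, n, noise, rng):
    """Synthetic one-sample 'traces': HW(plaintext XOR key) + Gaussian noise."""
    pts = [rng.randrange(256) for _ in range(n)]
    traces = [hamming_weight(p ^ key_byte) + rng.gauss(0.0, noise) for p in pts]
    return pts, traces


def recover_key_byte(pts, traces):
    """Return the key guess whose predicted leakage correlates best with the traces."""
    def score(guess):
        model = [hamming_weight(p ^ guess) for p in pts]
        return pearson(model, traces)
    return max(range(256), key=score)


if __name__ == "__main__":
    rng = random.Random(1)
    pts, traces = simulate_traces(0x3C, 1000, 0.5, rng)
    print(hex(recover_key_byte(pts, traces)))
```

The attacker never sees the key, only known plaintexts and noisy measurements; more noise simply means more traces are needed before the correct guess stands out.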
Citations: 41
Dynamic Thermal Management with Proactive Fan Speed Control Through Reinforcement Learning
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116510
Arman Iranfar, F. Terraneo, Gabor Csordas, Marina Zapater, W. Fornaciari, David Atienza Alonso
Dynamic Thermal Management (DTM) has become a major challenge since it directly affects the performance, power consumption, and reliability of Multiprocessor Systems-on-Chip (MPSoCs). In this work, we propose a transient fan model, enabling adaptive fan speed control simulation for efficient DTM. Our model is validated through a thermal test chip, achieving less than 2°C error in the worst case. With multiple fan speeds, however, the DTM design space grows significantly, which can ultimately make conventional solutions impractical. We address this challenge through a reinforcement learning-based solution that proactively determines the number of active cores, operating frequency, and fan speed. The proposed solution is able to reduce fan power by up to 40% compared to a DTM with constant fan speed, with less than 1% performance degradation. Also, compared to a state-of-the-art DTM technique, our solution improves the performance by up to 19% for the same fan power.
Citations: 8
Towards Serial-Equivalent Multi-Core Parallel Routing for FPGAs
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116313
Minghua Shen, Nong Xiao
In this paper, we present a serial-equivalent parallel router for FPGAs on modern multi-core processors. Based on the inherent net order of the serial router, we schedule all the nets into a series of stages, where non-conflicting nets are scheduled in the same stage and conflicting nets are scheduled in different stages. We explore the parallel routing of non-conflicting nets on multi-core processors for a significant speedup. We perform the data synchronization of conflicting stages using an MPI-based message queue to obtain a feasible routing solution. Note that load balance is always used to guide the multi-core parallel routing. Experimental results show that our parallel router provides about 19.13× speedup on average using 32 processor cores compared to the serial router. Notably, our parallel router generates exactly the same wirelength as the serial router, satisfying serial equivalency.
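The staging idea can be sketched as follows. The abstract does not define when two nets conflict, so this sketch assumes, purely for illustration, that nets conflict when their bounding boxes overlap; `overlaps` and `schedule_stages` are hypothetical names, and load balancing and MPI synchronization are omitted. Each net is placed in the earliest stage that comes after every earlier conflicting net, so conflicting nets are still routed in their serial order.

```python
def overlaps(a, b):
    """Bounding boxes are (x0, y0, x1, y1), inclusive; True if they intersect."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1


def schedule_stages(boxes):
    """Assign nets (listed in serial routing order) to stages.

    Nets in the same stage never conflict, and a net always lands in a
    later stage than any earlier net it conflicts with.
    """
    stage_of = []
    for i, box in enumerate(boxes):
        stage = 0
        for j in range(i):
            if overlaps(box, boxes[j]):
                stage = max(stage, stage_of[j] + 1)
        stage_of.append(stage)

    stages = [[] for _ in range(max(stage_of, default=-1) + 1)]
    for net, stage in enumerate(stage_of):
        stages[stage].append(net)
    return stages


if __name__ == "__main__":
    nets = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6), (2, 2, 4, 4)]
    print(schedule_stages(nets))
```

Nets within one stage can be routed on different cores without affecting each other, while the stage order preserves the dependencies of the serial router, which is what makes an identical result plausible.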
Citations: 0
Runtime Accuracy-Configurable Approximate Hardware Synthesis Using Logic Gating and Relaxation
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116272
Tanfer Alan, A. Gerstlauer, J. Henkel
Approximate computing trades off computation accuracy against energy efficiency. Algorithms from several modern application domains, such as decision making and computer vision, are tolerant to approximations while still meeting their requirements. The extent of approximation tolerance, however, varies significantly with changes in input characteristics and applications. We propose a novel hybrid approach for the synthesis of runtime accuracy-configurable hardware that minimizes energy consumption at the expense of area. To that end, we first explore instantiating multiple hardware blocks with different fixed approximation levels. These blocks can be selected dynamically and thus allow the accuracy to be configured at runtime. They benefit from having fewer transistors and from synthesis relaxations, in contrast to state-of-the-art gating mechanisms, which only switch off a group of logic. Our hybrid approach combines instantiating such blocks with area-efficient gating mechanisms that reduce toggling activity, creating a fine-grained design-time knob on energy vs. area. Examining total energy savings for a Sobel filter under different workloads and accuracy tolerances shows that our method finds Pareto-optimal solutions providing up to 16% and 44% energy savings compared to a state-of-the-art accuracy-configurable gating mechanism and an exact hardware block, respectively, at 2x area cost.
Citations: 4
Saving Power by Converting Flip-Flop to 3-Phase Latch-Based Designs
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116563
Huimei Cheng, Xi Li, Yichen Gu, P. Beerel
Latches are smaller and consume less power than flip-flops (FFs) and are typically used in a time-borrowing master-slave configuration. This paper presents an automatic flow for converting arbitrarily complex single-clock-domain FF-based RTL designs to efficient 3-phase latch-based designs with a reduced number of required latches, saving both register and clock-tree power. Post-place-and-route results demonstrate that our 3-phase latch-based designs save an average of 15.5% and 18.5% power on a variety of ISCAS, CEP, and CPU benchmark circuits, compared to their more traditional FF-based and master-slave-based alternatives, respectively.
Citations: 0
Template schedule construction for global real-time scheduling on unrelated multiprocessor platforms
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116409
A. Bertout, J. Goossens, E. Grolleau, Xavier Poczekajlo
The seminal work on the global real-time scheduling of periodic tasks on unrelated multiprocessor platforms is based on a two-step method. First, the workload of each task is distributed over the processors, and it is proved that the success of this first step ensures the existence of a feasible schedule. Then, using this workload assignment as input, a template schedule construction method is presented. In this work, we review the seminal work and show by means of a counterexample that this second step is incomplete. We therefore propose, and prove correct, a novel and efficient algorithm to build the template schedule.
Citations: 9
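The first step of the two-step method described in the abstract above, distributing each task's workload over the processors, can be illustrated with a small validity check. This is only a minimal sketch under assumptions that the abstract does not state and is not the authors' algorithm: it assumes a hypothetical model in which `demand[i]` is the work task i needs per unit of time, `rate[i][j]` is the work task i completes per unit of time on processor j, and `share[i][j]` is the fraction of time processor j devotes to task i. The three constraints checked (demand met exactly, no processor overloaded, no task running on several processors for more than the full time in total) and all names are illustrative.

```python
from typing import List

EPS = 1e-9  # tolerance for floating-point comparisons


def is_valid_assignment(
    demand: List[float],
    rate: List[List[float]],
    share: List[List[float]],
) -> bool:
    """Check a workload assignment under the assumed (hypothetical) model."""
    n = len(demand)                      # number of tasks
    m = len(rate[0]) if rate else 0      # number of processors

    for i in range(n):
        # Shares are fractions of processor time, so they cannot be negative.
        if any(share[i][j] < -EPS for j in range(m)):
            return False
        # The work received across all processors must match the demand.
        work = sum(share[i][j] * rate[i][j] for j in range(m))
        if abs(work - demand[i]) > EPS:
            return False
        # A task cannot run in parallel with itself, so its total
        # time share across processors is at most 1.
        if sum(share[i]) > 1 + EPS:
            return False

    for j in range(m):
        # A processor cannot be busy more than 100% of the time.
        if sum(share[i][j] for i in range(n)) > 1 + EPS:
            return False

    return True


# Two tasks on two unrelated processors: each task is fast on "its own"
# processor and half as fast on the other one.
rate = [[1.0, 0.5], [0.5, 1.0]]
demand = [0.5, 0.75]
good = [[0.5, 0.0], [0.0, 0.75]]
print(is_valid_assignment(demand, rate, good))
```

A check like this only validates a given assignment. Building the template schedule from a valid assignment is the second step, which is where the abstract reports the earlier method to be incomplete.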