2020 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers

Statistical Model Checking of Approximate Circuits: Challenges and Opportunities
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116207
Josef Strnadel
Many works have shown that approximate circuits may play an important role in the development of resource-efficient electronic systems. This motivates many researchers to propose new approaches for finding an optimal trade-off between the approximation error and resource savings for predefined applications of approximate circuits. These works and approaches, however, focus mainly on design aspects regarding relaxed functional requirements while neglecting further aspects such as signal and parameter dynamics/stochasticity, relaxed/non-functional equivalence, testing or formal verification. This paper aims to take a step ahead by moving towards the formal verification of time-dependent properties of systems based on approximate circuits. Firstly, it presents our approach to modeling such systems by means of stochastic timed automata; the approach goes beyond digital, combinational and/or synchronous circuits and is also applicable to sequential, analog and/or asynchronous circuits. Secondly, the paper shows the principle and advantage of verifying properties of modeled approximate systems by the statistical model checking technique. Finally, the paper evaluates our approach and outlines future research perspectives.
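To make the idea of statistical model checking concrete, here is a minimal sketch of the generic technique: run many random simulations of a model and estimate the probability that a time-bounded property holds, with the number of runs chosen from the Chernoff-Hoeffding bound. The toy model (three gate stages with exponentially distributed delays) and all function names are assumptions made for illustration; they are not the stochastic timed automata models or the tooling used in the paper.

```python
import math
import random


def required_runs(epsilon: float, delta: float) -> int:
    """Chernoff-Hoeffding bound: number of runs so that the estimate is
    within epsilon of the true probability with confidence >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def simulate_once(rng: random.Random, deadline: float) -> bool:
    """Hypothetical toy model: the output settles after three gate stages,
    each with an exponentially distributed delay (mean 1.0).
    The property holds if the output settles before the deadline."""
    settle_time = sum(rng.expovariate(1.0) for _ in range(3))
    return settle_time <= deadline


def estimate_probability(deadline: float, epsilon: float, delta: float, seed: int = 0):
    """Monte Carlo estimate of P(output settles within deadline)."""
    rng = random.Random(seed)
    runs = required_runs(epsilon, delta)
    hits = sum(simulate_once(rng, deadline) for _ in range(runs))
    return hits / runs, runs


if __name__ == "__main__":
    p, n = estimate_probability(deadline=5.0, epsilon=0.05, delta=0.05)
    print(f"estimated probability {p:.3f} from {n} runs")
```

The appeal of this style of verification is that the cost depends on the requested precision and confidence rather than on the size of the state space, which is why it scales to models that exhaustive model checking cannot handle.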
Citations: 0
Rescuing Logic Encryption in Post-SAT Era by Locking & Obfuscation
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116500
Amin Rezaei, Yuanqi Shen, H. Zhou
The active participation of external entities in the manufacturing flow has produced numerous hardware security issues, of which piracy and overproduction are likely to be the most ubiquitous and expensive. The main approach to preventing unauthorized products from functioning is logic encryption, which inserts key-controlled gates into the original circuit so that the circuit behaves correctly only when the correct key is applied. The challenge for the security designer is to ensure that neither the correct key nor the original circuit can be revealed by different analyses of the encrypted circuit. However, in state-of-the-art logic encryption works, a lot of performance is sacrificed to guarantee security against powerful logic and structural attacks. This contradicts the primary purpose of logic encryption, which is to protect a precious design from being pirated and overproduced. In this paper, we propose a bilateral logic encryption platform that maintains a high degree of security with small circuit modification. The robustness against exact and approximate attacks is also demonstrated.
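The notion of key-controlled gates can be illustrated with a small sketch. It shows the classic XOR/XNOR key-gate scheme on a three-input toy circuit, not the bilateral platform proposed in the paper; the circuit, the gate placement, and the function names are assumptions chosen for illustration.

```python
from itertools import product


def original(a: int, b: int, c: int) -> int:
    """Toy original circuit: (a AND b) OR c."""
    return (a & b) | c


def locked(a: int, b: int, c: int, k0: int, k1: int) -> int:
    """Same circuit with two key gates inserted."""
    t = (a & b) ^ k0        # XOR key gate: transparent only when k0 == 0
    out = t | c
    return (out ^ k1) ^ 1   # XNOR key gate: transparent only when k1 == 1


def keys_matching_original():
    """Exhaustively find the keys under which the locked circuit
    equals the original on every input pattern."""
    inputs = list(product((0, 1), repeat=3))
    return [
        key
        for key in product((0, 1), repeat=2)
        if all(locked(*x, *key) == original(*x) for x in inputs)
    ]


if __name__ == "__main__":
    print(keys_matching_original())
```

Exhaustive search is feasible here only because the key has two bits; practical schemes use long keys, and the attacks mentioned in the abstract aim to prune the key space far faster than brute force.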
Citations: 11
SCRIMP: A General Stochastic Computing Architecture using ReRAM in-Memory Processing
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116338
Saransh Gupta, M. Imani, Joonseop Sim, Andrew Huang, Fan Wu, M. Najafi, T. Simunic
Stochastic computing (SC) reduces the complexity of computation by representing numbers with long independent bit-streams. However, increasing performance in SC comes with an increase in area and a loss in accuracy. Processing in memory (PIM) with non-volatile memories (NVMs) computes data in place, while having high memory density and supporting bit-parallel operations with low energy. In this paper, we propose SCRIMP for stochastic computing acceleration with resistive RAM (ReRAM) in-memory processing, which enables SC in memory. SCRIMP can be used for a wide range of applications. It supports all SC encodings and operations in memory. It maximizes the performance and energy efficiency of implementing SC by introducing novel in-memory parallel stochastic number generation and efficient implication-based logic in memory. To show the efficiency of our stochastic architecture, we implement image processing on the proposed hardware.
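As a reminder of why bit-stream representations simplify arithmetic, the following sketch shows textbook unipolar stochastic computing in software: a value in [0, 1] is encoded as the probability of a 1 in a random bit-stream, and multiplying two values reduces to a bitwise AND of two independent streams. It says nothing about how SCRIMP generates streams or evaluates logic in ReRAM; the function names and the stream length are illustrative assumptions.

```python
import random


def encode(value: float, length: int, rng: random.Random) -> list:
    """Unipolar encoding: each bit is 1 with probability `value`."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def decode(stream: list) -> float:
    """The represented value is the fraction of ones in the stream."""
    return sum(stream) / len(stream)


def sc_multiply(x: float, y: float, length: int = 4096, seed: int = 0) -> float:
    """Multiply two values in [0, 1] with a single AND per bit position."""
    rng = random.Random(seed)
    sx = encode(x, length, rng)
    sy = encode(y, length, rng)  # drawn independently of sx
    return decode([a & b for a, b in zip(sx, sy)])


if __name__ == "__main__":
    print(sc_multiply(0.5, 0.4))  # close to 0.2, up to sampling noise
```

The result is only approximate, and the error shrinks slowly as the stream gets longer, which is the accuracy versus cost tension the abstract refers to.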
Citations: 10
AstroByte: Multi-FPGA Architecture for Accelerated Simulations of Spiking Astrocyte Neural Networks
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116312
Shvan Karim, J. Harkin, L. McDaid, B. Gardiner, Junxiu Liu
Spiking astrocyte neural networks (SANNs) are a new computational paradigm that exhibits enhanced self-adapting and reliability properties. The inclusion of astrocyte behaviour increases the computational load and, critically, the number of connections: each astrocyte typically communicates with up to 9 neurons (and their associated synapses), with feedback pathways from each neuron to the astrocyte. Each astrocyte cell also communicates with its neighbouring cell, resulting in a significant interconnect density. The substantial level of parallelism in SANNs lends itself to acceleration in hardware; however, the challenge in accelerating simulations of SANNs firmly resides in scalable interconnect and the ability to inject and retrieve data from the hardware. This paper presents a novel multi-FPGA acceleration architecture, AstroByte, for the speedup of SANNs. AstroByte explores Networks-on-Chip (NoC) routing mechanisms to address the challenge of communicating both spike event (neuron data) and numeric (astrocyte data) across significant interconnect pathways between astrocytes and neurons. AstroByte also exploits the NoC interconnect to inject data and retrieve runtime data from the accelerated SANN simulations. Results show that AstroByte can simulate SANN applications with speedup factors of between 162x and 188x over equivalent Matlab simulations.
Citations: 4
Are Cloud FPGAs Really Vulnerable to Power Analysis Attacks?
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116481
Ognjen Glamočanin, Louis Coulon, F. Regazzoni, Mirjana Stojilović
Recent works have demonstrated the possibility of extracting secrets from a cryptographic core running on an FPGA by means of remote power analysis attacks. To mount these attacks, an adversary implements a voltage fluctuation sensor in the FPGA logic, records the power consumption of the target cryptographic core, and recovers the secret key by running a power analysis attack on the recorded traces. Despite showing that the power analysis could also be performed without physical access to the cryptographic core, these works were mostly carried out on dedicated FPGA boards in a controlled environment, leaving open the question about the possibility to successfully mount these attacks on a real system deployed in the cloud. In this paper, we demonstrate, for the first time, a successful key recovery attack on an AES cryptographic accelerator running on an Amazon EC2 F1 instance. We collect the power traces using a delay-line based voltage drop sensor, adapted to the Xilinx Virtex Ultrascale+ architecture used on Amazon EC2 F1, where CARRY8 blocks do not have a monotonic delay increase at their outputs. Our results demonstrate that security concerns raised by multitenant FPGAs are indeed valid and that countermeasures should be put in place to mitigate them.
Citations: 41
Dynamic Thermal Management with Proactive Fan Speed Control Through Reinforcement Learning
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116510
Arman Iranfar, F. Terraneo, Gabor Csordas, Marina Zapater, W. Fornaciari, David Atienza Alonso
Dynamic Thermal Management (DTM) has become a major challenge since it directly affects the performance, power consumption, and reliability of Multiprocessor Systems-on-Chip (MPSoCs). In this work, we propose a transient fan model, enabling adaptive fan speed control simulation for efficient DTM. Our model is validated through a thermal test chip, achieving less than 2°C error in the worst case. With multiple fan speeds, however, the DTM design space grows significantly, which can ultimately make conventional solutions impractical. We address this challenge through a reinforcement learning-based solution to proactively determine the number of active cores, operating frequency, and fan speed. The proposed solution is able to reduce fan power by up to 40% compared to a DTM with constant fan speed, with less than 1% performance degradation. Also, compared to a state-of-the-art DTM technique, our solution improves the performance by up to 19% for the same fan power.
Citations: 8
MiniDelay: Multi-Strategy Timing-Aware Layer Assignment for Advanced Technology Nodes
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116269
Xinghai Zhang, Zhen Zhuang, Genggeng Liu, Xing Huang, Wen-Hao Liu, Wenzhong Guo, Ting-Chi Wang
Layer assignment, a major step in global routing of integrated circuits, is usually performed to assign segments of nets to multiple layers. Besides the traditional optimization goals such as overflow and via count, interconnect delay plays an important role in determining chip performance and has been attracting much attention in recent years. Accordingly, in this paper, we propose MiniDelay, a timing-aware layer assignment algorithm to minimize delay for advanced technology nodes, taking both wire congestion and coupling effect into account. MiniDelay consists of the following three key techniques: 1) a non-default-rule routing technique is adopted to reduce the delay of timing-critical nets, 2) an effective congestion assessment method is proposed to optimize delay of nets and via count simultaneously, and 3) a net scalpel technique is proposed to further reduce the maximum delay of nets, so that the chip performance can be improved in a global manner. Experimental results on multiple benchmarks confirm that the proposed algorithm leads to lower delay and fewer vias, while achieving the best solution quality among the existing algorithms with the shortest runtime.
Citations: 6
L2L: A Highly Accurate Log_2_Lead Quantization of Pre-trained Neural Networks
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116373
Salim Ullah, Siddharth Gupta, K. Ahuja, Aruna Tiwari, Akash Kumar
Deep neural networks are a machine learning technique increasingly used in a variety of applications. However, the significantly high memory and computation demands of deep neural networks often limit their deployment on embedded systems. Many recent works have addressed this problem by proposing different types of data quantization schemes. However, most of these techniques either require post-quantization retraining of deep neural networks or incur a significant loss in output accuracy. In this paper, we propose a novel quantization technique for parameters of pre-trained deep neural networks. Our technique largely preserves the accuracy of the parameters and does not require retraining of the networks. Compared to the single-precision floating-point implementation, our proposed 8-bit quantization technique incurs only ~1% and ~0.4% loss in top-1 and top-5 accuracy, respectively, for the VGG16 network on the ImageNet dataset.
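The abstract does not describe the L2L encoding itself, so the sketch below shows only a related and much simpler baseline: plain power-of-two (logarithmic) quantization of a pre-trained weight without retraining. The function name, the exponent range, and the rounding rule are assumptions made for illustration and should not be read as the paper's method.

```python
import math


def log2_quantize(w: float, min_exp: int = -7, max_exp: int = 0) -> float:
    """Round a weight to the nearest power of two in the log domain,
    keeping its sign; exponents are clipped to [min_exp, max_exp]."""
    if w == 0.0:
        return 0.0
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))
    return math.copysign(2.0 ** exp, w)


if __name__ == "__main__":
    for w in (0.3, -0.7, 0.0, 5.0):
        print(w, "->", log2_quantize(w))
```

With this kind of representation a multiplication by a weight becomes a bit shift, but rounding every weight to a single power of two is coarse, which is the sort of accuracy loss that more refined schemes try to reduce.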
Citations: 6
Security Enhancement for RRAM Computing System through Obfuscating Crossbar Row Connections
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116549
Minhui Zou, Zhenhua Zhu, Yi Cai, Junlong Zhou, Chengliang Wang, Yu Wang
Neural networks (NNs) have achieved great success in visual object recognition and natural language processing, but such data-intensive applications require huge data movements between computing units and memory. Emerging resistive random-access memory (RRAM) computing systems have demonstrated great potential in avoiding these data movements by performing matrix-vector multiplications in memory. However, the non-volatility of RRAM devices may allow the NN weights stored in crossbars to be stolen, and the adversary could extract the NN models from the stolen weights. This paper proposes an effective security enhancing method for RRAM computing systems to thwart this sort of piracy attack. We first analyze the theft methods of the NN weights. Then we propose an efficient security enhancing technique based on obfuscating the row connections between positive crossbars and their pairing negative crossbars. Two heuristic techniques are also presented to optimize the hardware overhead of the obfuscation module. Compared with existing NN security work, our method eliminates the additional RRAM writing operations used for encryption/decryption, without shortening the lifetime of RRAM computing systems. The experimental results show that the proposed methods ensure that a brute-force attack needs more than (16!)^17 trials and that the classification accuracy of incorrectly extracted NN models is less than 20%, with minimal area overhead.
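To give a feel for the size of the brute-force bound, the short calculation below evaluates it under the assumption that the figure in the abstract reads (16!)^17, that is, 16! row permutations in each of 17 independently obfuscated groups. What exactly 16 and 17 count is not stated in the abstract, so the parameter names here are hypothetical.

```python
import math


def brute_force_trials(rows_per_group: int = 16, groups: int = 17) -> int:
    """Number of candidate row orderings if each group of rows can be
    permuted independently: (rows_per_group!) ** groups."""
    return math.factorial(rows_per_group) ** groups


if __name__ == "__main__":
    trials = brute_force_trials()
    print("16! =", math.factorial(16))
    print("search space is about 2^%d" % (trials.bit_length() - 1))
```

Under that reading, 16! is 20922789888000 and the full search space exceeds 2^752, far beyond what exhaustive trial could cover.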
Citations: 6
2DCC: Cache Compression in Two Dimensions
Pub Date : 2020-03-01 DOI: 10.23919/DATE48585.2020.9116279
Amin Ghasemazar, M. Ewais, Prashant J. Nair, Mieszko Lis
The importance of caches for performance, and their high silicon area cost, have motivated hardware solutions that transparently compress the cached data to increase effective capacity without sacrificing silicon area. To this end, prior work has taken one of two approaches: either (a) deduplicating identical cache blocks across the cache to take advantage of inter-block redundancy or (b) compressing common patterns within each cache block to take advantage of intra-block redundancy. In this paper, we demonstrate that leveraging only one of these redundancy types leads to a significant loss in compression opportunities for several applications: some workloads exhibit either inter-block or intra-block redundancy, while others exhibit both. We propose 2DCC (Two Dimensional Cache Compression), a simple technique that takes advantage of both types of redundancy. Across the SPEC and Parsec benchmark suites, 2DCC results in a 2.12× compression factor (geomean) compared to 1.44–1.49× for the best prior techniques on an iso-silicon basis. For the cache-sensitive subset of these benchmarks run in isolation, 2DCC also achieves an 11.7% speedup (geomean).
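The distinction between inter-block and intra-block redundancy in the abstract can be illustrated with a toy Python model. This is not the 2DCC hardware design: the 64-byte block size, the 8-byte word size, the "one repeated word" pattern, and the function names are all assumptions chosen for illustration, and tag or pointer overhead is ignored.

```python
BLOCK_SIZE = 64  # bytes per cache block (assumed)
WORD_SIZE = 8    # bytes per word (assumed)

def compress_block(block):
    """Toy intra-block scheme: a block made of one repeated word
    is stored as that single word; anything else is stored as-is."""
    words = [block[i:i + WORD_SIZE] for i in range(0, len(block), WORD_SIZE)]
    if all(w == words[0] for w in words):
        return words[0]
    return block

def stored_bytes(blocks, dedup, intra):
    """Bytes needed to hold the blocks under the selected toy schemes."""
    seen = set()
    total = 0
    for block in blocks:
        if dedup:
            if block in seen:
                continue  # inter-block: identical block already stored
            seen.add(block)
        total += len(compress_block(block)) if intra else len(block)
    return total

# Two all-zero blocks plus two identical blocks with no repeated word.
blocks = [bytes(BLOCK_SIZE)] * 2 + [bytes(range(BLOCK_SIZE))] * 2
for dedup, intra in [(False, False), (True, False), (False, True), (True, True)]:
    print(f"dedup={dedup}, intra={intra}: {stored_bytes(blocks, dedup, intra)} bytes")
```

On this synthetic input the four blocks need 256 bytes uncompressed, 128 with deduplication only, 144 with intra-block compression only, and 72 with both, which mirrors the qualitative claim that combining the two dimensions recovers opportunities that either one alone misses.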
Citations: 3
Journal
2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)