Pub Date: 2020-03-01 DOI: 10.23919/DATE48585.2020.9116207
Josef Strnadel
Many works have shown that approximate circuits may play an important role in the development of resource-efficient electronic systems. This motivates many researchers to propose new approaches for finding an optimal trade-off between the approximation error and resource savings for predefined applications of approximate circuits. These works and approaches, however, focus mainly on design aspects regarding relaxed functional requirements while neglecting further aspects such as signal and parameter dynamics/stochasticity, relaxed/non-functional equivalence, testing or formal verification. This paper aims to take a step ahead by moving towards the formal verification of time-dependent properties of systems based on approximate circuits. First, it presents our approach to modeling such systems by means of stochastic timed automata; the approach goes beyond digital, combinational and/or synchronous circuits and is also applicable to sequential, analog and/or asynchronous circuits. Second, the paper shows the principle and advantage of verifying properties of modeled approximate systems by the statistical model checking technique. Finally, the paper evaluates our approach and outlines future research perspectives.
{"title":"Statistical Model Checking of Approximate Circuits: Challenges and Opportunities","authors":"Josef Strnadel","doi":"10.23919/DATE48585.2020.9116207","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116207","journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)"},"publicationDate":"2020-03-01"}
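As a rough illustration of the statistical model checking idea named in the abstract, the sketch below estimates the probability that a toy approximate adder violates an error bound by sampling random inputs. It is not the paper's stochastic-timed-automata model: the truncating adder, the error threshold, and all function names are hypothetical. The number of runs follows the standard Chernoff-Hoeffding bound N >= ln(2/delta) / (2 * eps^2).

```python
import math
import random


def samples_needed(eps: float, delta: float) -> int:
    """Chernoff-Hoeffding bound: number of runs so that the estimate is within
    eps of the true probability with confidence at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def approx_add(a: int, b: int, dropped_bits: int = 2) -> int:
    """Hypothetical approximate adder: zeroes the low bits of both operands."""
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) + (b & mask)


def estimate_violation_probability(threshold: int, eps: float, delta: float,
                                   seed: int = 0) -> float:
    """Monte Carlo estimate of P(|exact - approximate| > threshold)
    for uniformly random 8-bit operands."""
    rng = random.Random(seed)
    n = samples_needed(eps, delta)
    violations = 0
    for _ in range(n):
        a, b = rng.randrange(256), rng.randrange(256)
        if abs((a + b) - approx_add(a, b)) > threshold:
            violations += 1
    return violations / n


# Probability that the toy adder is off by more than 3.
p_hat = estimate_violation_probability(threshold=3, eps=0.05, delta=0.05)
```

For this toy adder the error is the sum of the two dropped 2-bit fields, so the exact probability of exceeding 3 is 6/16 = 0.375, which gives a reference point for the estimate. A real statistical model checker would sample timed runs of the model rather than independent input pairs.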
Pub Date: 2020-03-01 DOI: 10.23919/DATE48585.2020.9116500
Amin Rezaei, Yuanqi Shen, H. Zhou
The active participation of external entities in the manufacturing flow has produced numerous hardware security issues, of which piracy and overproduction are likely to be the most ubiquitous and expensive. The main approach to prevent unauthorized products from functioning is logic encryption, which inserts key-controlled gates into the original circuit so that the circuit behaves correctly only when the correct key is applied. The challenge for the security designer is to ensure that neither the correct key nor the original circuit can be revealed by different analyses of the encrypted circuit. However, in state-of-the-art logic encryption works, considerable performance is sacrificed to guarantee security against powerful logic and structural attacks. This contradicts the primary purpose of logic encryption, which is to protect a valuable design from being pirated and overproduced. In this paper, we propose a bilateral logic encryption platform that maintains a high degree of security with small circuit modification. The robustness against exact and approximate attacks is also demonstrated.
{"title":"Rescuing Logic Encryption in Post-SAT Era by Locking & Obfuscation","authors":"Amin Rezaei, Yuanqi Shen, H. Zhou","doi":"10.23919/DATE48585.2020.9116500","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116500","journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)"},"publicationDate":"2020-03-01"}
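The basic mechanism the abstract describes, key-controlled gates that leave the circuit correct only under the right key, can be sketched on a toy circuit. This is a minimal illustration of plain XOR/XNOR key-gate insertion, not the bilateral scheme proposed in the paper; the 3-input circuit, the gate placement, and the function names are hypothetical.

```python
from itertools import product


def original(a: int, b: int, c: int) -> int:
    """Hypothetical 3-input circuit to protect: (a AND b) OR c."""
    return (a & b) | c


def locked(a: int, b: int, c: int, k0: int, k1: int) -> int:
    """Same circuit with two key gates inserted.
    An XOR key gate is transparent when its key bit is 0; an XNOR key gate
    is transparent when its key bit is 1. The correct key is (k0, k1) = (0, 1)."""
    and_out = (a & b) ^ k0        # XOR key gate on the AND output
    c_in = 1 ^ (c ^ k1)           # XNOR key gate on input c
    return and_out | c_in


def matches_original(k0: int, k1: int) -> bool:
    """True if the locked circuit equals the original on every input pattern."""
    return all(locked(a, b, c, k0, k1) == original(a, b, c)
               for a, b, c in product((0, 1), repeat=3))


# Exhaustively try all keys; only the correct one restores the function.
working_keys = [key for key in product((0, 1), repeat=2) if matches_original(*key)]
```

Exhaustive search is trivial with two key bits. Practical schemes use many more, and the attacks the abstract refers to (for example SAT-based ones) are aimed at avoiding that exponential search.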
Pub Date: 2020-03-01 DOI: 10.23919/DATE48585.2020.9116338
Saransh Gupta, M. Imani, Joonseop Sim, Andrew Huang, Fan Wu, M. Najafi, T. Simunic
Stochastic computing (SC) reduces the complexity of computation by representing numbers with long independent bit-streams. However, increasing performance in SC comes with an increase in area and a loss in accuracy. Processing in memory (PIM) with non-volatile memories (NVMs) computes data in place, while offering high memory density and supporting bit-parallel operations with low energy. In this paper, we propose SCRIMP for stochastic computing acceleration with resistive RAM (ReRAM) in-memory processing, which enables SC in memory. SCRIMP can be used for a wide range of applications. It supports all SC encodings and operations in memory. It maximizes the performance and energy efficiency of implementing SC by introducing novel in-memory parallel stochastic number generation and efficient implication-based logic in memory. To show the efficiency of our stochastic architecture, we implement image processing on the proposed hardware.
{"title":"SCRIMP: A General Stochastic Computing Architecture using ReRAM in-Memory Processing","authors":"Saransh Gupta, M. Imani, Joonseop Sim, Andrew Huang, Fan Wu, M. Najafi, T. Simunic","doi":"10.23919/DATE48585.2020.9116338","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116338","journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)"},"publicationDate":"2020-03-01"}
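To make the bit-stream representation concrete, here is a small software sketch of unipolar stochastic computing, where a value in [0, 1] is encoded as the probability of a 1 and multiplication reduces to a bitwise AND of two independent streams. It only illustrates the number representation; it says nothing about SCRIMP's in-memory implementation, and the function names and stream length are assumptions.

```python
import random


def to_stream(value: float, length: int, rng: random.Random) -> list[int]:
    """Unipolar encoding: each bit is 1 with probability `value` (0 <= value <= 1)."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def from_stream(stream: list[int]) -> float:
    """Decode by taking the fraction of 1s in the stream."""
    return sum(stream) / len(stream)


def sc_multiply(x: list[int], y: list[int]) -> list[int]:
    """Bitwise AND of two independent unipolar streams encodes the product."""
    return [a & b for a, b in zip(x, y)]


rng = random.Random(42)
length = 4096
product = from_stream(sc_multiply(to_stream(0.5, length, rng),
                                  to_stream(0.4, length, rng)))
# `product` approximates 0.5 * 0.4 = 0.2; the error shrinks as `length` grows.
```

The accuracy-versus-stream-length trade-off visible here is the one the abstract refers to: longer streams give better accuracy but cost time or area.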
Pub Date: 2020-03-01 DOI: 10.23919/DATE48585.2020.9116312
Shvan Karim, J. Harkin, L. McDaid, B. Gardiner, Junxiu Liu
Spiking astrocyte neural networks (SANNs) are a new computational paradigm that exhibits enhanced self-adapting and reliability properties. The inclusion of astrocyte behaviour increases the computational load and, critically, the number of connections: each astrocyte typically communicates with up to 9 neurons (and their associated synapses), with feedback pathways from each neuron to the astrocyte. Each astrocyte cell also communicates with its neighbouring cell, resulting in a significant interconnect density. The substantial level of parallelism in SANNs lends itself to acceleration in hardware; however, the challenge in accelerating simulations of SANNs resides firmly in scalable interconnect and the ability to inject and retrieve data from the hardware. This paper presents a novel multi-FPGA acceleration architecture, AstroByte, for the speedup of SANNs. AstroByte explores Networks-on-Chip (NoC) routing mechanisms to address the challenge of communicating both spike-event (neuron) data and numeric (astrocyte) data across significant interconnect pathways between astrocytes and neurons. AstroByte also exploits the NoC interconnect to inject data and retrieve runtime data from the accelerated SANN simulations. Results show that AstroByte can simulate SANN applications with speedup factors of between ×162 and ×188 over equivalent Matlab simulations.
{"title":"AstroByte: Multi-FPGA Architecture for Accelerated Simulations of Spiking Astrocyte Neural Networks","authors":"Shvan Karim, J. Harkin, L. McDaid, B. Gardiner, Junxiu Liu","doi":"10.23919/DATE48585.2020.9116312","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116312","journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)"},"publicationDate":"2020-03-01"}
Pub Date: 2020-03-01 DOI: 10.23919/DATE48585.2020.9116481
Ognjen Glamočanin, Louis Coulon, F. Regazzoni, Mirjana Stojilović
Recent works have demonstrated the possibility of extracting secrets from a cryptographic core running on an FPGA by means of remote power analysis attacks. To mount these attacks, an adversary implements a voltage fluctuation sensor in the FPGA logic, records the power consumption of the target cryptographic core, and recovers the secret key by running a power analysis attack on the recorded traces. Despite showing that power analysis could also be performed without physical access to the cryptographic core, these works were mostly carried out on dedicated FPGA boards in a controlled environment, leaving open the question of whether these attacks can be mounted successfully on a real system deployed in the cloud. In this paper, we demonstrate, for the first time, a successful key recovery attack on an AES cryptographic accelerator running on an Amazon EC2 F1 instance. We collect the power traces using a delay-line based voltage drop sensor, adapted to the Xilinx Virtex UltraScale+ architecture used on Amazon EC2 F1, where CARRY8 blocks do not have a monotonic delay increase at their outputs. Our results demonstrate that security concerns raised by multi-tenant FPGAs are indeed valid and that countermeasures should be put in place to mitigate them.
{"title":"Are Cloud FPGAs Really Vulnerable to Power Analysis Attacks?","authors":"Ognjen Glamočanin, Louis Coulon, F. Regazzoni, Mirjana Stojilović","doi":"10.23919/DATE48585.2020.9116481","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116481","journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)"},"publicationDate":"2020-03-01"}
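The key-recovery step described in the abstract (correlating recorded traces with a leakage prediction for each key guess) can be sketched as a toy correlation power analysis. Everything here is a simplification and an assumption: the 4-bit PRESENT S-box stands in for AES to keep the table short, the "traces" are noise-free Hamming weights rather than sensor readings, and the function names are hypothetical.

```python
def hamming_weight(x: int) -> int:
    """Number of set bits, a common model of data-dependent power draw."""
    return bin(x).count("1")


# 4-bit S-box of the PRESENT cipher, used here only to keep the toy small.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_traces(secret_key: int, plaintexts):
    """Hypothetical noise-free readings: Hamming weight of the S-box output."""
    return [float(hamming_weight(SBOX[p ^ secret_key])) for p in plaintexts]


def recover_key(plaintexts, traces) -> int:
    """Return the key guess whose predicted leakage correlates best with the traces."""
    def score(guess: int) -> float:
        predicted = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        return pearson(predicted, traces)
    return max(range(16), key=score)


plaintexts = list(range(16)) * 4
traces = simulate_traces(0xA, plaintexts)
recovered = recover_key(plaintexts, traces)   # equals the secret nibble 0xA
```

On real hardware the traces are noisy, so many more of them are needed, and the attack is repeated per key byte.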
Pub Date: 2020-03-01 DOI: 10.23919/DATE48585.2020.9116510
Arman Iranfar, F. Terraneo, Gabor Csordas, Marina Zapater, W. Fornaciari, David Atienza Alonso
Dynamic Thermal Management (DTM) has become a major challenge since it directly affects the performance, power consumption, and reliability of Multiprocessor Systems-on-Chip (MPSoCs). In this work, we propose a transient fan model, enabling adaptive fan speed control simulation for efficient DTM. Our model is validated through a thermal test chip, achieving less than 2°C error in the worst case. With multiple fan speeds, however, the DTM design space grows significantly, which can ultimately make conventional solutions impractical. We address this challenge through a reinforcement learning-based solution to proactively determine the number of active cores, operating frequency, and fan speed. The proposed solution is able to reduce fan power by up to 40% compared to a DTM with constant fan speed, with less than 1% performance degradation. Also, compared to a state-of-the-art DTM technique, our solution improves the performance by up to 19% for the same fan power.
{"title":"Dynamic Thermal Management with Proactive Fan Speed Control Through Reinforcement Learning","authors":"Arman Iranfar, F. Terraneo, Gabor Csordas, Marina Zapater, W. Fornaciari, David Atienza Alonso","doi":"10.23919/DATE48585.2020.9116510","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116510","journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)"},"publicationDate":"2020-03-01"}
Layer assignment, a major step in global routing of integrated circuits, is usually performed to assign segments of nets to multiple layers. Besides the traditional optimization goals such as overflow and via count, interconnect delay plays an important role in determining chip performance and has been attracting much attention in recent years. Accordingly, in this paper, we propose MiniDelay, a timing-aware layer assignment algorithm to minimize delay for advanced technology nodes, taking both wire congestion and coupling effect into account. MiniDelay consists of the following three key techniques: 1) a non-default-rule routing technique is adopted to reduce the delay of timing-critical nets, 2) an effective congestion assessment method is proposed to optimize delay of nets and via count simultaneously, and 3) a net scalpel technique is proposed to further reduce the maximum delay of nets, so that the chip performance can be improved in a global manner. Experimental results on multiple benchmarks confirm that the proposed algorithm leads to lower delay and fewer vias, while achieving the best solution quality among the existing algorithms with the shortest runtime.
{"title":"MiniDelay: Multi-Strategy Timing-Aware Layer Assignment for Advanced Technology Nodes","authors":"Xinghai Zhang, Zhen Zhuang, Genggeng Liu, Xing Huang, Wen-Hao Liu, Wenzhong Guo, Ting-Chi Wang","doi":"10.23919/DATE48585.2020.9116269","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116269","journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)"},"publicationDate":"2020-03-01"}
Pub Date: 2020-03-01 DOI: 10.23919/DATE48585.2020.9116373
Salim Ullah, Siddharth Gupta, K. Ahuja, Aruna Tiwari, Akash Kumar
Deep neural networks are a machine learning technique increasingly used in a variety of applications. However, the high memory and computation demands of deep neural networks often limit their deployment on embedded systems. Many recent works have addressed this problem by proposing different types of data quantization schemes. However, most of these techniques either require post-quantization retraining of deep neural networks or incur a significant loss in output accuracy. In this paper, we propose a novel quantization technique for parameters of pre-trained deep neural networks. Our technique largely preserves the accuracy of the parameters and does not require retraining of the networks. Compared to the single-precision floating-point implementation, our proposed 8-bit quantization technique incurs only ~1% and ~0.4% loss in top-1 and top-5 accuracy, respectively, for the VGG16 network on the ImageNet dataset.
{"title":"L2L: A Highly Accurate Log_2_Lead Quantization of Pre-trained Neural Networks","authors":"Salim Ullah, Siddharth Gupta, K. Ahuja, Aruna Tiwari, Akash Kumar","doi":"10.23919/DATE48585.2020.9116373","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116373","journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)"},"publicationDate":"2020-03-01"}
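For orientation, the sketch below shows a generic logarithmic (power-of-two) quantizer applied to pre-trained weights without any retraining. It is explicitly not the L2L encoding, whose details the abstract does not give; the exponent range, the sample weights, and the function name are assumptions chosen for illustration.

```python
import math


def log2_quantize(w: float, min_exp: int = -8, max_exp: int = 0) -> float:
    """Round |w| to the nearest power of two (in the log domain), clamp the
    exponent to [min_exp, max_exp], and keep the sign. Zero stays zero."""
    if w == 0.0:
        return 0.0
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))
    return math.copysign(2.0 ** exp, w)


# Hypothetical pre-trained weights; no retraining step is involved.
weights = [0.30, -0.12, 0.0, 0.9, -0.004]
quantized = [log2_quantize(w) for w in weights]
```

With this kind of representation only a sign and a small exponent need to be stored per weight, and multiplications can become shifts. The price is coarse resolution between powers of two, which is the accuracy loss that more refined schemes try to reduce.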
Pub Date: 2020-03-01 DOI: 10.23919/DATE48585.2020.9116549
Minhui Zou, Zhenhua Zhu, Yi Cai, Junlong Zhou, Chengliang Wang, Yu Wang
Neural networks (NNs) have achieved great success in visual object recognition and natural language processing, but such data-intensive applications require huge data movements between computing units and memory. Emerging resistive random-access memory (RRAM) computing systems have demonstrated great potential in avoiding these data movements by performing matrix-vector multiplications in memory. However, the non-volatility of RRAM devices may allow the NN weights stored in crossbars to be stolen, and the adversary could extract the NN models from the stolen weights. This paper proposes an effective security enhancing method for RRAM computing systems to thwart this sort of piracy attack. We first analyze the theft methods of the NN weights. Then we propose an efficient security enhancing technique based on obfuscating the row connections between positive crossbars and their pairing negative crossbars. Two heuristic techniques are also presented to optimize the hardware overhead of the obfuscation module. Compared with existing NN security work, our method eliminates the additional RRAM writing operations used for encryption/decryption, without shortening the lifetime of RRAM computing systems. The experimental results show that the proposed methods ensure that a brute-force attack requires more than (16!)^17 trials and that the classification accuracy of incorrectly extracted NN models is less than 20%, with minimal area overhead.
{"title":"Security Enhancement for RRAM Computing System through Obfuscating Crossbar Row Connections","authors":"Minhui Zou, Zhenhua Zhu, Yi Cai, Junlong Zhou, Chengliang Wang, Yu Wang","doi":"10.23919/DATE48585.2020.9116549","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116549","journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)"},"publicationDate":"2020-03-01"}
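To get a feel for the brute-force bound quoted in the abstract, the following lines evaluate it exactly, reading the bound as 16 factorial raised to the 17th power (this assumes the exponent was a superscript in the original). The variable names are illustrative only.

```python
import math

# Lower bound on brute-force trials, read as (16!)^17.
trials = math.factorial(16) ** 17
decimal_digits = len(str(trials))   # how many decimal digits the bound has
bits = trials.bit_length()          # size of an equivalent key, in bits
```

The result has 227 decimal digits, so the bound is on the order of 10^226, which corresponds to a search space comparable to a key of roughly 750 bits.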
Pub Date: 2020-03-01 DOI: 10.23919/DATE48585.2020.9116279
Amin Ghasemazar, M. Ewais, Prashant J. Nair, Mieszko Lis
The importance of caches for performance, and their high silicon area cost, have motivated hardware solutions that transparently compress the cached data to increase effective capacity without sacrificing silicon area. To this end, prior work has taken one of two approaches: either (a) deduplicating identical cache blocks across the cache to take advantage of inter-block redundancy or (b) compressing common patterns within each cache block to take advantage of intra-block redundancy. In this paper, we demonstrate that leveraging only one of these redundancy types leads to a significant loss in compression opportunities for several applications: some workloads exhibit either inter-block or intra-block redundancy, while others exhibit both. We propose 2DCC (Two Dimensional Cache Compression), a simple technique that takes advantage of both types of redundancy. Across the SPEC and Parsec benchmark suites, 2DCC results in a 2.12× compression factor (geomean) compared to 1.44–1.49× for the best prior techniques on an iso-silicon basis. For the cache-sensitive subset of these benchmarks run in isolation, 2DCC also achieves an 11.7% speedup (geomean).
{"title":"2DCC: Cache Compression in Two Dimensions","authors":"Amin Ghasemazar, M. Ewais, Prashant J. Nair, Mieszko Lis","doi":"10.23919/DATE48585.2020.9116279","DOIUrl":"https://doi.org/10.23919/DATE48585.2020.9116279","journal":{"name":"2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)"},"publicationDate":"2020-03-01"}
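The two redundancy types the abstract distinguishes can be illustrated with a small software model that computes a compression factor when deduplication (inter-block), a per-block encoding (intra-block), or both are applied. This is a sketch under strong assumptions and not the 2DCC design: the zero-word bitmap encoding, the block geometry, and the omission of tag and pointer overheads are all simplifications introduced here.

```python
WORDS_PER_BLOCK = 8
WORD_BYTES = 8


def intra_block_size(block: tuple[int, ...]) -> int:
    """Hypothetical intra-block scheme: 1-byte zero bitmap plus the nonzero words."""
    nonzero = sum(1 for w in block if w != 0)
    return 1 + nonzero * WORD_BYTES


def compressed_size(blocks, dedup: bool, intra: bool) -> int:
    """Bytes stored after optional deduplication and per-block compression
    (metadata such as tags and pointers is ignored in this sketch)."""
    raw_block = WORDS_PER_BLOCK * WORD_BYTES
    stored = set(blocks) if dedup else blocks   # inter-block: keep one copy
    return sum(intra_block_size(b) if intra else raw_block for b in stored)


def compression_factor(blocks, dedup: bool, intra: bool) -> float:
    raw = len(blocks) * WORDS_PER_BLOCK * WORD_BYTES
    return raw / compressed_size(blocks, dedup, intra)


sparse = (7, 0, 0, 0, 0, 0, 0, 0)          # compressible within the block
dense = (1, 2, 3, 4, 5, 6, 7, 8)           # only compressible by deduplication
cache = [sparse, dense, dense, dense]

dedup_only = compression_factor(cache, dedup=True, intra=False)
intra_only = compression_factor(cache, dedup=False, intra=True)
both = compression_factor(cache, dedup=True, intra=True)
```

In this toy cache, either technique alone leaves one kind of redundancy untouched, while combining them gives the highest factor, which mirrors the motivation stated in the abstract.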