Statistical side-channel fingerprinting is a popular hardware Trojan detection method, wherein a parametric signature of a chip is collected and compared to a trusted region in a multi-dimensional space. This trusted region is statistically established so that, despite the uncertainty incurred by process variations, the fingerprint of Trojan-free chips is expected to fall within this region while the fingerprint of Trojan-infested chips is expected to fall outside. Learning this trusted region, however, assumes the availability of a small set of trusted (i.e., “golden”) chips. Herein, we remove this assumption and demonstrate that an almost equally effective trusted region can be learned through a combination of a trusted simulation model, measurements from process control monitors (PCMs), which are typically present either on the die or in the wafer kerf, and advanced statistical tail modeling techniques. The effectiveness of this method is evaluated using silicon measurements from two hardware Trojan-infested versions of a wireless cryptographic integrated circuit.
{"title":"Hardware Trojan detection through golden chip-free statistical side-channel fingerprinting","authors":"Yu Liu, K. Huang, Y. Makris","doi":"10.1145/2593069.2593147","DOIUrl":"https://doi.org/10.1145/2593069.2593147","url":null,"abstract":"Statistical side channel fingerprinting is a popular hardware Trojan detection method, wherein a parametric signature of a chip is collected and compared to a trusted region in a multi-dimensional space. This trusted region is statistically established so that, despite the uncertainty incurred by process variations, the fingerprint of Trojan-free chips is expected to fall within this region while the fingerprint of Trojan-infested chips is expected to fall outside. Learning this trusted region, however, assumes availability of a small set of trusted (i.e. “golden”) chips. Herein, we rescind this assumption and we demonstrate that an almost equally effective trusted region can be learned through a combination of a trusted simulation model, measurements from process control monitors (PCMs) which are typically present either on die or on wafer kerf, and advanced statistical tail modeling techniques. Effectiveness of this method is evaluated using silicon measurements from two hardware Trojan-infested versions of a wireless cryptographic integrated circuit.","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131133761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
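To make the trusted-region idea in the fingerprinting abstract above concrete, here is a minimal sketch. It is not the authors' method: it swaps in a simple per-dimension box (mean plus or minus k standard deviations) for the simulation model, PCM measurements and tail modeling that the abstract mentions. The function names, the threshold `k`, and the sample numbers are all assumptions made for illustration.

```python
import statistics


def learn_trusted_region(fingerprints, k=3.0):
    """Learn per-dimension (low, high) bounds as mean +/- k * sample stdev."""
    bounds = []
    for values in zip(*fingerprints):  # iterate over fingerprint dimensions
        mu = statistics.mean(values)
        sigma = statistics.stdev(values)
        bounds.append((mu - k * sigma, mu + k * sigma))
    return bounds


def is_trusted(fingerprint, bounds):
    """True if every measurement lies inside its learned interval."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(fingerprint, bounds))


# Hypothetical two-dimensional fingerprints (two measurements per chip).
reference = [(1.0, 10.0), (1.1, 10.2), (0.9, 9.8), (1.0, 10.0)]
region = learn_trusted_region(reference)
print(is_trusted((1.05, 10.1), region))  # close to the reference population
print(is_trusted((1.5, 10.0), region))   # first dimension far outside the box
```

An axis-aligned box ignores correlation between measurements, so it is only a stand-in for a properly learned multi-dimensional boundary.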
The ability to generate secure random numbers is fundamental to the security of cryptographic protocols. Random Number Generators (RNGs) have started to appear in recent Intel CPUs used in desktops and servers. Solutions for embedded devices, such as sensor nodes and wireless routers, are still severely lacking, however. In this paper we present the implementation of a secure pseudo-random number generator (PRNG) for the ARM Cortex-M microcontroller family, one of the most popular embedded platforms at this moment. For compactness and compatibility reasons, our implementation is software only. It uses the start-up values of on-chip SRAM as a random seed and uses the KECCAK hash function for both entropy extraction and pseudo-random number generation. Making KECCAK very compact in terms of memory requirements is therefore essential. KECCAK is a tunable algorithm: in this paper we discuss the minimum security requirements and the storage costs as a function of the KECCAK variant. The KECCAK permutation of our choice, KECCAK-f[200], is implemented in only 400 bytes. To the best of our knowledge, this is the smallest KECCAK implementation published so far. With the addition of initialization, hashing, padding and output generation functions, our complete solution fits within 496 bytes of ROM and requires 52 bytes of RAM. One byte of pseudo-random data, with a security level of at least 128 bits, can be generated in 3337 cycles on an ARM Cortex-M3/4, i.e. 50 KiB/s on a development board, which is plenty fast for a cryptographic PRNG in an embedded setting.
{"title":"Software only, extremely compact, Keccak-based secure PRNG on ARM Cortex-M","authors":"A. V. Herrewege, I. Verbauwhede","doi":"10.1145/2593069.2593218","DOIUrl":"https://doi.org/10.1145/2593069.2593218","url":null,"abstract":"The ability to generate secure random numbers is fundamental to the security of cryptographic protocols. Random Number Generators (RNGs) start to appear in recent modern Intel CPUs as used in desktops and servers. Solutions for embedded devices, such as e.g. sensor nodes and wireless routers, are still severely lacking however. In this paper we present the implementation of a secure pseudo-random number generator (PRNG) for the ARM Cortex-M microcontroller family, one of the most popular embedded platforms at this moment. For compactness and compatibility reasons, our implementation is software only. It uses the start-up values of on-chip SRAM as random seed and uses the KECCAK hash function for both entropy extraction as well as pseudo-random number generation. Getting KECCAK very compact in terms of memory requirements is therefore essential. KECCAK is a tunable algorithm: in this paper we discuss the minimum security requirements and the storage costs as a function of the KECCAK variant. The KECCAK permutation of our choice, KECCAK-f[200], is implemented in only 400 bytes. To the best of our knowledge, this is the smallest KECCAK implementation published so far. With the addition of initialization, hashing, padding and output generation functions, our complete solution fits within 496 bytes of ROM and requires 52 bytes of RAM. One byte of pseudo-random data, with a security level of at least 128 bits, can be generated in 3337 cyles on an ARM CortexM3/4, i.e. 50 KiB/s on a development board, plenty fast for a cryptographic PRNG in an embedded setting.","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132995483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
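The throughput figure in the PRNG abstract above follows from simple arithmetic: bytes per second equal the clock frequency divided by the cycles needed per byte. The sketch below works this through. The 3337 cycles per byte and the 50 KiB/s figure come from the abstract; the 168 MHz clock is an assumption (the abstract does not state the board's frequency), and the function names are hypothetical.

```python
CYCLES_PER_BYTE = 3337  # figure quoted in the abstract


def throughput_kib_per_s(clock_hz, cycles_per_byte=CYCLES_PER_BYTE):
    """Bytes per second = clock / cycles-per-byte, converted to KiB/s."""
    return clock_hz / cycles_per_byte / 1024


def implied_clock_hz(kib_per_s, cycles_per_byte=CYCLES_PER_BYTE):
    """Clock frequency needed to sustain a given throughput."""
    return kib_per_s * 1024 * cycles_per_byte


# 168 MHz is an assumed clock, not a value taken from the abstract.
print(round(throughput_kib_per_s(168_000_000), 1))  # about 49.2 KiB/s
print(implied_clock_hz(50) / 1e6)                   # clock in MHz for exactly 50 KiB/s
```

Under that assumed clock the result rounds to the quoted 50 KiB/s; conversely, sustaining exactly 50 KiB/s would need a clock of roughly 171 MHz.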
While active studies have been conducted to reduce the power consumption of display-related components of mobile devices, previous work has rarely addressed the issue without degrading graphical quality. In this paper, we propose an effective scheme to reduce display energy consumption without compromising user experience. We first define a metric called the content rate, from which an appropriate refresh rate is determined for displaying content. The proposed system then sets an optimal refresh rate based on the content rate. Extensive experiments demonstrate that our system effectively reduces the total power consumption of commercial smartphones while display quality is satisfactorily maintained.
{"title":"Content-centric display energy management for mobile devices","authors":"Dongwon Kim, Nohyun Jung, H. Cha","doi":"10.1145/2593069.2593113","DOIUrl":"https://doi.org/10.1145/2593069.2593113","url":null,"abstract":"While active studies have been conducted to reduce the power consumption of display-related components of mobile devices, previous work has rarely approached the issues without having to deteriorate graphical quality. In this paper, we propose an effective scheme to reduce display energy consumption without compromising user experience. We first define a metric called the content rate from which an appropriate refresh rate is determined for displaying content. The proposed system then sets an optimal refresh rate based on the content rate. Extensive experiments demonstrate that our system effectively reduces the total power in commercial smartphones, yet the display quality is satisfactorily maintained.","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132645763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
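The content-rate scheme in the display energy abstract above can be sketched as follows. The abstract does not define the metric precisely, so this example assumes that the content rate is the number of frames per second that differ from the previous frame, and that the display supports a small hypothetical set of refresh rates. The function names and the rate list are illustrative only.

```python
def content_rate(frames, duration_s):
    """Frames per second whose content differs from the previous frame."""
    changed = sum(1 for prev, cur in zip(frames, frames[1:]) if cur != prev)
    return changed / duration_s


def pick_refresh_rate(rate, supported=(20, 30, 40, 60)):
    """Lowest supported refresh rate (Hz) that is at least the content rate."""
    for hz in sorted(supported):
        if hz >= rate:
            return hz
    return max(supported)  # content changes faster than the panel can refresh


# Hypothetical one-second capture, with frame contents abbreviated to labels.
frames = ["a", "a", "b", "b", "b", "c"]
rate = content_rate(frames, duration_s=1.0)
print(rate, pick_refresh_rate(rate))
```

Choosing the lowest rate that still covers the content rate is one plausible policy: the display refreshes no more often than the content changes, so no visible update is dropped.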
Circuit camouflage technologies can be integrated into standard logic cell development using traditional CAD tools. Camouflaged logic cells are integrated into a typical design flow using standard front-end and back-end models. Camouflaged logic cells obfuscate a circuit's function by introducing subtle cell design changes at the GDS level. The logic function of a camouflaged logic cell is extremely difficult to determine through silicon imaging analysis, which prevents netlist extraction, cloning and counterfeiting. The application of circuit camouflage as part of a customer's design flow can protect hardware IP from reverse engineering. Camouflage fill techniques further inhibit Trojan circuit insertion by completely filling the design with realistic circuitry that does not affect the primary design function. All unused silicon appears to be functional circuitry, so an attacker cannot find space to insert a Trojan circuit. The integration of circuit camouflage techniques is compatible with standard chip design flows and EDA tools, and ICs using such techniques have been successfully employed in high-attack commercial and government segments. Protected under issued and pending patents.
{"title":"Circuit camouflage integration for hardware IP protection","authors":"Ron Cocchi, J. Baukus, Lap-Wai Chow, B. Wang","doi":"10.1145/2593069.2602554","DOIUrl":"https://doi.org/10.1145/2593069.2602554","url":null,"abstract":"Circuit camouflage technologies can be integrated into standard logic cell developments using traditional CAD tools. Camouflaged logic cells are integrated into a typical design flow using standard front end and back end models. Camouflaged logic cells obfuscate a circuit's function by introducing subtle cell design changes at the GDS level. The logic function of a camouflaged logic cell is extremely difficult to determine through silicon imaging analysis preventing netlist extraction, clones and counterfeits. The application of circuit camouflage as part of a customer's design flow can protect hardware IP from reverse engineering. Camouflage fill techniques further inhibit Trojan circuit insertion by completely filling the design with realistic circuitry that does not affect the primary design function. All unused silicon appears to be functional circuitry, so an attacker cannot find space to insert a Trojan circuit. The integration of circuit camouflage techniques is compatible with standard chip design flows and EDA tools, and ICs using such techniques have been successfully employed in high-attack commercial and government segments. Protected under issued and pending patents.","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134605170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Metal-configurable gate-array spare cells, which have versatile functionality, were developed to overcome the inflexibility of the standard spare cells used in conventional metal-only engineering change orders (ECOs). In this paper, we focus on functional ECO optimization using this new type of spare cell to fully exploit its strengths. We observe that this functional ECO problem involves dynamic logical and physical costs for selecting spare gate arrays. Unlike existing functional ECO works, which perform technology mapping based on ECO patches, we perform reverse mapping from spare gate arrays to handle these dynamic costs. We devise a spare array relation graph to record geometrical adjacency among spare gate arrays and interleave it with the and-inverter network of ECO patches. To avoid redundant traversal and monitor the dynamic costs, we adopt A* search to simultaneously traverse and map between the logical ECO network and the physical spare array relation graph.
{"title":"Functional ECO using metal-configurable gate-array spare cells","authors":"Hua-Yu Chang, I. Jiang, Yao-Wen Chang","doi":"10.1145/2593069.2593145","DOIUrl":"https://doi.org/10.1145/2593069.2593145","url":null,"abstract":"Metal-configurable gate-array spare cells, which have versatile functionality, are developed to overcome the inflexibility of standard spare cells used in conventional metal-only engineering change order (ECO). In this paper, we focus on functional ECO optimization using the new type of spare cells to fully exploit its strength. We observe that this functional ECO problem has the nature of dynamic logical and physical costs for selecting spare gate arrays. Unlike existing functional ECO works, which perform technology mapping based on ECO patches, we perform reverse mapping from spare gate arrays to handle these dynamic costs. We devise a spare array relation graph to record geometrical adjacency among spare gate arrays and interleave with the and-inverter network of ECO patches. To avoid redundant traversal and monitor the dynamic costs, we adopt A* search to simultaneously traverse and map between the logical ECO network and the physical spare array relation graph.","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131652430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With current memory scalability challenges, Phase Change Memory (PCM) is viewed as an attractive replacement for DRAM. The primary concern for PCM applicability is its limited write endurance, which is highly affected by process variation in the nanometer regime. This increases the variation in cell lifetime, resulting in an early and sudden reduction in main memory capacity due to the wear-out of a few cells. When some memory pages reach their endurance limits, other pages may be far from their limits even with perfect wear-leveling. Recent studies have proposed redirection or correction schemes to alleviate this problem, but all suffer from poor throughput or latency. In contrast, we present On-Demand Page Paired PCM (OD3P), a technique that mitigates the problem of fast failure of pages by redirecting them onto other healthy pages, leading to gradual capacity degradation. Compared to a state-of-the-art error correction scheme for PCM, our experiments indicate that OD3P can improve PCM time-to-failure and system performance (IPC) by 12% and 14%, respectively, under multi-threaded and multi-programmed workloads.
{"title":"OD3P: On-Demand Page Paired PCM","authors":"Marjan Asadinia, M. Arjomand, H. Sarbazi-Azad","doi":"10.1145/2593069.2593166","DOIUrl":"https://doi.org/10.1145/2593069.2593166","url":null,"abstract":"With current memory scalability challenges, Phase Change Memory (PCM) is viewed as an attractive replacement to DRAM. The preliminary concern for PCM applicability is its limited write endurance that is highly affected by process variation in nanometer regime. This increases the variation in cell lifetime resulting in early and sudden reduction in main memory capacity due to wear-out of few cells. When some memory pages reach their endurance limits, other pages may be far from their limits even when using a perfect wear-leveling. Recent studies have proposed redirection or correction schemes to alleviate this problem, but all suffer from poor throughput or latency. On contrary, we present On-Demand Page Paired PCM (OD3P), a technique that mitigates the problem of fast failure of pages by redirecting them onto other healthy pages, leading to gradual capacity degradation. Compared to a state-of-the-art error correction scheme for PCM, our experiments indicated that OD3P can improve PCM time-to-failure and system performance (IPC) by 12% and 14%, respectively, under multi-threaded and multi-programmed workloads.","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114316298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-level Cell Spin-Transfer Torque Random Access Memory (MLC STT-RAM) is a promising nonvolatile memory technology for high-capacity and high-performance applications. However, reliability concerns and the complicated access mechanism greatly hinder the application of MLC STT-RAM. In this work, we develop a holistic solution set, namely state-restrict MLC STT-RAM (SR-MLC STT-RAM), to improve the data integrity and performance of MLC STT-RAM with minimal degradation of information density. Three techniques, state restriction (StatRes), error pattern removal (ErrPR), and ternary coding (TerCode), are proposed at the circuit level to reduce the read and write errors of MLC STT-RAM cells. A state pre-recovery (PreREC) technique is also developed at the architecture level to improve the access performance of SR-MLC STT-RAM by eliminating unnecessary two-step write operations. Our simulations show that, compared to conventional MLC STT-RAM, SR-MLC STT-RAM can enhance the write and read reliability of memory cells by 10× to 10000×, allowing the application of simple error correction code schemes. Compared to single-level-cell (SLC) STT-RAM, an SR-MLC STT-RAM based cache design can boost system performance by 6.2% on average by leveraging the increased cache capacity at the same area and the improved write latency.
{"title":"State-restrict MLC STT-RAM designs for high-reliable high-performance memory system","authors":"Wujie Wen, Yaojun Zhang, Mengjie Mao, Yiran Chen","doi":"10.1145/2593069.2593220","DOIUrl":"https://doi.org/10.1145/2593069.2593220","url":null,"abstract":"Multi-level Cell Spin-Transfer Torque Random AccessMemory (MLC STT-RAM) is a promising nonvolatile memory technology for high-capacity and high-performance applications. However, the reliability concerns and the complicated access mechanism greatly hinder the application of MLC STT-RAM. In this work, we develop a holistic solution set, namely, state-restrict MLC STT-RAM (SR-MLC STT-RAM) to improve the data integrity and performance of MLC STT-RAM with the minimized information density degradation. Three techniques: state restriction (StatRes), error pattern removal (ErrPR), and ternary coding (TerCode) are proposed at circuit level to reduce the read and write errors of MLC STT-RAMcells. State pre-recovery (PreREC) technique is also developed at architecture level to improve the access performance of SR-MLC STT-RAM by eliminating unnecessary two-step write operations. Our simulations show that compared to conventional MLC STT-RAM, SR-MLC STT-RAM can enhance the write and read reliability of memory cells by 10 - 10000×, allowing the application of simple error correction code schemes. Compared to single-level-cell (SLC) STT-RAM, SR-MLC STT-RAM based cache design can boost the system performance by 6.2% on average by leveraging the increased cache capacity at the same area and the improved write latency.","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114304069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Memory speed has become a major performance bottleneck as more and more cores are integrated on a multi-core chip. The widening latency gap between high-speed cores and memory has led to the evolution of multi-level SRAM/DRAM cache hierarchies that exploit the latency benefits of smaller caches (e.g. private L1 and L2 SRAM caches) and the capacity benefits of larger caches (e.g. shared L3 SRAM and shared L4 DRAM caches). The main problem of employing large L3/L4 caches is their high tag lookup latency. To solve this problem, we introduce the novel concept of small, low-latency SRAM/DRAM Tag-Cache structures that can quickly determine whether an access to the large L3/L4 caches will be a hit or a miss. The performance of the proposed Tag-Cache architecture depends on the Tag-Cache hit rate; to improve it, we propose a novel Tag-Cache insertion policy and a DRAM row buffer mapping policy that reduce the latency of memory requests. For a 16-core system, this improves the average harmonic-mean instructions-per-cycle throughput of latency-sensitive applications by 13.3% compared to the state of the art.
{"title":"Reducing latency in an SRAM/DRAM cache hierarchy via a novel Tag-Cache architecture","authors":"F. Hameed, L. Bauer, J. Henkel","doi":"10.1145/2593069.2593197","DOIUrl":"https://doi.org/10.1145/2593069.2593197","url":null,"abstract":"Memory speed has become a major performance bottleneck as more and more cores are integrated on a multi-core chip. The widening latency gap between high speed cores and memory has led to the evolution of multi-level SRAM/DRAM cache hierarchies that exploit the latency benefits of smaller caches (e.g. private L1 and L2 SRAM caches) and the capacity benefits of larger caches (e.g. shared L3 SRAM and shared L4 DRAM cache). The main problem of employing large L3/L4 caches is their high tag lookup latency. To solve this problem, we introduce the novel concept of small and low latency SRAM/DRAM Tag-Cache structures that can quickly determine whether an access to the large L3/L4 caches will be a hit or a miss. The performance of the proposed Tag-Cache architecture depends upon the Tag-Cache hit rate and to improve it we propose a novel Tag-Cache insertion policy and a DRAM row buffer mapping policy that reduce the latency of memory requests. For a 16-core system, this improves the average harmonic mean instruction per cycle throughput of latency sensitive applications by 13.3% compared to state-of-the-art.","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114824991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to the resolution limitations of optical lithography equipment, 1D gridded layout design is gaining steam. Self-aligned double patterning (SADP) is a mature technology for printing 1D layouts. However, for 20nm and beyond, SADP using a single trim mask becomes insufficient for printing all 1D layouts. A viable solution is to complement SADP with e-beam lithography. In this paper, in order to increase the throughput of printing a 1D layout, we consider the problem of e-beam shot count minimization subject to bounded line end extension constraints. Two different approaches to using the trim mask and e-beam to print a layout are considered. The first approach assumes that the trim mask and e-beam are used for end cutting. The second assumes that the trim mask and e-beam are used to get rid of all unnecessary portions. We propose elegant ILP formulations for both approaches. Experimental results show that both ILP formulations can be solved efficiently. The pros and cons of the two approaches for manufacturing 1D layouts are discussed.
{"title":"Throughput optimization for SADP and e-beam based manufacturing of 1D layout","authors":"Yixiao Ding, C. Chu, Wai-Kei Mak","doi":"10.1145/2593069.2593233","DOIUrl":"https://doi.org/10.1145/2593069.2593233","url":null,"abstract":"Due to the resolution limitations of optical lithography equipment, 1D gridded layout design is gaining steam. Self-aligned double patterning (SADP) is a mature technology for printing 1D layouts. However, for 20nm and beyond, SADP using a single trim mask becomes insufficient for printing all 1D layouts. A viable solution is to complement SADP with e-beam lithography. In this paper, in order to increase the throughput of printing a 1D layout, we consider the problem of e-beam shot count minimization subject to bounded line end extension constraints. Two different approaches of utilizing the trim mask and e-beam to print a layout are considered. The first approach is under the assumption that the trim mask and e-beam are used for end cutting. The second is under the assumption that the trim mask and e-beam are used to rid of all unnecessary portions. We propose elegant ILP formulations for both approaches. Experimental results show that both ILP formulations can be solved efficiently. The pros and cons of the two approaches for manufacturing 1D layout are discussed.","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121898565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Non-volatile memory devices such as phase change memories and memristors are promising alternatives to SRAM and DRAM main memories as they provide higher density and improved energy efficiency. However, non-volatile main memories (NVMM) introduce security vulnerabilities. Sensitive data such as passwords and keys residing in the NVMM will persist and can be probed after power down. We propose sneak-path encryption (SPE) for memristor-based NVMM. SPE exploits the physical parameters, multi-level cell (MLC) capability and the sneak paths in crossbar memories to encrypt the data stored in memristor-based NVMM. We investigate three attacks on NVMMs and show the resilience of SPE against them. We use a cycle-accurate simulator to evaluate the security and performance impact of SPE-based NVMM. SPE can secure the NVMM with a latency of 16 cycles and ~1.5% performance overhead.
{"title":"Secure memristor-based main memory","authors":"Sachhidh Kannan, Naghmeh Karimi, O. Sinanoglu","doi":"10.1145/2593069.2593212","DOIUrl":"https://doi.org/10.1145/2593069.2593212","url":null,"abstract":"Non-volatile memory devices such as phase change memories and memristors are promising alternatives to SRAM and DRAM main memories as they provide higher density and improved energy efficiency. However, non-volatile main memories (NVMM) introduce security vulnerabilities. Sensitive data such as passwords and keys residing in the NVMM will persist and can be probed after power down. We propose sneak-path encryption (SPE), for memristor-based NVMM. SPE exploits the physical parameters, multi-level cell (MLC) capability and the sneak paths in cross-bar memories to encrypt the data stored in memristor-based NVMM. We investigate three attacks on NVMMs and show the resilience of SPE against them. We use a cycle accurate simulator to evaluate the security and performance impact of SPE based NVMM. SPE can secure the NVMM with a latency of 16 cycles and ~1.5% performance overhead.","PeriodicalId":433816,"journal":{"name":"2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122151529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}