Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378642
Claus Braun, S. Holst, H. Wunderlich, Juan Manuel Castillo-Sanchez, J. Gross
Markov-Chain Monte-Carlo (MCMC) methods are an important class of simulation techniques, which execute a sequence of simulation steps, where each new step depends on the previous ones. Due to this fundamental dependency, MCMC methods are inherently hard to parallelize on any architecture. The upcoming generations of hybrid CPU/GPGPU architectures with their multi-core CPUs and tightly coupled many-core GPGPUs provide new acceleration opportunities especially for MCMC methods, if the new degrees of freedom are exploited correctly. In this paper, the outcomes of an interdisciplinary collaboration are presented, which focused on the parallel mapping of a MCMC molecular simulation from thermodynamics to hybrid CPU/GPGPU computing systems. While the mapping is designed for upcoming hybrid architectures, the implementation of this approach on an NVIDIA Tesla system already leads to a substantial speedup of more than 87× despite the additional communication overheads.
{"title":"Acceleration of Monte-Carlo molecular simulations on hybrid computing architectures","authors":"Claus Braun, S. Holst, H. Wunderlich, Juan Manuel Castillo-Sanchez, J. Gross","doi":"10.1109/ICCD.2012.6378642","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378642","url":null,"abstract":"Markov-Chain Monte-Carlo (MCMC) methods are an important class of simulation techniques, which execute a sequence of simulation steps, where each new step depends on the previous ones. Due to this fundamental dependency, MCMC methods are inherently hard to parallelize on any architecture. The upcoming generations of hybrid CPU/GPGPU architectures with their multi-core CPUs and tightly coupled many-core GPGPUs provide new acceleration opportunities especially for MCMC methods, if the new degrees of freedom are exploited correctly. In this paper, the outcomes of an interdisciplinary collaboration are presented, which focused on the parallel mapping of a MCMC molecular simulation from thermodynamics to hybrid CPU/GPGPU computing systems. While the mapping is designed for upcoming hybrid architectures, the implementation of this approach on an NVIDIA Tesla system already leads to a substantial speedup of more than 87× despite the additional communication overheads.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133936378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378679
M. Kaneko
Post-Silicon Tuning is an emerging technology for improving performance-yield of VLSIs under process variations. This paper focuses especially on the post-silicon timing-skew tuning (PSST) via programmable delay elements (PDEs), and proposes a novel tuning algorithm which utilizes only the result of setup and hold timing tests, not the result of costly delay-time measurements. The basic framework of our PSST consists of the construction of Control-value Constraint Graph from the results of timing-tests, and the computation of longest path lengths on this graph for finding safe PDE setting. Even though the cost for timing test is smaller than a delay-time measurement, the cost of timing-tests is still a dominant part of the PSST cost, and its reduction is a crucial problem. Longest path lengths which we need to compute depends directly on edge weights in the “longest-paths tree”, but for co-tree edges, their exact edge weights are not always necessary. Based on this observation, we propose timing-test scheduling for reducing the timing-test cost for PDE tuning. The experimental simulation results show that our approach reduces the test cost by almost half or more.
{"title":"Timing-test scheduling for constraint-graph based post-silicon skew tuning","authors":"M. Kaneko","doi":"10.1109/ICCD.2012.6378679","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378679","url":null,"abstract":"Post-Silicon Tuning is an emerging technology for improving performance-yield of VLSIs under process variations. This paper focuses especially on the post-silicon timing-skew tuning (PSST) via programmable delay elements (PDEs), and proposes a novel tuning algorithm which utilizes only the result of setup and hold timing tests, not the result of costly delay-time measurements. The basic framework of our PSST consists of the construction of Control-value Constraint Graph from the results of timing-tests, and the computation of longest path lengths on this graph for finding safe PDE setting. Even though the cost for timing test is smaller than a delay-time measurement, the cost of timing-tests is still a dominant part of the PSST cost, and its reduction is a crucial problem. Longest path lengths which we need to compute depends directly on edge weights in the “longest-paths tree”, but for co-tree edges, their exact edge weights are not always necessary. Based on this observation, we propose timing-test scheduling for reducing the timing-test cost for PDE tuning. The experimental simulation results show that our approach reduces the test cost by almost half or more.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115361907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378637
D. Milojevic, Sachin Idgunji, Djordje Jevdjic, Emre Ozer, P. Lotfi-Kamran, Andreas Panteli, A. Prodromou, C. Nicopoulos, D. Hardy, B. Falsafi, Yiannakis Sazeides
We propose a power-efficient many-core server-on-chip system with 3D-stacked Wide I/O DRAM targeting cloud workloads in datacenters. The integration of 3D-stacked Wide I/O DRAM on top of a logic die increases available memory bandwidth by using dense and fast Through-Silicon Vias (TSVs) instead of off-chip IOs, enabling faster data transfers at much lower energy per bit. We demonstrate a methodology that includes full-system microarchitectural modeling and rapid virtual physical prototyping with emphasis on the thermal analysis. Our findings show that while executing CPU-centric benchmarks (e.g. SPECInt and Dhrystone), the temperature in the server-on-chip (logic+DRAM) is in the range of 175-200°C at a power consumption of less than 20W, exceeding the reliable operating bounds without any cooling solutions, even with embedded cores. However, with real cloud workloads, the power density in the server-on-chip remains much below the temperatures reached by the CPU-centric workloads as a result of much lower power burnt by memory-intensive cloud workloads. We show that such a server-on-chip system is feasible with a low-cost passive heat sink eliminating the need for a high-cost active heat sink with an attached fan, creating an opportunity for overall cost and energy savings in datacenters.
{"title":"Thermal characterization of cloud workloads on a power-efficient server-on-chip","authors":"D. Milojevic, Sachin Idgunji, Djordje Jevdjic, Emre Ozer, P. Lotfi-Kamran, Andreas Panteli, A. Prodromou, C. Nicopoulos, D. Hardy, B. Falsafi, Yiannakis Sazeides","doi":"10.1109/ICCD.2012.6378637","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378637","url":null,"abstract":"We propose a power-efficient many-core server-on-chip system with 3D-stacked Wide I/O DRAM targeting cloud workloads in datacenters. The integration of 3D-stacked Wide I/O DRAM on top of a logic die increases available memory bandwidth by using dense and fast Through-Silicon Vias (TSVs) instead of off-chip IOs, enabling faster data transfers at much lower energy per bit. We demonstrate a methodology that includes full-system microarchitectural modeling and rapid virtual physical prototyping with emphasis on the thermal analysis. Our findings show that while executing CPU-centric benchmarks (e.g. SPECInt and Dhrystone), the temperature in the server-on-chip (logic+DRAM) is in the range of 175-200°C at a power consumption of less than 20W, exceeding the reliable operating bounds without any cooling solutions, even with embedded cores. However, with real cloud workloads, the power density in the server-on-chip remains much below the temperatures reached by the CPU-centric workloads as a result of much lower power burnt by memory-intensive cloud workloads. We show that such a server-on-chip system is feasible with a low-cost passive heat sink eliminating the need for a high-cost active heat sink with an attached fan, creating an opportunity for overall cost and energy savings in datacenters.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115754169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378693
Kiyeon Lee, Moo-Kyoung Chung, Soojung Ryu, Yeon-Gon Cho, Sangyeun Cho
This paper explores high-bandwidth data cache designs for a coarse-grained reconfigurable architecture processor family capable of achieving a high degree of instruction level parallelism. To meet stringent power, area and time-to-market constraints, we take an architectural approach rather than circuit-level multi-porting approaches. We closely examine two design choices: single-level banked cache (SLC) and two-level cache (TLC). A detailed simulation study using a set of microbenchmarks and industry-strength benchmarks finds that both SLC and TLC offer a reasonably competitive performance at a small implementation cost compared with a hypothetical cache with perfect ports and a multi-bank scratchpad memory.
{"title":"Design and evaluation of a four-port data cache for high instruction level parallelism reconfigurable processors","authors":"Kiyeon Lee, Moo-Kyoung Chung, Soojung Ryu, Yeon-Gon Cho, Sangyeun Cho","doi":"10.1109/ICCD.2012.6378693","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378693","url":null,"abstract":"This paper explores high-bandwidth data cache designs for a coarse-grained reconfigurable architecture processor family capable of achieving a high degree of instruction level parallelism. To meet stringent power, area and time-to-market constraints, we take an architectural approach rather than circuit-level multi-porting approaches. We closely examine two design choices: single-level banked cache (SLC) and two-level cache (TLC). A detailed simulation study using a set of microbenchmarks and industry-strength benchmarks finds that both SLC and TLC offer a reasonably competitive performance at a small implementation cost compared with a hypothetical cache with perfect ports and a multi-bank scratchpad memory.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125410596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378625
M. Ghasemazar, H. Goudarzi, Massoud Pedram
Power dissipation and die temperature have become key performance limiters in today's high-performance Chip Multiprocessors (CMPs.) Dynamic power management solutions have been proposed to manage resources in a CMP based on the measured power dissipation, performance, and die temperature of processing cores. In this paper, we develop a robust framework for power and thermal management of heterogeneous CMPs subject to variability and uncertainty in system parameters. More precisely, we first model and formulate the problem of maximizing the task throughput of a heterogeneous CMP (a.k.a., asymmetric multi-core architecture) subject to a total power budget and a per-core temperature limit. Next we develop a solution framework, called Variation-aware Power/Thermal Manager (VPTM), which is a hierarchical dynamic power and thermal management solution targeting heterogeneous CMP architectures. VPTM utilizes dynamic voltage and frequency scaling (DVFS) and core consolidation techniques to control the core power consumptions, which implicitly regulate the core temperatures. An algorithm is proposed for core consolidation and application assignment, and a convex program is formulated and solved to produce optimal DVFS settings. Finally, a feedback controller is employed to compensate for variations in key system parameters at runtime. Experimental results show highly promising performance improvements for VPTM compared to the state-of-the-art techniques.
{"title":"Robust optimization of a Chip Multiprocessor's performance under power and thermal constraints","authors":"M. Ghasemazar, H. Goudarzi, Massoud Pedram","doi":"10.1109/ICCD.2012.6378625","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378625","url":null,"abstract":"Power dissipation and die temperature have become key performance limiters in today's high-performance Chip Multiprocessors (CMPs.) Dynamic power management solutions have been proposed to manage resources in a CMP based on the measured power dissipation, performance, and die temperature of processing cores. In this paper, we develop a robust framework for power and thermal management of heterogeneous CMPs subject to variability and uncertainty in system parameters. More precisely, we first model and formulate the problem of maximizing the task throughput of a heterogeneous CMP (a.k.a., asymmetric multi-core architecture) subject to a total power budget and a per-core temperature limit. Next we develop a solution framework, called Variation-aware Power/Thermal Manager (VPTM), which is a hierarchical dynamic power and thermal management solution targeting heterogeneous CMP architectures. VPTM utilizes dynamic voltage and frequency scaling (DVFS) and core consolidation techniques to control the core power consumptions, which implicitly regulate the core temperatures. An algorithm is proposed for core consolidation and application assignment, and a convex program is formulated and solved to produce optimal DVFS settings. Finally, a feedback controller is employed to compensate for variations in key system parameters at runtime. Experimental results show highly promising performance improvements for VPTM compared to the state-of-the-art techniques.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124063198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378687
Moo-Kyoung Chung, Yeon-Gon Cho, Soojung Ryu
Though Coarse Grained Reconfigurable Architecture (CGRA) is a flexible alternative for high performance computing, it has a crucial problem on instruction code whose size is so large that the instruction memory takes a significant portion of silicon area and power consumption. This article proposes an efficient dictionary-based compression method for the CGRA instruction code, where code bit-fields are rearranged and grouped together according to locality characteristics and the most efficient compression mode is selected for each group and kernel. The proposed method can reinstall the dictionary contents adaptively for each kernel. Experimental results show that the proposed method achieved an average compression ratio 0.56 in 4×4 array of function units for well-optimized applications.
{"title":"Efficient code compression for coarse grained reconfigurable architectures","authors":"Moo-Kyoung Chung, Yeon-Gon Cho, Soojung Ryu","doi":"10.1109/ICCD.2012.6378687","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378687","url":null,"abstract":"Though Coarse Grained Reconfigurable Architecture (CGRA) is a flexible alternative for high performance computing, it has a crucial problem on instruction code whose size is so large that the instruction memory takes a significant portion of silicon area and power consumption. This article proposes an efficient dictionary-based compression method for the CGRA instruction code, where code bit-fields are rearranged and grouped together according to locality characteristics and the most efficient compression mode is selected for each group and kernel. The proposed method can reinstall the dictionary contents adaptively for each kernel. Experimental results show that the proposed method achieved an average compression ratio 0.56 in 4×4 array of function units for well-optimized applications.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123380029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378636
Faissal M. Sleiman, R. Dreslinski, T. Wenisch
This paper investigates Embedded Way Prediction for large last-level caches (LLCs): an architecture and circuit design to provide the latency of parallel tag-data access at substantial energy savings. Existing way prediction approaches for L1 caches are compromised by the high associativity and filtered temporal locality of LLCs. We demonstrate: (1) the need for wide partial tag comparison, which we implement with a dynamic CAM alongside the data sub-array wordline decode, and (2) the inhibit bit, an architectural innovation to provide accurate predictions when the partial tag comparison is inconclusive. We present circuit critical-path and architectural power/performance studies demonstrating speedups of up to 15.4% (6.6% average) for scientific and server applications, matching the performance of parallel tag-data access while reducing energy overhead by 40%.
{"title":"Embedded way prediction for last-level caches","authors":"Faissal M. Sleiman, R. Dreslinski, T. Wenisch","doi":"10.1109/ICCD.2012.6378636","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378636","url":null,"abstract":"This paper investigates Embedded Way Prediction for large last-level caches (LLCs): an architecture and circuit design to provide the latency of parallel tag-data access at substantial energy savings. Existing way prediction approaches for L1 caches are compromised by the high associativity and filtered temporal locality of LLCs. We demonstrate: (1) the need for wide partial tag comparison, which we implement with a dynamic CAM alongside the data sub-array wordline decode, and (2) the inhibit bit, an architectural innovation to provide accurate predictions when the partial tag comparison is inconclusive. We present circuit critical-path and architectural power/performance studies demonstrating speedups of up to 15.4% (6.6% average) for scientific and server applications, matching the performance of parallel tag-data access while reducing energy overhead by 40%.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125636321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378655
A. Valero, J. Sahuquillo, S. Petit, P. López, J. Duato
Cache memories have been typically implemented with Static Random Access Memory (SRAM) technology. This technology presents a fast access time but high energy consumption and low density. As opposite, the recently appeared embedded Dynamic RAM (eDRAM) technology allows caches to be built with lower energy and area, although with a slower access time. The eDRAM technology provides important leakage and area savings, especially in huge Last-Level Caches (LLCs), which occupy almost half the silicon area in some recent microprocessors. This paper proposes a novel hybrid LLC, which combines SRAM and eDRAM banks to address the trade-off among performance, energy, and area. To this end, we explore the optimal percentage of SRAM and eDRAM banks that achieves the best target trade-off. Architectural mechanisms have been devised to keep the most likely accessed blocks in fast SRAM banks as well as to avoid unnecessary destructive reads. Experimental results show that, compared to a conventional SRAM LLC with the same storage capacity, performance degradation does not surpass, on average, 2.9% (even with 12.5% of banks built with SRAM technology), whereas area savings can be as high as 46% for a 1MB-16way LLC. For a 45nm technology node, the energy-delay squared product confirms that a hybrid cache is a better design than the conventional SRAM cache regardless the number of eDRAM banks, and also better than a conventional eDRAM cache when the number of SRAM banks is a quarter or an eighth of the cache banks.
{"title":"Analyzing the optimal ratio of SRAM banks in hybrid caches","authors":"A. Valero, J. Sahuquillo, S. Petit, P. López, J. Duato","doi":"10.1109/ICCD.2012.6378655","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378655","url":null,"abstract":"Cache memories have been typically implemented with Static Random Access Memory (SRAM) technology. This technology presents a fast access time but high energy consumption and low density. As opposite, the recently appeared embedded Dynamic RAM (eDRAM) technology allows caches to be built with lower energy and area, although with a slower access time. The eDRAM technology provides important leakage and area savings, especially in huge Last-Level Caches (LLCs), which occupy almost half the silicon area in some recent microprocessors. This paper proposes a novel hybrid LLC, which combines SRAM and eDRAM banks to address the trade-off among performance, energy, and area. To this end, we explore the optimal percentage of SRAM and eDRAM banks that achieves the best target trade-off. Architectural mechanisms have been devised to keep the most likely accessed blocks in fast SRAM banks as well as to avoid unnecessary destructive reads. Experimental results show that, compared to a conventional SRAM LLC with the same storage capacity, performance degradation does not surpass, on average, 2.9% (even with 12.5% of banks built with SRAM technology), whereas area savings can be as high as 46% for a 1MB-16way LLC. For a 45nm technology node, the energy-delay squared product confirms that a hybrid cache is a better design than the conventional SRAM cache regardless the number of eDRAM banks, and also better than a conventional eDRAM cache when the number of SRAM banks is a quarter or an eighth of the cache banks.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134347682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378631
Trey Reece, D. Limbrick, Xiaowen Wang, B. Kiddie, W. H. Robinson
Many experimental hardware Trojans from the literature explore the potential threat vectors, but do not address the stealthiness of the malicious hardware. If a Trojan requires a large amount of area or power, then it can be easier to detect. Instead, a more focused attack can potentially avoid detection. This paper explores the cost in both area and power consumption of several small, focused attacks on an Intel 8051 microcontroller implemented with a standard cell library. The resulting cost in total area varied from a 0.4% increase in the design, down to a 0.150% increase in the design. Dynamic and leakage power showed similar results.
{"title":"Stealth assessment of hardware Trojans in a microcontroller","authors":"Trey Reece, D. Limbrick, Xiaowen Wang, B. Kiddie, W. H. Robinson","doi":"10.1109/ICCD.2012.6378631","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378631","url":null,"abstract":"Many experimental hardware Trojans from the literature explore the potential threat vectors, but do not address the stealthiness of the malicious hardware. If a Trojan requires a large amount of area or power, then it can be easier to detect. Instead, a more focused attack can potentially avoid detection. This paper explores the cost in both area and power consumption of several small, focused attacks on an Intel 8051 microcontroller implemented with a standard cell library. The resulting cost in total area varied from a 0.4% increase in the design, down to a 0.150% increase in the design. Dynamic and leakage power showed similar results.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126321549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2012-09-30DOI: 10.1109/ICCD.2012.6378664
Zhe Zhang, Weijun Xiao, Nohhyun Park, D. Lilja
Phase change memory (PCM) is a promising technology to solve energy and performance bottlenecks for memory and storage systems. To help understand the reliability characteristics of PCM devices, we present a simple fault model to categorize four types of PCM errors. Based on our proposed fault model, we conduct extensive experiments on real PCM devices at the memory module level. Numerical results uncover many interesting trends in terms of the lifetime of PCM devices and error behaviors. Specifically, PCM lifetime for the memory chips we tested is greater than 14 million cycles, which is much longer than for flash memory devices. In addition, the distributions for four types of errors are quite different. These results can be used for estimating PCM lifetime and for measuring the fabrication quality of individual PCM memory chips.
{"title":"Memory module-level testing and error behaviors for phase change memory","authors":"Zhe Zhang, Weijun Xiao, Nohhyun Park, D. Lilja","doi":"10.1109/ICCD.2012.6378664","DOIUrl":"https://doi.org/10.1109/ICCD.2012.6378664","url":null,"abstract":"Phase change memory (PCM) is a promising technology to solve energy and performance bottlenecks for memory and storage systems. To help understand the reliability characteristics of PCM devices, we present a simple fault model to categorize four types of PCM errors. Based on our proposed fault model, we conduct extensive experiments on real PCM devices at the memory module level. Numerical results uncover many interesting trends in terms of the lifetime of PCM devices and error behaviors. Specifically, PCM lifetime for the memory chips we tested is greater than 14 million cycles, which is much longer than for flash memory devices. In addition, the distributions for four types of errors are quite different. These results can be used for estimating PCM lifetime and for measuring the fabrication quality of individual PCM memory chips.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131460950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}