Pub Date : 2008-12-01 DOI: 10.1109/ICCD.2008.4751892
T. Matsunaga, S. Kimura, Y. Matsunaga
This paper addresses parallel prefix adder synthesis that targets minimization of the total switching activity under bitwise timing constraints. The problem is treated as the synthesis of prefix graphs, which represent the global structure of parallel prefix adders at the technology-independent level. An approach for timing-driven area minimization has been proposed that first finds the exact minimum solution on a specific subset of prefix graphs by dynamic programming, then restructures the result for further reduction by removing the restriction to that subset. This approach can be applied to switching cost minimization almost directly, though in some cases it is not as effective as it is for area minimization. In this paper, a heuristic is proposed that estimates the effect of the restructuring phase and improves the cost calculation for some specific cases. Through various experiments, the conditions under which this approach can be executed effectively are also discussed.
Title: Synthesis of parallel prefix adders considering switching activities (2008 IEEE International Conference on Computer Design)
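To make the notion of a prefix graph concrete, the following minimal sketch builds one well-known fixed structure, the Kogge-Stone graph, and uses it to add two unsigned integers. It is only an illustration of what a prefix graph computes; it is not the synthesized graph or the dynamic-programming method of the paper, and the function names are chosen here for the example.

```python
def kogge_stone_add(a: int, b: int, n: int) -> int:
    """Add two n-bit unsigned integers using a Kogge-Stone prefix graph."""
    # Bitwise generate/propagate signals (the leaves of the prefix graph).
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    G, P = g[:], p[:]
    d = 1
    while d < n:
        nG, nP = G[:], P[:]
        for i in range(d, n):
            # Prefix operator combining the group ending at bit i
            # with the group ending at bit i - d.
            nG[i] = G[i] | (P[i] & G[i - d])
            nP[i] = P[i] & P[i - d]
        G, P = nG, nP
        d *= 2

    # After the last level, G[i] is the carry out of bit i (carry-in is 0).
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total | (G[n - 1] << n)


def kogge_stone_node_count(n: int) -> int:
    """Number of prefix operators instantiated by the loop above."""
    count, d = 0, 1
    while d < n:
        count += n - d
        d *= 2
    return count
```

The node count is a rough, technology-independent proxy for area; a synthesis method such as the one described in the abstract would search over graphs with different node counts and depths rather than fix one structure in advance.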
Pub Date : 2008-12-01 DOI: 10.1109/ICCD.2008.4751924
N. Seki, Lei Zhao, J. Kei, D. Ikebuchi, Y. Kojima, Y. Hasegawa, H. Amano, Toshihiro Kashima, S. Takeda, T. Shirai, M. Nakata, K. Usami, T. Sunata, J. Kanai, M. Namiki, Masaaki Kondo, Hiroshi Nakamura
Fine-grain dynamic power gating through sleep control is proposed to save leakage power in the MIPS R3000 and is applied to the processor pipeline. The execution unit is divided into four small units: multiplier, divider, shifter and other (CLU). The power of each unit is cut off dynamically, based on the operation. We taped out the prototype chip Geyser-0, which provides an R3000 core with the power reduction technique, 16 KB caches and a translation lookaside buffer (TLB), using 90 nm CMOS technology. Evaluation results for four benchmark programs for embedded applications show that leakage power is reduced by 47% on average, with a 41% area overhead.
Title: A fine-grain dynamic sleep control scheme in MIPS R3000 (2008 IEEE International Conference on Computer Design)
Pub Date : 2008-12-01 DOI: 10.1109/ICCD.2008.4751875
Venkatesan Packirisamy, Yangchun Luo, W. Hung, Antonia Zhai, P. Yew, Tin-fook Ngai
The computer industry has adopted multi-threaded and multi-core architectures as clock rate increases stalled in the early 2000s. However, because of the lack of compilers and other related software technologies, most general-purpose applications today still cannot take advantage of such architectures to improve their performance. Thread-level speculation (TLS) has been proposed as a way of using these multi-threaded architectures to parallelize general-purpose applications. Both simultaneous multithreading (SMT) and chip multiprocessors (CMP) have been extended to implement TLS. While the characteristics of SMT and CMP have been widely studied under multi-programmed and parallel workloads, their behavior under TLS workloads is not well understood. Because of the speculative nature of the threads, which could potentially be rolled back, and the variable degree of parallelism available in applications, the TLS workload exhibits unique characteristics that make it different from other workloads. In this paper, we present a detailed study of the performance, power consumption and thermal effect of these multithreaded architectures against that of a superscalar with equal chip area. A wide spectrum of design choices and tradeoffs is also studied using commonly used simulation techniques. We show that the SMT-based TLS architecture performs about 21% better than the best CMP-based configuration while it suffers about 16% power overhead. In terms of the energy-delay-squared product (ED2), SMT-based TLS performs about 26% better than the best CMP-based TLS configuration and 11% better than the superscalar architecture. However, the SMT-based TLS configuration causes more thermal stress than the CMP-based TLS architectures.
Title: Efficiency of thread-level speculation in SMT and CMP architectures - performance, power and thermal perspective (2008 IEEE International Conference on Computer Design)
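For reference, the energy-delay-squared product cited in this abstract is conventionally defined as below. The symbols are chosen here and do not come from the abstract: E is the energy consumed, D the execution time, and P the average power over the run.

```latex
ED^2 \;=\; E \cdot D^2 \;=\; (P \cdot D) \cdot D^2 \;=\; P \cdot D^3
```

The metric weights delay more heavily than energy, so a design with a moderate power overhead can still come out ahead if it is sufficiently faster. The percentages reported in the abstract are presumably aggregated over benchmarks, so they should not be expected to recombine exactly through this identity.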
Pub Date : 2008-11-10 DOI: 10.1109/ICCD.2008.4751896
H. Saleh, E. Swartzlander
A floating-point fused dot-product unit is presented that performs single-precision floating-point multiplication and addition operations on two pairs of data in only 150% of the time required for a conventional floating-point multiplication. When placed and routed in a 45 nm process, the fused dot-product unit occupied about 70% of the area needed to implement a parallel dot-product unit using conventional floating-point adders and multipliers. The fused dot-product unit is 27% faster than the conventional parallel approach. The numerical result of the fused unit is more accurate because only one rounding operation is needed, versus at least three for other approaches.
Title: A floating-point fused dot-product unit (2008 IEEE International Conference on Computer Design)
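The accuracy claim (one rounding instead of three) can be illustrated in software. The sketch below is a numerical model only, not the hardware design from the paper: it assumes the inputs are exactly representable in single precision, emulates single-precision rounding with the standard `struct` module, and approximates the fused result by evaluating in double precision and rounding once. The helper names are made up for this example.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot2_unfused(a: float, b: float, c: float, d: float) -> float:
    """a*b + c*d with three roundings: each product, then the sum."""
    return to_f32(to_f32(a * b) + to_f32(c * d))


def dot2_fused(a: float, b: float, c: float, d: float) -> float:
    """a*b + c*d rounded to binary32 once, at the very end.

    Products of binary32 values are exact in binary64; the binary64 sum
    may itself round, so this is an approximation of a true single
    rounding in general (it is exact for the inputs used below).
    """
    return to_f32(a * b + c * d)


# Inputs chosen so that the first product needs 25 significant bits.
a = b = 1.0 + 2.0**-12
c, d = -1.0, 1.0 + 2.0**-11

unfused = dot2_unfused(a, b, c, d)
fused = dot2_fused(a, b, c, d)
```

With these inputs the exact result is 2**-24. The unfused version returns 0.0 because the low-order bit of `a * b` is rounded away before the subtraction, while the fused version returns 2**-24.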
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751857
K. Shi
This paper presents two area and power-delay efficient state retention pulsed flops with scan and reset capabilities for sub-90 nm production low-power designs. The proposed flops also mitigate area overhead and integration complexity in SoC designs by implementing a single retention control signal and shared function/scan mode clock.
Title: Area and power-delay efficient state retention pulse-triggered flip-flops with scan and reset capabilities (2008 IEEE International Conference on Computer Design)
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751859
Yulei Zhang, Ling Zhang, A. Tsuchiya, M. Hashimoto, Chung-Kuan Cheng
To address the performance limitation caused by the scaling of on-chip global wires, a new configuration for global wiring using on-chip lossy transmission lines (T-lines) is proposed and optimized in this paper. First, we use passive compensation and repeated transceivers, each composed of a sense amplifier and an inverter chain, to compensate for the distortion and attenuation of on-chip T-lines. Second, an optimization flow for designing this scheme based on eye-diagram prediction and sequential quadratic programming (SQP) is proposed. This flow is employed to study the latency, power dissipation and throughput of the new global wiring scheme as the technology scales from 90 nm to 22 nm. Compared with conventional repeater insertion methods, our experimental results demonstrate that, at the 22 nm technology node, the new scheme reduces the normalized delay by 85.1% and the normalized energy consumption by 98.8%. Furthermore, all the performance metrics scale as the technology advances, which makes this new signaling scheme a potential candidate to break the “interconnect wall” of digital system performance.
Title: On-chip high performance signaling using passive compensation (2008 IEEE International Conference on Computer Design)
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751835
I. Jiang, Ming-Hua Wu
Interconnect delay and low power are two of the main concerns in nanometer technologies. Buffer insertion during routing effectively reduces interconnect delay; power state management and multiple supply voltages significantly lower power consumption. However, buffering without considering power states in multiple-supply-voltage designs may cause signal integrity problems. This paper is the first to consider power states in buffered tree construction. Based on a hierarchical approach combined with dynamic programming, we can simultaneously minimize power, satisfy timing constraints and maintain signal integrity.
Title: Power-state-aware buffered tree construction (2008 IEEE International Conference on Computer Design)
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751833
Nasir Mohyuddin, E. Pakbaznia, Massoud Pedram
A gate-level probabilistic error propagation model is presented that takes as input the Boolean function of the gate, the signal and error probabilities of the gate inputs, and the gate error probability, and produces the error probability at the output of the gate. The presented model uses the Boolean difference calculus and can be applied to the problem of calculating the error probability at the primary outputs of a multi-level Boolean circuit with a time complexity that is linear in the number of gates in the circuit. This is done by starting from the primary inputs and moving toward the primary outputs using a post-order traversal. Experimental results demonstrate the accuracy and efficiency of the proposed approach compared to other known methods for error calculation in VLSI circuits.
Title: Probabilistic error propagation in logic circuits using the Boolean difference calculus (2008 IEEE International Conference on Computer Design)
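The inputs and output of such a gate-level model can be sketched as follows. This is not the paper's Boolean-difference formulation, whose closed form is not given in the abstract; it is a brute-force enumeration under assumptions made here: gate inputs are statistically independent, an input error is a bit flip that is independent of the signal value, and a gate fault flips the gate output. The function name and signature are hypothetical.

```python
from itertools import product


def gate_output_error(func, signal_probs, error_probs, gate_error):
    """Probability that the gate output differs from its fault-free value.

    func          Boolean function of the gate, taking 0/1 arguments
    signal_probs  P(input i is 1) for each input
    error_probs   P(input i is flipped) for each input
    gate_error    P(the gate itself flips its output)
    """
    n = len(signal_probs)
    mismatch = 0.0  # P(input errors alone change the output)
    for values in product((0, 1), repeat=n):
        p_values = 1.0
        for v, p in zip(values, signal_probs):
            p_values *= p if v else 1.0 - p
        for flips in product((0, 1), repeat=n):
            p_flips = 1.0
            for f, e in zip(flips, error_probs):
                p_flips *= e if f else 1.0 - e
            seen = tuple(v ^ f for v, f in zip(values, flips))
            if func(*values) != func(*seen):
                mismatch += p_values * p_flips
    # The output is wrong if exactly one of {propagated error, gate fault} occurs.
    return mismatch * (1.0 - gate_error) + (1.0 - mismatch) * gate_error


and_gate = lambda x, y: x & y
err = gate_output_error(and_gate, [0.5, 0.5], [0.1, 0.0], 0.0)
```

Here `err` evaluates to about 0.05: the flip on the first input only matters when the second input is 1. The enumeration is exponential in the gate fan-in, which is acceptable for small gates; applying such a per-gate function once per gate in topological order is what gives a pass that is linear in the number of gates.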
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751888
Carsten Gremzow
One of the main challenges in system design, whether for high-performance computing or for embedded systems, is to partition software for target architectures such as multi-core, heterogeneous, or even hardware/software co-design systems. Several compiler techniques handle partitioning and related problems by using static analysis and therefore have no means to capture the quantity and dynamics of the global data flow, which is essential for extracting tasks or exploiting coarse-grained parallelism. We present a novel solution for capturing and analyzing an application's quantitative data flow in this paper. The core part is the LLILA (Low Level Intermediate Language Analyzer) tool set, which automatically generates and augments self-profiling instruction set simulators from assembly-level descriptions for a virtual machine. During run-time of the augmented program, several properties of data exchange (frequency, quantity and locality, reflecting inter-procedural communication) are captured at instruction level and, as a consequence, with the highest possible degree of accuracy.
Title: Quantitative global dataflow analysis on virtual instruction set simulators for hardware/software co-design (2008 IEEE International Conference on Computer Design)
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751882
A. Namazi, S. Askari, M. Nourani
Analog and digital circuits are both prone to failure due to transient upsets, variations, etc. Redundancy techniques, such as N-tuple Modular Redundancy, have been widely used to correct the faulty behavior of components and achieve high reliability for digital circuits, whereas not much has been done on the analog side. In this paper, we propose a redundancy-based fault-tolerant methodology to design highly reliable analog-to-digital converters (ADCs). Our methodology employs redundant analog blocks and chooses the best result using an innovative analog voter. Experimental results are reported to verify the concepts, measure the system's reliability, and trade off reliability against cost and power.
Title: Highly reliable A/D converter using analog voting (2008 IEEE International Conference on Computer Design)
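As background for the N-tuple Modular Redundancy mentioned in this abstract, the sketch below computes the textbook reliability of a majority-voted system. It assumes independent, identical modules and a perfect voter, which are assumptions made here; the analog voter proposed in the paper is not described in the abstract and may select its result differently. The function name is illustrative.

```python
from math import comb


def nmr_reliability(r: float, n: int = 3) -> float:
    """P(a strict majority of n independent modules is correct).

    r is the reliability of a single module; the voter is assumed perfect.
    """
    need = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * r**k * (1.0 - r) ** (n - k) for k in range(need, n + 1))


tmr = nmr_reliability(0.9, 3)
```

For triple redundancy this reduces to 3r^2 - 2r^3, so modules with reliability 0.9 give a system reliability of about 0.972. Redundancy only helps when each module is better than a coin flip: at r = 0.5 the result stays at 0.5.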