In this paper we address the problem of growing leakage variability through effective dual-threshold voltage assignment. We propose a probabilistic dynamic-programming-based method that assigns dual threshold voltages so that the overall expected leakage is minimized under a given timing yield, i.e., a bound on the probability of violating the timing constraint. The key characteristics of our strategy are two pruning criteria that stochastically identify Pareto-optimal solutions and prune the suboptimal ones. Compared with other variability-driven dual-threshold voltage assignment schemes, the main advantages of our approach are that it 1) accounts for correlations due to common sources of variation, 2) provides controllable runtime, which for one of the proposed strategies is comparable to that of the deterministic algorithm, and 3) optimizes over all signal paths simultaneously rather than one path at a time. Experimental results indicate that the proposed probabilistic scheme is significantly better than a comparable deterministic dual-threshold voltage assignment in terms of both expected leakage and the probability of violating the timing constraint.
Probabilistic dual-Vth leakage optimization under variability. A. Davoodi, Ankur Srivastava. ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005-08-08. DOI: https://doi.org/10.1145/1077603.1077641
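The pruning step can be illustrated with a deliberately simplified sketch. It assumes each candidate partial assignment is summarized by an expected leakage and a Gaussian arrival time, collapses the timing-yield target into a single delay quantile, and keeps only candidates that are not dominated in (expected leakage, delay quantile). The names `Candidate`, `delay_at_yield`, and `pareto_prune` are hypothetical; the paper's two stochastic pruning criteria and its handling of correlated variation are not reproduced here.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass(frozen=True)
class Candidate:
    """Hypothetical partial Vth assignment summarized by Gaussian statistics."""
    name: str
    leak_mean: float    # expected leakage (arbitrary units)
    delay_mean: float   # mean arrival time (ps)
    delay_sigma: float  # standard deviation of arrival time (ps), must be > 0


def delay_at_yield(c: Candidate, yield_target: float) -> float:
    """Delay q with P(delay <= q) = yield_target, assuming a normal distribution."""
    return NormalDist(c.delay_mean, c.delay_sigma).inv_cdf(yield_target)


def pareto_prune(cands, yield_target=0.95):
    """Keep candidates not dominated in (expected leakage, delay quantile)."""
    scored = [(c.leak_mean, delay_at_yield(c, yield_target), c) for c in cands]
    kept = []
    for leak, d, c in scored:
        # A candidate is dominated if another is no worse in both metrics
        # and strictly better in at least one.
        dominated = any(
            l2 <= leak and d2 <= d and (l2 < leak or d2 < d)
            for l2, d2, _ in scored
        )
        if not dominated:
            kept.append(c)
    return kept


cands = [
    Candidate("all-low-Vth", 10.0, 100.0, 5.0),
    Candidate("mixed", 6.0, 110.0, 6.0),
    Candidate("dominated", 11.0, 112.0, 6.0),
]
survivors = [c.name for c in pareto_prune(cands)]
```

Collapsing each distribution to one quantile keeps the dominance test cheap, but it discards the correlation information that the abstract lists as a key advantage of the probabilistic method.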
Aggressive hardware prefetching often significantly increases energy consumption in the memory system. Experiments show that a major fraction of the energy overhead of prefetching is due to energy costs related to the hardware history table. In this paper, the authors present PARE, a power-aware prefetching engine that uses a newly designed indexed hardware history table. Compared with the conventional single-table design, the new prefetching table consumes 7 to 11 times less power per access. With the help of compiler-based location-set analysis, the proposed PARE design is shown to reduce energy consumption in the data memory system by as much as 40% in 70 nm processor designs.
PARE: a power-aware hardware data prefetching engine. Yao Guo, M. Naser, C. A. Moritz. ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005-08-08. DOI: https://doi.org/10.1145/1077603.1077686
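To make the notion of a prefetch history table concrete, here is a minimal sketch of a generic PC-indexed stride prefetcher in which each access probes exactly one table slot. The class name, the 16-entry size, and the PC-modulo indexing are assumptions for illustration; the abstract does not describe PARE's table organization or the location-set analysis that guides it.

```python
class StridePrefetchTable:
    """Hypothetical direct-mapped, PC-indexed stride table (generic, not PARE's)."""

    def __init__(self, entries: int = 16):
        self.entries = entries
        # Each slot holds (pc_tag, last_address, stride) or None.
        self.table = [None] * entries

    def access(self, pc: int, addr: int):
        """Record a load by `pc` at `addr`; return an address to prefetch, or None."""
        idx = pc % self.entries            # exactly one slot is probed per access
        entry = self.table[idx]
        prefetch = None
        if entry is not None and entry[0] == pc:
            _, last_addr, stride = entry
            new_stride = addr - last_addr
            if new_stride == stride and stride != 0:
                prefetch = addr + stride   # same non-zero stride seen twice in a row
            self.table[idx] = (pc, addr, new_stride)
        else:
            self.table[idx] = (pc, addr, 0)  # allocate (or replace) the slot
        return prefetch


table = StridePrefetchTable()
prefetches = [table.access(0x40, a) for a in (100, 108, 116, 124)]
```

The first two accesses only train the entry; once the same stride of 8 is confirmed, each further access yields a prefetch address.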
We investigate and develop energy-efficient strategies for deploying wireless sensor networks (WSN) to monitor a phenomenon of interest in a coverage region. We first describe a two-level WSN structure in which the sensors in the lower level monitor their surrounding environment and the micro-servers in the top level provide connectivity between the sensors and a base station. We then formulate and solve the problem of assigning positions and initial energy levels to the micro-servers while concurrently partitioning the sensors into clusters assigned to individual micro-servers, so as to maximize the monitoring lifetime of the two-level WSN subject to a total energy budget. This problem, called MDEA, is solved for both collinear and planar deployment. Our experimental results show that such a two-level WSN design and deployment increases network lifetime by a factor of two or more compared with a flat WSN with the same total initial energy and quality of monitoring.
Energy efficient strategies for deployment of a two-level wireless sensor network. A. Iranli, M. Maleki, Massoud Pedram. ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005-08-08. DOI: https://doi.org/10.1145/1077603.1077659
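The energy-budget side of the problem can be sketched under strong simplifying assumptions: micro-server positions and sensor clusters are already fixed, each micro-server draws a constant power, and the network's lifetime ends when the first micro-server is depleted. Under these assumptions, splitting the budget in proportion to each server's power draw equalizes depletion times. The function names and power values are hypothetical, and the joint position, energy, and clustering optimization of MDEA is not shown.

```python
def lifetime(energies, powers):
    """Network lifetime, assumed to end when the first micro-server is depleted."""
    return min(e / p for e, p in zip(energies, powers))


def proportional_allocation(budget, powers):
    """Split a total energy budget in proportion to each micro-server's power draw."""
    total = sum(powers)
    return [budget * p / total for p in powers]


powers = [1.0, 2.0, 3.0]                  # hypothetical per-server power draws
uniform = lifetime([100.0, 100.0, 100.0], powers)
balanced = lifetime(proportional_allocation(300.0, powers), powers)
```

Here the proportional split yields a lifetime of 50 time units versus about 33.3 for a uniform split of the same 300-unit budget.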
As FPGAs enter the nanometer regime, several modifications are needed to reduce their increasing leakage power dissipation. This work therefore modifies the FPGA CAD flow to mitigate leakage power using multi-threshold CMOS technology: logic blocks that exhibit similar idleness are packed and placed close to each other so that they can be turned off during their idle time. The modifications are integrated into the VPR flow and tested on several FPGA benchmarks using a 0.13 µm dual-Vth CMOS technology, yielding average leakage power savings of at least 20%.
LAP: a logic activity packing methodology for leakage power-tolerant FPGAs. Hassan Hassan, M. Anis, M. Elmasry. ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005-08-08. DOI: https://doi.org/10.1145/1077603.1077664
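The packing idea can be sketched as a greedy clustering problem, assuming each logic block's activity is given as a bitmask over time slots (bit set = idle) and that a cluster sharing one sleep transistor can be switched off only in slots where all of its members are idle. The bitmask encoding, the capacity limit, and the greedy rule are hypothetical; this is not the LAP algorithm as integrated into VPR.

```python
def idle_slots(mask: int) -> int:
    """Number of time slots in which a block (or a whole cluster) is idle."""
    return bin(mask).count("1")


def pack_by_idleness(blocks: dict, capacity: int) -> list:
    """Greedily group blocks so each cluster keeps as many common idle slots as possible."""
    clusters = []  # list of (member names, common idle mask)
    # Visit the idlest blocks first so they seed clusters.
    for name, mask in sorted(blocks.items(), key=lambda kv: -idle_slots(kv[1])):
        best, best_score = None, -1
        for i, (members, common) in enumerate(clusters):
            if len(members) < capacity:
                score = idle_slots(common & mask)  # slots the cluster could still sleep
                if score > best_score:
                    best, best_score = i, score
        if best is None or best_score == 0:
            clusters.append(([name], mask))        # no compatible cluster: open a new one
        else:
            members, common = clusters[best]
            clusters[best] = (members + [name], common & mask)
    return [members for members, _ in clusters]


clusters = pack_by_idleness(
    {"A": 0b11110000, "B": 0b11100000, "C": 0b00001111, "D": 0b00000111},
    capacity=2,
)
```

Blocks A and B are idle in the same early slots and C and D in the same late slots, so each pair ends up sharing a cluster that can be switched off for three slots.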
This paper discusses a number of circuit techniques that address the DC and AC distortion performance of a low-power current-steering digital-to-analog converter. The design provides 14-bit resolution and a 200 MSPS conversion rate in a 1P4M 0.18 µm CMOS process with optional 3.3 V-compatible devices, while operating over a wide 1.8 V to 3.6 V supply range. A power-dissipation/conversion-rate figure of merit as low as 0.17 mW/MSPS was achieved for 1.8 V operation and as low as 0.28 mW/MSPS at 3.3 V. An SFDR of 70 dB is achieved at a 50 MHz output frequency.
A low power current steering digital to analog converter in 0.18 micron CMOS. D. Mercer. ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005-08-08. DOI: https://doi.org/10.1145/1077603.1077621
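The headline figures can be related with a little arithmetic. The abstract does not state the conversion rate at which each figure of merit was measured, so the sketch below assumes 200 MSPS purely for illustration; under that assumption the FOMs would correspond to about 34 mW at 1.8 V and 56 mW at 3.3 V. The helper names are hypothetical.

```python
def power_mw(fom_mw_per_msps: float, rate_msps: float) -> float:
    """Power implied by a mW/MSPS figure of merit at a given conversion rate."""
    return fom_mw_per_msps * rate_msps


def sfdr_amplitude_ratio(sfdr_db: float) -> float:
    """Amplitude ratio of the fundamental to the largest spur for a dB value."""
    return 10 ** (sfdr_db / 20)


levels = 2 ** 14                  # 14-bit resolution: 16384 output codes
p_1v8 = power_mw(0.17, 200)       # assumes the 1.8 V FOM applies at 200 MSPS
p_3v3 = power_mw(0.28, 200)       # same assumption for 3.3 V operation
ratio = sfdr_amplitude_ratio(70)  # 70 dB SFDR, roughly a 3162:1 amplitude ratio
```

If the figures of merit were measured at a lower conversion rate, the implied absolute power would be correspondingly lower.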
A. Ejlali, M. Schmitz, B. Al-Hashimi, S. Miremadi, P. Rosinger
Concerns about the reliability of real-time embedded systems that employ dynamic voltage scaling (DVS) have recently been highlighted [R. Melhem, D. Mosse, E. Elnozahy (2004); Y. Zhang, K. Chakrabarty (2004); D. Zhu, R. Melhem, D. Mosse (2004)], with a focus on transient-fault-tolerance techniques based on time redundancy. In this paper we analyze the use of information redundancy in DVS-enabled systems with the aim of improving both the system's tolerance to transient faults and its energy consumption. We demonstrate through a case study that a combination of information and time redundancy can achieve both higher fault tolerance and lower energy consumption than time redundancy alone. This holds even after accounting for the hardware overhead of the information redundancy and its associated switching activity.
Energy efficient SEU-tolerance in DVS-enabled real-time systems through information redundancy. A. Ejlali, M. Schmitz, B. Al-Hashimi, S. Miremadi, P. Rosinger. ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005-08-08. DOI: https://doi.org/10.1145/1077603.1077669
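The abstract does not say which code provides the information redundancy. As one illustrative choice, a Hamming(7,4) code corrects any single flipped bit in place, which is the kind of single-event upset that would otherwise trigger a time-redundant re-execution. The bit layout and function names below are assumptions, not the authors' design.

```python
def hamming74_encode(d):
    """Encode 4 data bits as [p1, p2, d1, p3, d2, d3, d4] using even parity."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # syndrome equals the position of the flipped bit
    if pos:
        c[pos - 1] ^= 1          # single upset corrected without re-execution
    return [c[2], c[4], c[5], c[6]], pos


codeword = hamming74_encode([1, 0, 1, 1])
corrupted = list(codeword)
corrupted[4] ^= 1                # simulate an SEU at codeword position 5
data, pos = hamming74_decode(corrupted)
```

The extra parity hardware and its switching activity are exactly the overheads that the abstract says were accounted for in the energy comparison.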
Multimedia applications have become a dominant computing workload for computer systems as well as for wireless devices. Because of their repetitive, memory-intensive computation, they can benefit effectively from processor-in-memory (PIM) technology. In this paper, a new low-power, PIM-based 32-bit reconfigurable datapath optimized for multimedia applications is presented. The circuit efficiently performs parallel arithmetic operations on 8-, 16-, or 32-bit integer data or on 32-bit single-precision floating-point data, providing high flexibility at a very low hardware cost. Implemented in UMC 0.18 µm 1.8 V CMOS technology, the proposed datapath runs at 285 MHz, dissipates just 0.12 mW/MHz, and occupies a silicon area of only 107,323 µm². When performing a 2D-DCT, the proposed architecture consumes 74% less power and is 28% more power efficient than a top-of-the-line commercial TI DSP.
Cost-effective low-power processor-in-memory-based reconfigurable datapath for multimedia applications. M. Lanuzza, M. Margala, P. Corsonello. ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005-08-08. DOI: https://doi.org/10.1145/1077603.1077645
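Subword-parallel integer arithmetic can be illustrated in software with SIMD-within-a-register (SWAR) addition: a 32-bit word is treated as four 8-bit, two 16-bit, or one 32-bit lane, and carries are prevented from crossing lane boundaries. This is only a functional sketch of the idea; the abstract does not describe how the datapath partitions its carry chain, and the floating-point mode is not modeled. The function name `packed_add` is hypothetical.

```python
def packed_add(a: int, b: int, lane_bits: int) -> int:
    """Add two 32-bit words lane-wise (8-, 16- or 32-bit lanes), wrapping in each lane."""
    assert lane_bits in (8, 16, 32)
    lanes = 32 // lane_bits
    # Mask with the top bit of every lane set, e.g. 0x80808080 for 8-bit lanes.
    high = sum(1 << (lane_bits * i + lane_bits - 1) for i in range(lanes))
    low = 0xFFFFFFFF ^ high
    # Adding only the low bits of each lane can never carry into the next lane;
    # the top bit of each lane is then recovered with XOR (sum bit, carry-out dropped).
    return (((a & low) + (b & low)) ^ ((a ^ b) & high)) & 0xFFFFFFFF


result = packed_add(0x01FF7F80, 0x01010101, 8)  # lanes 01+01, FF+01, 7F+01, 80+01
```

In the 8-bit example the 0xFF lane wraps to 0x00 without disturbing its neighbor, giving 0x02008081.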
In this paper we propose a novel low-power carry-select adder (CSA) design called the cascaded CSA (C²SA). Based on a prediction of the critical path delay of the current operation, C²SA automatically works with either one- or two-clock-cycle latency and a scaled supply voltage to reduce power. Post-layout simulations of a 64-bit C²SA in 180 nm technology show that C²SA can operate at a lower supply voltage, attaining 40.7% energy savings while maintaining an average latency per operation (LPO) similar to that of a standard CSA.
Cascaded carry-select adder (C²SA): a new structure for low-power CSA design. Yiran Chen, Hai Helen Li, K. Roy, Cheng-Kok Koh. ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005-08-08. DOI: https://doi.org/10.1145/1077603.1077634
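A carry-select adder splits the operands into blocks, computes each block's sum for both possible carry-ins, and uses the actual incoming carry to select one result. The sketch below models that behavior at bit level and adds a hypothetical latency predictor based on the longest run of propagate bits, since the longest carry-propagation chain is roughly bounded by that run. The 8-bit block size, the 16-bit threshold, and all function names are assumptions; the abstract does not describe C²SA's actual prediction logic.

```python
def ripple(a_bits, b_bits, cin):
    """Ripple-carry addition of little-endian bit lists; returns (sum_bits, carry_out)."""
    sums, carry = [], cin
    for x, y in zip(a_bits, b_bits):
        sums.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    return sums, carry


def to_bits(value, width):
    """Little-endian list of the low `width` bits of `value`."""
    return [(value >> i) & 1 for i in range(width)]


def carry_select_add(a, b, width=64, block=8):
    """Each block precomputes sums for carry-in 0 and 1; the real carry selects one."""
    A, B = to_bits(a, width), to_bits(b, width)
    out, carry = [], 0
    for lo in range(0, width, block):
        s0, c0 = ripple(A[lo:lo + block], B[lo:lo + block], 0)
        s1, c1 = ripple(A[lo:lo + block], B[lo:lo + block], 1)
        out += s1 if carry else s0          # multiplexer driven by the incoming carry
        carry = c1 if carry else c0
    return sum(bit << i for i, bit in enumerate(out)), carry


def longest_propagate_run(a, b, width=64):
    """Longest run of bit positions where a_i XOR b_i = 1 (a carry would propagate)."""
    p, best, run = a ^ b, 0, 0
    for i in range(width):
        run = run + 1 if (p >> i) & 1 else 0
        best = max(best, run)
    return best


def predicted_cycles(a, b, width=64, threshold=16):
    """Hypothetical rule: operations with long carry chains get a second cycle."""
    return 2 if longest_propagate_run(a, b, width) >= threshold else 1
```

Most operand pairs have short propagate runs, which is what makes a one-cycle fast path at a scaled supply voltage attractive in such a scheme.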
This paper proposes a non-uniform cache architecture for reducing the power consumption of memory systems. The non-uniform cache allows different cache-sets to have different associativity values (i.e., different numbers of cache-ways). An algorithm determines the optimum number of cache-ways for each cache-set and generates object code suitable for the non-uniform cache memory. The paper also proposes a compiler technique for reducing redundant cache-way and cache-tag accesses. Experiments demonstrate that the technique can reduce the power consumption of memory systems by up to 76% compared with the best result achieved by the conventional method.
A non-uniform cache architecture for low power system design. T. Ishihara, F. Fallah. ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005-08-08. DOI: https://doi.org/10.1145/1077603.1077690
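Choosing a per-set associativity can be sketched with a per-set LRU simulation: for each set, pick the fewest ways whose miss count stays within a tolerance of the miss count at the maximum associativity. The line size, set count, tolerance rule, and function names are assumptions; the paper's algorithm, its power model, and its object-code generation are not reproduced.

```python
from collections import defaultdict


def lru_misses(trace, ways):
    """Misses of one cache-set under LRU replacement with the given number of ways."""
    stack, misses = [], 0
    for tag in trace:
        if tag in stack:
            stack.remove(tag)
        else:
            misses += 1
            if len(stack) == ways:
                stack.pop()          # evict the least-recently used tag (end of list)
        stack.insert(0, tag)         # most-recently used tag at the front
    return misses


def choose_ways(addresses, num_sets, line_bytes=32, max_ways=4, slack=0):
    """Fewest ways per set whose misses stay within `slack` of the max_ways misses."""
    per_set = defaultdict(list)
    for addr in addresses:
        line = addr // line_bytes
        per_set[line % num_sets].append(line // num_sets)  # (set index, tag)
    plan = {}
    for s in range(num_sets):
        trace = per_set[s]
        limit = lru_misses(trace, max_ways) + slack
        plan[s] = next(w for w in range(1, max_ways + 1)
                       if lru_misses(trace, w) <= limit)
    return plan


plan = choose_ways([0, 32, 64, 32, 0, 32, 64, 0, 64], num_sets=2)
```

In this trace, set 0 alternates between two lines and needs two ways, while set 1 reuses a single line and gets by with one way.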
This paper addresses the problem of low-power fanout optimization with multiple-threshold-voltage inverters. Using splitting and merging conversions that preserve delay, power, and input capacitance, the fanout tree is converted into a set of inverter chains, and the optimal sizes and threshold voltages are determined for each chain. Experimental results show that with this technique the power dissipation of the fanout tree is reduced by an average of 33% in a state-of-the-art CMOS technology.
Low-power fanout optimization using multiple threshold voltage inverters. B. Amelifard, F. Fallah, Massoud Pedram. ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005-08-08. DOI: https://doi.org/10.1145/1077603.1077628
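Once the fanout tree has been reduced to inverter chains, each chain must be sized. The sketch below shows the classic geometric (logical-effort style) sizing, in which every stage is a constant factor f = (C_load / C_in)^(1/N) larger than the previous one, with N chosen near the base-4 logarithm of the total fanout. This is a textbook sizing rule, not the authors' joint size and threshold-voltage optimization; the fanout-of-4 target and the function name are assumptions.

```python
import math
from typing import Optional


def chain_sizes(c_in: float, c_load: float, stages: Optional[int] = None,
                target_fanout: float = 4.0):
    """Input capacitance of each stage in a geometrically sized inverter chain.

    The path fanout F = c_load / c_in is split evenly, so each stage is
    f = F ** (1 / N) times larger than the previous one. If N is not given,
    it is chosen as round(log(F) / log(target_fanout)), but at least 1.
    """
    F = c_load / c_in
    if stages is None:
        stages = max(1, round(math.log(F) / math.log(target_fanout)))
    f = F ** (1 / stages)
    return [c_in * f ** i for i in range(stages)], f


sizes, f_stage = chain_sizes(1.0, 64.0)   # F = 64 -> N = 3 stages, f = 4
```

With a load 64 times the input capacitance, the chain uses three stages of relative size 1, 4, and 16, each driving a fanout of 4.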