Disk power management is becoming increasingly important in high-end server and cluster type of environments that execute data-intensive applications. While hardware-only approaches (e.g., low-power modes supported by current disks) are successful to a certain extent, one also needs to consider the software side to achieve further energy savings. This paper first demonstrates that conventional data locality oriented code transformations are not sufficient for minimizing disk power consumption. The reason is that these optimizations do not take into account how disk-resident array data are laid out on the disk system, and consequently, fail to increase idle periods of disks, which is the primary metric using which disk power can be reduced. Instead, we propose a disk layout aware application optimization strategy that uses both code restructuring and data layout optimization. Our experimental evaluation with several benchmark codes reveal that the proposed strategy is very successful in reducing disk energy consumption without performing much worse than a pure data locality oriented scheme, as far as execution cycles are concerned. The experiments also show that the benefits coming from our approach increase with the increased number of disks; i.e., it scales very well.
{"title":"An evaluation of code and data optimizations in the context of disk power reduction","authors":"M. Kandemir, S. Son, Guangyu Chen","doi":"10.1145/1077603.1077655","DOIUrl":"https://doi.org/10.1145/1077603.1077655","url":null,"abstract":"Disk power management is becoming increasingly important in high-end server and cluster type of environments that execute data-intensive applications. While hardware-only approaches (e.g., low-power modes supported by current disks) are successful to a certain extent, one also needs to consider the software side to achieve further energy savings. This paper first demonstrates that conventional data locality oriented code transformations are not sufficient for minimizing disk power consumption. The reason is that these optimizations do not take into account how disk-resident array data are laid out on the disk system, and consequently, fail to increase idle periods of disks, which is the primary metric using which disk power can be reduced. Instead, we propose a disk layout aware application optimization strategy that uses both code restructuring and data layout optimization. Our experimental evaluation with several benchmark codes reveal that the proposed strategy is very successful in reducing disk energy consumption without performing much worse than a pure data locality oriented scheme, as far as execution cycles are concerned. The experiments also show that the benefits coming from our approach increase with the increased number of disks; i.e., it scales very well.","PeriodicalId":256018,"journal":{"name":"ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115995674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper addresses the problem of power dissipation during the buffer insertion phase of interconnect performance optimization in nanometer scale designs taking all significant parameter variations into account. The relative effect of different device, interconnect and environmental variations on delay and different components of power has been studied. A probabilistic framework to optimize buffer-interconnect designs under variations has been presented and results are compared with those obtained through simple deterministic optimization. Also, statistical models for delay and power under parameter variations have been developed using linear regression techniques. Under statistical analysis, both power and performance of buffer-interconnect designs are shown to degrade with increasing amount of variations. Also, % error in power estimation for power-optimal repeater designs is shown to be significant if variations are not taken into account. Furthermore, it has been shown that due to variations, significantly higher penalties in delay are needed to operate at power levels similar to those under no variations. Finally, the percentage savings in total power for a given penalty in delay are shown to improve with increasing amount of parameter variations.
{"title":"A probabilistic framework for power-optimal repeater insertion in global interconnects under parameter variations","authors":"V. Wason, K. Banerjee","doi":"10.1145/1077603.1077639","DOIUrl":"https://doi.org/10.1145/1077603.1077639","url":null,"abstract":"This paper addresses the problem of power dissipation during the buffer insertion phase of interconnect performance optimization in nanometer scale designs taking all significant parameter variations into account. The relative effect of different device, interconnect and environmental variations on delay and different components of power has been studied. A probabilistic framework to optimize buffer-interconnect designs under variations has been presented and results are compared with those obtained through simple deterministic optimization. Also, statistical models for delay and power under parameter variations have been developed using linear regression techniques. Under statistical analysis, both power and performance of buffer-interconnect designs are shown to degrade with increasing amount of variations. Also, % error in power estimation for power-optimal repeater designs is shown to be significant if variations are not taken into account. Furthermore, it has been shown that due to variations, significantly higher penalties in delay are needed to operate at power levels similar to those under no variations. Finally, the percentage savings in total power for a given penalty in delay are shown to improve with increasing amount of parameter variations.","PeriodicalId":256018,"journal":{"name":"ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122616958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic voltage/frequency scaling (DVFS) has been shown to be an efficient power/energy reduction technique. Various runtime DVFS policies have been proposed to utilize runtime DVFS opportunities. However, it is hard to know if runtime DVFS opportunities have been fully exploited by a DVFS policy without knowing the upper bounds of possible energy savings. We propose an exact but exponential algorithm to determine the upper bound of energy savings. The algorithm takes into consideration the switching costs, discrete voltage/frequency voltage levels and different program states. We then show a fast linear time heuristic can provide a very close approximate to this bound.
{"title":"Bounds on power savings using runtime dynamic voltage scaling: an exact algorithm and a linear-time heuristic approximation","authors":"Fen Xie, M. Martonosi, S. Malik","doi":"10.1145/1077603.1077670","DOIUrl":"https://doi.org/10.1145/1077603.1077670","url":null,"abstract":"Dynamic voltage/frequency scaling (DVFS) has been shown to be an efficient power/energy reduction technique. Various runtime DVFS policies have been proposed to utilize runtime DVFS opportunities. However, it is hard to know if runtime DVFS opportunities have been fully exploited by a DVFS policy without knowing the upper bounds of possible energy savings. We propose an exact but exponential algorithm to determine the upper bound of energy savings. The algorithm takes into consideration the switching costs, discrete voltage/frequency voltage levels and different program states. We then show a fast linear time heuristic can provide a very close approximate to this bound.","PeriodicalId":256018,"journal":{"name":"ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121339400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heat is a main concern for processors in deep sub-micron technologies. The chip temperature is affected by both the power consumption of processor components and the chip layout. Therefore, for thermal-aware design it is crucial to consider the thermal effects of different floorplans during micro-architectural design space exploration. In this paper, we propose a thermal-aware architectural floorplanning framework. With the aid of this framework, an architect can explore both physical and architectural design spaces simultaneously to find an architecture and the corresponding chip layout that maximizes performance under a thermal limitation.
{"title":"Joint exploration of architectural and physical design spaces with thermal consideration","authors":"Yen-Wei Wu, Chia-Lin Yang, Ping-Hung Yuh, Yao-Wen Chang","doi":"10.1145/1077603.1077636","DOIUrl":"https://doi.org/10.1145/1077603.1077636","url":null,"abstract":"Heat is a main concern for processors in deep sub-micron technologies. The chip temperature is affected by both the power consumption of processor components and the chip layout. Therefore, for thermal-aware design it is crucial to consider the thermal effects of different floorplans during micro-architectural design space exploration. In this paper, we propose a thermal-aware architectural floorplanning framework. With the aid of this framework, an architect can explore both physical and architectural design spaces simultaneously to find an architecture and the corresponding chip layout that maximizes performance under a thermal limitation.","PeriodicalId":256018,"journal":{"name":"ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125976936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The authors presented an on-chip, fully-digital, power-supply control system. The scheme consists of two independent control loops that regulate power supply variations due to semiconductor process spread, temperature, and chip's workload. Smart power-switches working as linear voltage regulators are used to adjust the local power supply. The smart power-switch allows us to keep the global power network unchanged. It offers an integrated standby mode and has a fast dynamic response, i.e. low transition times between voltage steps at the cost of the reduced power conversion efficiency when compared to complex DC-DC converters.
{"title":"On-chip digital power supply control for system-on-chip applications","authors":"M. Meijer, J. P. D. Gyvez, R. Otten","doi":"10.1145/1077603.1077677","DOIUrl":"https://doi.org/10.1145/1077603.1077677","url":null,"abstract":"The authors presented an on-chip, fully-digital, power-supply control system. The scheme consists of two independent control loops that regulate power supply variations due to semiconductor process spread, temperature, and chip's workload. Smart power-switches working as linear voltage regulators are used to adjust the local power supply. The smart power-switch allows us to keep the global power network unchanged. It offers an integrated standby mode and has a fast dynamic response, i.e. low transition times between voltage steps at the cost of the reduced power conversion efficiency when compared to complex DC-DC converters.","PeriodicalId":256018,"journal":{"name":"ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124067927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Cong, Ashok Jagannathan, Glenn D. Reinman, Y. Tamir
In this paper we study the energy efficiency of SMT and CMP with multiclustering. Through a detailed design space exploration, we show that clustering closes the energy efficiency gap between SMT and CMP at equal performance points. Specifically, we show that the energy efficiency of CMP compared to SMT at a given performance decreases from a maximum of 25% in a monolithic processor case to 6% when the processor resources are clustered. By carefully considering floorplans, we show that this is, in part, enabled by the small energy consumption (less than 3%) of the interconnection buses required for clustering, even with SMT. As the gap narrows, we show that the efficiency of SMT versus CMP depends on the contribution of leakage energy: at lower leakage, the CMP tends to be better than the SMT, while the SMT outperforms the CMP at higher leakage levels. We demonstrate these results over a wide range of performance and machine configurations.
{"title":"Understanding the energy efficiency of SMT and CMP with multiclustering","authors":"J. Cong, Ashok Jagannathan, Glenn D. Reinman, Y. Tamir","doi":"10.1145/1077603.1077616","DOIUrl":"https://doi.org/10.1145/1077603.1077616","url":null,"abstract":"In this paper we study the energy efficiency of SMT and CMP with multiclustering. Through a detailed design space exploration, we show that clustering closes the energy efficiency gap between SMT and CMP at equal performance points. Specifically, we show that the energy efficiency of CMP compared to SMT at a given performance decreases from a maximum of 25% in a monolithic processor case to 6% when the processor resources are clustered. By carefully considering floorplans, we show that this is, in part, enabled by the small energy consumption (less than 3%) of the interconnection buses required for clustering, even with SMT. As the gap narrows, we show that the efficiency of SMT versus CMP depends on the contribution of leakage energy: at lower leakage, the CMP tends to be better than the SMT, while the SMT outperforms the CMP at higher leakage levels. We demonstrate these results over a wide range of performance and machine configurations.","PeriodicalId":256018,"journal":{"name":"ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.","volume":"400 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124552880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper demonstrates a first-order, linear power estimation model that uses performance counters to estimate run-time CPU and memory power consumption of the Intel PXA255 processor. Our model uses a set of power weights that map hardware performance counter values to processor and memory power consumption. Power weights are derived offline once per processor voltage and frequency configuration using parameter estimation techniques. They can be applied in a dynamic voltage/frequency scaling environment by setting six descriptive parameters. We have tested our model using a wide selection of benchmarks including SPEC2000, Java CDC and Java CLDC programming environments. The accuracy is quite good; average estimated power consumption is within 4% of the measured average CPU power consumption. We believe such power estimation schemes can serve as a foundation for intelligent, power-aware embedded systems that dynamically adapt to the device's power consumption.
{"title":"Power prediction for Intel XScale/spl reg/ processors using performance monitoring unit events","authors":"Gilberto Contreras, M. Martonosi","doi":"10.1145/1077603.1077657","DOIUrl":"https://doi.org/10.1145/1077603.1077657","url":null,"abstract":"This paper demonstrates a first-order, linear power estimation model that uses performance counters to estimate run-time CPU and memory power consumption of the Intel PXA255 processor. Our model uses a set of power weights that map hardware performance counter values to processor and memory power consumption. Power weights are derived offline once per processor voltage and frequency configuration using parameter estimation techniques. They can be applied in a dynamic voltage/frequency scaling environment by setting six descriptive parameters. We have tested our model using a wide selection of benchmarks including SPEC2000, Java CDC and Java CLDC programming environments. The accuracy is quite good; average estimated power consumption is within 4% of the measured average CPU power consumption. We believe such power estimation schemes can serve as a foundation for intelligent, power-aware embedded systems that dynamically adapt to the device's power consumption.","PeriodicalId":256018,"journal":{"name":"ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128590565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Ma, Janet Roveda, Mohankumar N. Somasundaram, Zongqi Hu
This paper presents an integrated power system for low-power wireless sensor networks with dynamic efficiency optimization technique. By adaptively resizing power transistors and adjusting switching frequency, system efficiency is enhanced significantly. Theoretical analysis is elaborated to support the proposed technique. A prototype integrated power system for self-powered photovoltaic wireless sensing node was designed and simulated with TSMC 0.35/spl mu/m CMOS process. With a power range of 0.5/spl mu/W to 10mW, power efficiency stays above 71%. Tolerance between theoretical and simulated optimal power transistor sizes is less than 6.7%, while that of optimal switching frequencies is less than 5%. The paper gives another solution to minimizing system power in the perspective of power processing.
{"title":"Design and optimization on dynamic power system for self-powered integrated wireless sensing nodes","authors":"D. Ma, Janet Roveda, Mohankumar N. Somasundaram, Zongqi Hu","doi":"10.1145/1077603.1077675","DOIUrl":"https://doi.org/10.1145/1077603.1077675","url":null,"abstract":"This paper presents an integrated power system for low-power wireless sensor networks with dynamic efficiency optimization technique. By adaptively resizing power transistors and adjusting switching frequency, system efficiency is enhanced significantly. Theoretical analysis is elaborated to support the proposed technique. A prototype integrated power system for self-powered photovoltaic wireless sensing node was designed and simulated with TSMC 0.35/spl mu/m CMOS process. With a power range of 0.5/spl mu/W to 10mW, power efficiency stays above 71%. Tolerance between theoretical and simulated optimal power transistor sizes is less than 6.7%, while that of optimal switching frequencies is less than 5%. The paper gives another solution to minimizing system power in the perspective of power processing.","PeriodicalId":256018,"journal":{"name":"ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116896733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Full-chip verification requires one to check if the power grid is safe, i.e., if the voltage drop on the grid does not exceed a certain threshold. The traditional simulation-based solution to this problem is computationally expensive, because of the large variety of possible circuit behaviors that would need to be simulated; it also has the disadvantage that it requires full knowledge of the details of the circuit attached to the grid, thereby precluding early verification of the grid. We propose a power grid verification technique that can be applied before the complete circuit has been designed and without exact knowledge of the circuit currents. We use current constraints, which are upper bound constraints on the currents that can be drawn from the grid, as a way to capture the uncertainty about the circuit details and activity. Based on this, we propose two solution approaches. One approach gives an upper-bound on the worst-case voltage drop at every node of the grid. Another, less expensive approach, applies a sufficient condition (thus, this becomes a conservative approach) to check if the drop on the grid exceeds a given voltage threshold.
{"title":"Power grid voltage integrity verification","authors":"Maha Nizam, F. Najm, A. Devgan","doi":"10.1145/1077603.1077661","DOIUrl":"https://doi.org/10.1145/1077603.1077661","url":null,"abstract":"Full-chip verification requires one to check if the power grid is safe, i.e., if the voltage drop on the grid does not exceed a certain threshold. The traditional simulation-based solution to this problem is computationally expensive, because of the large variety of possible circuit behaviors that would need to be simulated; it also has the disadvantage that it requires full knowledge of the details of the circuit attached to the grid, thereby precluding early verification of the grid. We propose a power grid verification technique that can be applied before the complete circuit has been designed and without exact knowledge of the circuit currents. We use current constraints, which are upper bound constraints on the currents that can be drawn from the grid, as a way to capture the uncertainty about the circuit details and activity. Based on this, we propose two solution approaches. One approach gives an upper-bound on the worst-case voltage drop at every node of the grid. Another, less expensive approach, applies a sufficient condition (thus, this becomes a conservative approach) to check if the drop on the grid exceeds a given voltage threshold.","PeriodicalId":256018,"journal":{"name":"ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116231336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Hsu, A. Agarwal, K. Roy, R. Krishnamurthy, S. Borkar
In high performance microprocessors, integer execution cores are one of the hottest thermal spots and peak current/power delivery limiters. This paper describes a dual-supply and dual-threshold optimized 32-bit integer execution ALU and register file loop for 8.3GHz operation in 1.2V, 90nm CMOS technology. Aggressive supply/threshold scaling on the ALU and nominal supply/threshold on the register file enables up to 25% peak energy reduction without sacrificing performance or array bit-cells stability. A hybrid split-output style CVSL sequential level converter at the ALU-register file interface is also described for robust, DC power free dual-V/sub cc/ operation. The proposed sequential occupies 10% smaller area, and saves 11% active leakage power and 14% worst case switching power as compared to conventional CVSL style sequential at the same performance.
{"title":"An 8.3GHz dual supply/threshold optimized 32b integer ALU-register file loop in 90nm CMOS","authors":"S. Hsu, A. Agarwal, K. Roy, R. Krishnamurthy, S. Borkar","doi":"10.1145/1077603.1077630","DOIUrl":"https://doi.org/10.1145/1077603.1077630","url":null,"abstract":"In high performance microprocessors, integer execution cores are one of the hottest thermal spots and peak current/power delivery limiters. This paper describes a dual-supply and dual-threshold optimized 32-bit integer execution ALU and register file loop for 8.3GHz operation in 1.2V, 90nm CMOS technology. Aggressive supply/threshold scaling on the ALU and nominal supply/threshold on the register file enables up to 25% peak energy reduction without sacrificing performance or array bit-cells stability. A hybrid split-output style CVSL sequential level converter at the ALU-register file interface is also described for robust, DC power free dual-V/sub cc/ operation. The proposed sequential occupies 10% smaller area, and saves 11% active leakage power and 14% worst case switching power as compared to conventional CVSL style sequential at the same performance.","PeriodicalId":256018,"journal":{"name":"ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126816376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}