The power consumption of a sequential circuit can be reduced by decomposing it into subcircuits which can be turned off when inactive. Power can also be reduced by careful state encoding. Modeling a given circuit as a finite-state machine, we formulate its decomposition into submachines as an integer linear programming (ILP) problem, and automatically generate the ILP model with power minimization as the objective. A simple, but powerful state encoding method is used for the submachines to further reduce power consumption. We present experimental results which show that circuits designed by our approach consume 30% to 90% less power than conventional circuits.
{"title":"ILP-based optimization of sequential circuits for low power","authors":"Feng Gao, J. Hayes","doi":"10.1145/871506.871542","DOIUrl":"https://doi.org/10.1145/871506.871542","url":null,"abstract":"The power consumption of a sequential circuit can be reduced by decomposing it into subcircuits which can be turned off when inactive. Power can also be reduced by careful state encoding. Modeling a given circuit as a finite-state machine, we formulate its decomposition into submachines as an integer linear programming (ILP) problem, and automatically generate the ILP model with power minimization as the objective. A simple, but powerful state encoding method is used for the submachines to further reduce power consumption. We present experimental results which show that circuits designed by our approach consume 30% to 90% less power than conventional circuits.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133589643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enabled by the continuous advancement in fabrication technology, present day synchronous microprocessors include more than 100 million transistors and have clock speeds well in excess of the 1GHz mark. Distributing a low-skew clock signal in this frequency range to all areas of a large chip is a task of growing complexity. As a solution to this problem, designers have recently suggested the use of frequency islands that are locally clocked and externally communicate using mixed timing communication schemes. Such a design style fits nicely the recently proposed concept of voltage islands that, in addition, can potentially enable fine grain dynamic power management. This paper proposes a design exploration framework for application-adaptive multiple clock processors which provides the means for analyzing and identifying the right inter-domain communication scheme and the proper granularity for the choice of voltage/frequency. In addition, the proposed design exploration framework allows for comparative analysis of newly proposed or existing application-driven dynamic power management strategies. Such a design exploration framework and accompanying results can help designers and computer architects in choosing the right design strategy for achieving better power-performance trade-offs in multiple clock high-end processors.
{"title":"A critical analysis of application-adaptive multiple clock processors","authors":"Emil Talpes, Diana Marculescu","doi":"10.1145/871506.871576","DOIUrl":"https://doi.org/10.1145/871506.871576","url":null,"abstract":"Enabled by the continuous advancement in fabrication technology, present day synchronous microprocessors include more than 100 million transistors and have clock speeds well in excess of the 1GHz mark. Distributing a low-skew clock signal in this frequency range to all areas of a large chip is a task of growing complexity. As a solution to this problem, designers have recently suggested the use of frequency islands that are locally clocked and externally communicate using mixed timing communication schemes. Such a design style fits nicely the recently proposed concept of voltage islands that, in addition, can potentially enable fine grain dynamic power management. This paper proposes a design exploration framework for application-adaptive multiple clock processors which provides the means for analyzing and identifying the right inter-domain communication scheme and the proper granularity for the choice of voltage/frequency. In addition, the proposed design exploration framework allows for comparative analysis of newly proposed or existing application-driven dynamic power management strategies. Such a design exploration framework and accompanying results can help designers and computer architects in choosing the right design strategy for achieving better power-performance trade-offs in multiple clock high-end processors.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130780134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Power dissipation is unevenly distributed in modern microprocessors leading to localized hot spots with significantly greater die temperature than surrounding cooler regions. Excessive junction temperature reduces reliability and can lead to catastrophic failure. We examine the use of activity migration which reduces peak junction temperature by moving computation between multiple replicated units. Using a thermal model that includes the temperature dependence of leakage power, we show that sustainable power dissipation can be increased by nearly a factor of two for a given junction temperature limit. Alternatively, peak die temperature can be reduced by 12.4/spl deg/C at the same clock frequency. The model predicts that migration intervals of around 20-200 /spl mu/s are required to achieve the maximum sustainable power increase. We evaluate several different forms of replication and migration policy control.
{"title":"Reducing power density through activity migration","authors":"Seongmoo Heo, K. Barr, K. Asanović","doi":"10.1145/871506.871561","DOIUrl":"https://doi.org/10.1145/871506.871561","url":null,"abstract":"Power dissipation is unevenly distributed in modern microprocessors leading to localized hot spots with significantly greater die temperature than surrounding cooler regions. Excessive junction temperature reduces reliability and can lead to catastrophic failure. We examine the use of activity migration which reduces peak junction temperature by moving computation between multiple replicated units. Using a thermal model that includes the temperature dependence of leakage power, we show that sustainable power dissipation can be increased by nearly a factor of two for a given junction temperature limit. Alternatively, peak die temperature can be reduced by 12.4/spl deg/C at the same clock frequency. The model predicts that migration intervals of around 20-200 /spl mu/s are required to achieve the maximum sustainable power increase. We evaluate several different forms of replication and migration policy control.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"609 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134332440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Ziesler, Joohee Kim, V. Sathe, M. Papaefthymiou
We have recently designed, fabricated, and successfully tested an experimental chip that validates a novel method for reducing clock dissipation through energy recovery. Our approach includes a single-phase sinusoidal clock signal, an L-C resonant sinusoidal clock generator, and an energy recovering flip-flop. Our chip comprises a dual-mode ASIC with two independent clock systems, one conventional and one energy recovering, and was fabricated in a 0.25 /spl mu/m bulk CMOS process. The ASIC computes a pipelined discrete wavelet transform with self-test and contains over 3500 gates. We have verified correct functionality and obtained power measurements in both modes of operation for frequencies up to 225 MHz. In the energy recovering mode, our power measurements account for all of the dissipation factors, including the operation of the integrated resonant clock generator, and show a net energy savings over the conventional mode of operation. For example, at 115 MHz, measured dissipation is between 60% and 75% of the conventional mode, depending on primary input activity. To our knowledge, this is the first ever published account of a direct experimentally-measured comparison between a complete energy recovering ASIC chip and its conventional implementation correctly operating in silicon at frequencies exceeding 100 MHz.
{"title":"A 225 MHz resonant clocked ASIC chip","authors":"C. Ziesler, Joohee Kim, V. Sathe, M. Papaefthymiou","doi":"10.1145/871506.871523","DOIUrl":"https://doi.org/10.1145/871506.871523","url":null,"abstract":"We have recently designed, fabricated, and successfully tested an experimental chip that validates a novel method for reducing clock dissipation through energy recovery. Our approach includes a single-phase sinusoidal clock signal, an L-C resonant sinusoidal clock generator, and an energy recovering flip-flop. Our chip comprises a dual-mode ASIC with two independent clock systems, one conventional and one energy recovering, and was fabricated in a 0.25 /spl mu/m bulk CMOS process. The ASIC computes a pipelined discrete wavelet transform with self-test and contains over 3500 gates. We have verified correct functionality and obtained power measurements in both modes of operation for frequencies up to 225 MHz. In the energy recovering mode, our power measurements account for all of the dissipation factors, including the operation of the integrated resonant clock generator, and show a net energy savings over the conventional mode of operation. For example, at 115 MHz, measured dissipation is between 60% and 75% of the conventional mode, depending on primary input activity. To our knowledge, this is the first ever published account of a direct experimentally-measured comparison between a complete energy recovering ASIC chip and its conventional implementation correctly operating in silicon at frequencies exceeding 100 MHz.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114702518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2003-08-25DOI: 10.1109/LPE.2003.1231919
Alice Wang, A. Chandrakasan
Energy-aware design is highly desirable for systems that encounter a wide diversity of operating scenarios. This is in contrast to traditional low power design for the worst case scenario, which may not be globally energy efficient. Energy-aware design focuses on enabling architectures which scale down-energy as quality requirements are relaxed. A new energy-scalable system design methodology is proposed for a Real-Valued FFT processor which supports variable bit precision (8 and 16-bit precision) and variable FFT length (128-512 point). Two energy-aware architectures, Ensemble of Point Solutions method and Reuse of Point Solutions method, are described and evaluated. Simulated and measured results show a 66% energy savings for 8-bit datapath and 52% savings for 128-point FFT length over a non-scalable approach.
{"title":"Energy-aware architectures for a Real-Valued FFT implementation","authors":"Alice Wang, A. Chandrakasan","doi":"10.1109/LPE.2003.1231919","DOIUrl":"https://doi.org/10.1109/LPE.2003.1231919","url":null,"abstract":"Energy-aware design is highly desirable for systems that encounter a wide diversity of operating scenarios. This is in contrast to traditional low power design for the worst case scenario, which may not be globally energy efficient. Energy-aware design focuses on enabling architectures which scale down-energy as quality requirements are relaxed. A new energy-scalable system design methodology is proposed for a Real-Valued FFT processor which supports variable bit precision (8 and 16-bit precision) and variable FFT length (128-512 point). Two energy-aware architectures, Ensemble of Point Solutions method and Reuse of Point Solutions method, are described and evaluated. Simulated and measured results show a 66% energy savings for 8-bit datapath and 52% savings for 128-point FFT length over a non-scalable approach.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124233952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2003-08-25DOI: 10.1109/LPE.2003.1231910
M. Yavari, O. Shoaei
This paper presents a new fully differential operational transconductance amplifier (OTA) for low-voltage and fast-settling switched-capacitor circuits in digital CMOS technology. The proposed two-stage OTA is a hybrid class A/AB that combines a folded cascode as the first stage with active current mirrors as the second stage. It employs a hybrid cascode compensation scheme, merged Ahuja and improved Ahuja style compensations, for fast settling.
{"title":"Low-voltage low-power fast-settling CMOS operational transconductance amplifiers for switched-capacitor applications","authors":"M. Yavari, O. Shoaei","doi":"10.1109/LPE.2003.1231910","DOIUrl":"https://doi.org/10.1109/LPE.2003.1231910","url":null,"abstract":"This paper presents a new fully differential operational transconductance amplifier (OTA) for low-voltage and fast-settling switched-capacitor circuits in digital CMOS technology. The proposed two-stage OTA is a hybrid class A/AB that combines a folded cascode as the first stage with active current mirrors as the second stage. It employs a hybrid cascode compensation scheme, merged Ahuja and improved Ahuja style compensations, for fast settling.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122656790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The memory subsystem, including address translations and cache accesses, consumes a major portion of the overall energy on a processor. In this paper, we address the memory energy issues by using a streamlined architectural partitioning technique that effectively reduces energy consumption in the memory subsystem without compromising performance. It is achieved by decoupling the d-TLB lookups and the data cache accesses, based on the semantic regions defined by programming languages and software convention, into discrete reference substreams - stack, global static, and heap. Their unique access behaviors and locality characteristics are analyzed and exploited for power reduction. Our results show that an average of 35% energy can be reduced in the d-TLB and the data cache. Furthermore, an average of 46% energy can be saved by selectively multi-porting the semantic-aware d-TLBs and data caches against their monolithic counterparts.
{"title":"Energy efficient D-TLB and data cache using semantic-aware multilateral partitioning","authors":"H. Lee, C. Ballapuram","doi":"10.1145/871506.871583","DOIUrl":"https://doi.org/10.1145/871506.871583","url":null,"abstract":"The memory subsystem, including address translations and cache accesses, consumes a major portion of the overall energy on a processor. In this paper, we address the memory energy issues by using a streamlined architectural partitioning technique that effectively reduces energy consumption in the memory subsystem without compromising performance. It is achieved by decoupling the d-TLB lookups and the data cache accesses, based on the semantic regions defined by programming languages and software convention, into discrete reference substreams - stack, global static, and heap. Their unique access behaviors and locality characteristics are analyzed and exploited for power reduction. Our results show that an average of 35% energy can be reduced in the d-TLB and the data cache. Furthermore, an average of 46% energy can be saved by selectively multi-porting the semantic-aware d-TLBs and data caches against their monolithic counterparts.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128144128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Traditional pulldown comparators that are used to implement associative addressing logic in superscalar microprocessors dissipate energy on a mismatch in any bit position in the comparands. As mismatches occur much more frequently than matches in many situations, such circuits are extremely energy-inefficient. In recognition of this inefficiency, a series of dissipate-on-match comparator designs have been proposed to address the power considerations. These designs, however, are limited to at most 8 bit long arguments. In this paper, we examine the designs of energy-efficient comparators capable of comparing arguments as long as 32 bits in size. Such long comparands are routinely used in the load-store queues, caches, BTBs and TLBs. We use the actual layout data and the realistic bit patterns of the comparands (obtained from the simulated execution of SPEC 2000 benchmarks) to show the energy impact from the use of the new comparators. In general, a non-trivial combination of traditional and dissipate-on-match 8 bit comparator blocks represents the most energy-efficient and fastest solution. As an example of this general approach, we show how fast and energy-efficient comparators can be designed for comparing addresses within the load-store queue of a superscalar processor.
{"title":"Power efficient comparators for long arguments in superscalar processors","authors":"D. Ponomarev, Gürhan Küçük, O. Ergin, K. Ghose","doi":"10.1145/871506.871601","DOIUrl":"https://doi.org/10.1145/871506.871601","url":null,"abstract":"Traditional pulldown comparators that are used to implement associative addressing logic in superscalar microprocessors dissipate energy on a mismatch in any bit position in the comparands. As mismatches occur much more frequently than matches in many situations, such circuits are extremely energy-inefficient. In recognition of this inefficiency, a series of dissipate-on-match comparator designs have been proposed to address the power considerations. These designs, however, are limited to at most 8 bit long arguments. In this paper, we examine the designs of energy-efficient comparators capable of comparing arguments as long as 32 bits in size. Such long comparands are routinely used in the load-store queues, caches, BTBs and TLBs. We use the actual layout data and the realistic bit patterns of the comparands (obtained from the simulated execution of SPEC 2000 benchmarks) to show the energy impact from the use of the new comparators. In general, a non-trivial combination of traditional and dissipate-on-match 8 bit comparator blocks represents the most energy-efficient and fastest solution. As an example of this general approach, we show how fast and energy-efficient comparators can be designed for comparing addresses within the load-store queue of a superscalar processor.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130351239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2003-08-25DOI: 10.1109/LPE.2003.1231846
C. Neau, K. Roy
We present techniques to determine the optimal body bias (forward or reverse) to minimize leakage current and compensate process variations in scaled CMOS technologies. A circuit trades off sub-threshold leakage with band-to-band tunneling leakage at the source/drain junctions to determine the optimal substrate bias for different technology generations and under process variations. Using optimal body bias results in 43% and 42% savings in leakage for predictive 70 nm and 50 nm NMOS devices, respectively. This technique also reduces the effects of die-to-die and intra-die process variations in transistor length and supply voltage by 43% and 60%, respectively, in 50 nm NMOS devices, resulting in improved yield.
{"title":"Optimal body bias selection for leakage improvement and process compensation over different technology generations","authors":"C. Neau, K. Roy","doi":"10.1109/LPE.2003.1231846","DOIUrl":"https://doi.org/10.1109/LPE.2003.1231846","url":null,"abstract":"We present techniques to determine the optimal body bias (forward or reverse) to minimize leakage current and compensate process variations in scaled CMOS technologies. A circuit trades off sub-threshold leakage with band-to-band tunneling leakage at the source/drain junctions to determine the optimal substrate bias for different technology generations and under process variations. Using optimal body bias results in 43% and 42% savings in leakage for predictive 70 nm and 50 nm NMOS devices, respectively. This technique also reduces the effects of die-to-die and intra-die process variations in transistor length and supply voltage by 43% and 60%, respectively, in 50 nm NMOS devices, resulting in improved yield.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127289215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce and analyze the ground bounce due to power mode transition in power gating structures. To reduce the ground bounce, we propose novel power gating structures in which sleep transistors are turned on in a non-uniform stepwise manner. Our power gating structures reduce the magnitude of peak current and voltage glitches in the power distribution network as well as the minimum time required to stabilize power and ground. Experimental simulation results with PowerSpice fixtured in a package model demonstrate the effectiveness of the proposed power gate switching noise reduction techniques.
{"title":"Understanding and minimizing ground bounce during mode transition of power gating structures","authors":"Suhwan Kim, S. Kosonocky, D. Knebel","doi":"10.1145/871506.871515","DOIUrl":"https://doi.org/10.1145/871506.871515","url":null,"abstract":"We introduce and analyze the ground bounce due to power mode transition in power gating structures. To reduce the ground bounce, we propose novel power gating structures in which sleep transistors are turned on in a non-uniform stepwise manner. Our power gating structures reduce the magnitude of peak current and voltage glitches in the power distribution network as well as the minimum time required to stabilize power and ground. Experimental simulation results with PowerSpice fixtured in a package model demonstrate the effectiveness of the proposed power gate switching noise reduction techniques.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130612741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}