Uncertainty-based scheduling: energy-efficient ordering for tasks with variable execution time [processor scheduling]
Pub Date: 2003-09-23 | DOI: 10.1109/LPE.2003.1231953
F. Gruian, K. Kuchcinski
Reducing energy consumption is today an important design issue for all kinds of digital systems. Offering both flexibility and efficient energy management, variable-speed processor architectures are preferred for low energy consumption even in hard real-time systems. For this type of system, the main approach consists of trading speed for lower energy while still meeting all deadlines. For tasks with varying execution times, speed scheduling is most efficient when performed at run time. This paper presents a new ordering technique for such tasks that reduces the energy consumption resulting from run-time speed scheduling. Without affecting the real-time behavior, our uncertainty-based scheduling (UBS) is a low-complexity yet energy-efficient method that can be applied on top of existing real-time scheduling techniques such as EDF. These claims are supported by extensive simulation results and by measurements on a platform based on an Intel i80200 XScale processor.
{"title":"Uncertainty-based scheduling: energy-efficient ordering for tasks with variable execution time [processor scheduling]","authors":"F. Gruian, K. Kuchcinski","doi":"10.1109/LPE.2003.1231953","DOIUrl":"https://doi.org/10.1109/LPE.2003.1231953","url":null,"abstract":"Energy consumption reduction is today an important design issue for all kinds of digital systems. Offering both flexibility and efficient energy management, variable speed processor architectures are preferred for low energy consumption even in hard real-time systems. For this type of system, the main approach consists in trading speed for lower energy while meeting all deadlines. For tasks with varying execution time, speed scheduling is most efficient if performed at run-time. This paper presents a new ordering technique for such tasks, that reduces the energy consumption resulting from the run-time speed scheduling. Without affecting the real-time behavior, our uncertainty-based scheduling (UBS) is a low complexity but energy-efficient method that can be applied on top of already existent real-time scheduling techniques, such as EDF. These claims are backed up by extensive simulation results accompanied by measurements on a platform based on an Intel i80200 XScale processor.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114283197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Level conversion for dual-supply systems [low power logic IC design]
Pub Date: 2003-09-23 | DOI: 10.1109/LPE.2003.1231854
F. Ishihara, F. Sheikh, B. Nikolić
Dual-supply voltage design using a clustered voltage scaling (CVS) scheme is an effective approach to reducing chip power. An optimal CVS design relies on a level converter (LC) implemented within a flip-flop to minimize the energy, delay, and area penalties of level conversion. The novel flip-flops presented in this paper incorporate a half-latch LC and a precharged LC. These flip-flops are optimized in the energy-delay design space and achieve over a 30% reduction in energy-delay product and about 10% savings in total power in a CVS design compared with the conventional flip-flop. These benefits are accompanied by a 24% improvement in robustness and an 18% reduction in layout area.
{"title":"Level conversion for dual-supply systems [low power logic IC design]","authors":"F. Ishihara, F. Sheikh, B. Nikolić","doi":"10.1109/LPE.2003.1231854","DOIUrl":"https://doi.org/10.1109/LPE.2003.1231854","url":null,"abstract":"Dual-supply voltage design using a clustered voltage scaling (CVS) scheme is an effective approach to reduce chip power. The optimal CVS design relies on a level converter (LC) implemented in a flip-flop to minimize energy, delay, and area penalties due to level conversion. Novel flip-flops presented in this paper incorporate a half-latch LC and a precharged LC. These flip-flops are optimized in the energy-delay design space to achieve over 30% reduction of energy-delay product and about 10% savings of total power in a CVS design as compared to the conventional flipflop. These benefits are accompanied by 24% robustness improvement and 18% layout area reduction.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"5 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124319199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Branch prediction on demand: an energy-efficient solution [microprocessor architecture]
Pub Date: 2003-09-23 | DOI: 10.1109/LPE.2003.1231933
D. Chaver, L. Piñuel, M. Prieto, F. Tirado, M. Huang
High-end processors typically incorporate complex branch predictors consisting of many large structures that together consume a notable fraction of total chip power (more than 10% in some cases). Depending on the application, some of these resources may remain underused for long periods of time. We propose a methodology to reduce the energy consumption of the branch predictor by characterizing prediction demand through profiling and dynamically adjusting predictor resources accordingly. Specifically, we disable components of the hybrid direction predictor and resize the branch target buffer. Detailed simulations show that this approach reduces the energy consumption of the branch predictor by an average of 72%, and by up to 89%, with virtually no impact on prediction accuracy and performance.
{"title":"Branch prediction on demand: an energy-efficient solution [microprocessor architecture]","authors":"D. Chaver, L. Piñuel, M. Prieto, F. Tirado, M. Huang","doi":"10.1109/LPE.2003.1231933","DOIUrl":"https://doi.org/10.1109/LPE.2003.1231933","url":null,"abstract":"High-end processors typically incorporate complex branch predictors consisting of many large structures that together consume a notable fraction of total chip power (more than 10% in some cases). Depending on the applications, some of these resources may remain underused for long periods of time. We propose a methodology to reduce the energy consumption of the branch predictor by characterizing prediction demand using profiling and dynamically adjusting predictor resources accordingly. Specifically, we disable components of the hybrid direction predictor and resize the branch target buffer. Detailed simulations show that this approach reduces the energy consumption in the branch predictor by an average of 72% and up to 89% with virtually no impact on prediction accuracy and performance.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132393492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Voltage scheduling under unpredictabilities: a risk management paradigm [logic design]
Pub Date: 2003-09-23 | DOI: 10.1145/1059876.1059884
A. Davoodi, Ankur Srivastava
This paper addresses the problem of voltage scheduling in unpredictable situations. The voltage scheduling problem assigns voltages to operations such that power is minimized under a clock cycle constraint. In the presence of unpredictability, meeting the clock constraint cannot be guaranteed. This paper proposes a novel risk-management-based technique to solve this problem. The risk management paradigm assigns a quantified value to the amount of risk the designer is willing to take on the clock cycle constraint. The algorithm then assigns voltages so as to meet the clock cycle constraint in expectation, while keeping the maximum delay within the specified risk and minimizing power.
{"title":"Voltage scheduling under unpredictabilities: a risk management paradigm [logic design]","authors":"A. Davoodi, Ankur Srivastava","doi":"10.1145/1059876.1059884","DOIUrl":"https://doi.org/10.1145/1059876.1059884","url":null,"abstract":"This paper addresses the problem of voltage scheduling in unpredictable situations. The voltage scheduling problem assigns voltages to operations such that the power is minimized under a clock cycle constraint. In the presence of unpredictabilities, meeting the clock constraint cannot be guaranteed. This paper proposes a novel risk management based technique to solve this problem. The risk management paradigm assigns a quantified value to the amount of risk the designer is willing to take on the clock cycle constraint. The algorithm then assigns voltages in order to meet the expected value of clock cycle constraint while keeping the maximum delay within the specified 'risk' and minimizing the power.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114171263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A selective filter-bank TLB system [embedded processor MMU for low power]
Pub Date: 2003-09-23 | DOI: 10.1109/LPE.2003.1231885
Jung-Hoon Lee, G. Park, Sung-Bae Park, Shin-Dug Kim
We present a selective filter-bank translation lookaside buffer (TLB) system with low power consumption for embedded processors. The proposed TLB is constructed as multiple banks, with a small two-bank buffer, called a filter-bank buffer, located above its associated bank. Either the filter-bank buffer or the main bank TLB can be selectively accessed, based on two bits in the filter-bank buffer. Energy savings are achieved by reducing the number of entries accessed at a time, through filtering and the banking mechanism. The overhead of the proposed TLB turns out to be negligible compared with other hierarchical structures. Simulation results show that the energy×delay product can be reduced by about 88% compared with a fully associative TLB, by 75% with respect to a filter TLB, and by 51% relative to a banked-filter TLB.
{"title":"A selective filter-bank TLB system [embedded processor MMU for low power]","authors":"Jung-Hoon Lee, G. Park, Sung-Bae Park, Shin-Dug Kim","doi":"10.1109/LPE.2003.1231885","DOIUrl":"https://doi.org/10.1109/LPE.2003.1231885","url":null,"abstract":"We present a selective filter-bank translation lookaside buffer (TLB) system with low power consumption for embedded processors. The proposed TLB is constructed as multiple banks with a small two-bank buffer, called a filter-bank buffer, located above its associated bank. Either a filter-bank buffer or a main bank TLB can be selectively accessed, based on two bits in the filter-bank buffer. Energy savings are achieved by reducing the number of entries accessed at a time, by using filtering and the bank mechanism. The overhead of the proposed TLB turns out to be negligible compared with other hierarchical structures. Simulation results show that the energy/spl times/delay product can be reduced by about 88% compared with a fully -associative TLB, 75% with respect to a filter-TLB, and 51% relative to a banked-filter TLB.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127955774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A semi-custom voltage-island technique and its application to high-speed serial links [CMOS active power reduction]
Pub Date: 2003-09-23 | DOI: 10.1109/LPE.2003.1231836
J. Carballo, J. Burns, Seung-Moon Yoo, I. Vo, V. R. Norman
Supply-voltage reduction is a well-known technique for reducing CMOS active power. We propose a semi-custom voltage-island approach based on internal regulation and selective custom design. This approach enables transparent embedding, since no additional external power supply is needed. We apply the approach to high-speed serial links and show that high performance is retained through targeted application of custom circuit and logic design. A test chip is presented that evaluates the approach on a 3000-gate, 3.2-Gbps multi-protocol serial-link receiver logic core. When the supply is reduced from 1.2 V to 0.95 V, the chip demonstrates power savings of over 25%.
{"title":"A semi-custom voltage-island technique and its application to high-speed serial links [CMOS active power reduction]","authors":"J. Carballo, J. Burns, Seung-Moon Yoo, I. Vo, V. R. Norman","doi":"10.1109/LPE.2003.1231836","DOIUrl":"https://doi.org/10.1109/LPE.2003.1231836","url":null,"abstract":"Supply-voltage reduction is a known technique for reducing CMOS active power. We propose a semi-custom voltage-island approach based on internal regulation and selective custom design. This approach enables transparent embedding, since no additional external power supply is needed. We apply the approach to high-speed serial links, and we show that high performance is retained through targeted application of custom circuit and logic design. A chip is presented that evaluates the presented approach on a 3000 gate 3.2 Gbps multi-protocol serial-link receiver logic core. When reducing the supply from 1.2 V to 0.95 V, the chip demonstrates power savings of over 25%.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128746042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic voltage scaling algorithm for fixed-priority real-time systems using work-demand analysis
Pub Date: 2003-08-25 | DOI: 10.1145/871506.871605
Woonseok Kim, Jihong Kim, S. Min
Dynamic voltage scaling (DVS), which adjusts the clock speed and supply voltage dynamically, is an effective technique for reducing the energy consumption of embedded real-time systems. Unlike dynamic-priority real-time scheduling, for which highly effective DVS algorithms are available, existing fixed-priority DVS algorithms are less energy-efficient because they are based on inefficient slack estimation methods. This paper describes an efficient on-line slack estimation heuristic for rate-monotonic (RM) scheduling. The proposed heuristic estimates slack times using short-term work-demand analysis. A DVS algorithm based on the proposed heuristic is also presented. Experimental results show that the proposed DVS algorithm reduces energy consumption by 25-42% over existing rate-monotonic DVS algorithms.
{"title":"Dynamic voltage scaling algorithm for fixed-priority real-time systems using work-demand analysis","authors":"Woonseok Kim, Jihong Kim, S. Min","doi":"10.1145/871506.871605","DOIUrl":"https://doi.org/10.1145/871506.871605","url":null,"abstract":"Dynamic Voltage Scaling (DVS), which adjusts the clock speed and supply voltage dynamically, is an effective technique in reducing the energy consumption of embedded real-time systems. Unlike dynamic-priority real-time scheduling for which highly effective DVS algorithms are available, existing fixed-priority DVS algorithms are less effective in energy efficiency because they are based on inefficient slack estimation methods. This paper describes an efficient on-line slack estimation heuristic for the rate-monotonic (RM) scheduling. The proposed heuristic estimates the slack times using the short term work-demand analysis. The DVS algorithm,based on-the proposed heuristic is also presented. Experimental results show that the proposed DVS algorithm reduces the energy consumption by 25/spl sim/42% over the existing rate-monotonic DVS algorithms.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115654085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design methodology for fine-grained leakage control in MTCMOS
Pub Date: 2003-08-25 | DOI: 10.1145/871506.871535
B. Calhoun, Frank Honoré, A. Chandrakasan
Multi-threshold CMOS (MTCMOS) is a popular technique for reducing standby leakage power with low delay overhead. MTCMOS designs typically use large sleep devices to reduce standby leakage at the block level. We provide a formal examination of sneak leakage paths and a design methodology that enables gate-level insertion of sleep devices for sequential and combinational circuits. A fabricated 0.13 μm, dual-Vt test chip employs this methodology to implement a low-power FPGA core with gate-level sleep FETs and over 8× measured standby current reduction. The methodology allows local sleep regions that reduce leakage in active configurable logic blocks (CLBs) by up to 2.2× (measured) for some CLB configurations.
{"title":"Design methodology for fine-grained leakage control in MTCMOS","authors":"B. Calhoun, Frank Honoré, A. Chandrakasan","doi":"10.1145/871506.871535","DOIUrl":"https://doi.org/10.1145/871506.871535","url":null,"abstract":"Multi-threshold CMOS is a popular technique for reducing standby leakage power with low delay overhead. MTCMOS designs typically use large sleep devices to reduce standby leakage at the block level. We provide a formal examination of sneak leakage paths and a design methodology that enables gate-level insertion of sleep devices for sequential and combinational circuits. A fabricated 0.13 /spl mu/m, dual V/sub T/ test chip employs this methodology to implement a low-power FPGA core with gate-level sleep FETs and over 8/spl times/ measured standby current reduction. The methodology allows local sleep regions that reduce leakage in active configurable logic blocks (CLBs) by up to 2.2/spl times/ (measured) for some CLB configurations.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115767774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy optimization techniques in cluster interconnects
Pub Date: 2003-08-25 | DOI: 10.1145/871506.871620
Eun Jung Kim, K. H. Yum, G. Link, N. Vijaykrishnan, M. Kandemir, M. J. Irwin, Mazin S. Yousif, C. Das
Designing energy-efficient clusters has recently become an important concern in making these systems economically attractive for many applications. Since the links and switch buffers consume the major portion of a cluster's power budget, the focus of this paper is on optimizing the energy consumption of these two components. To minimize link power, we propose a novel dynamic link shutdown (DLS) technique. The DLS technique makes use of an appropriate adaptive routing algorithm to shut down links intelligently. We also present an optimized buffer design for reducing leakage energy. Our analysis of different networks using a complete system simulator reveals that the proposed DLS technique can provide optimized performance-energy behavior (up to 40% energy savings with less than 5% performance degradation in the best case) for cluster interconnects.
{"title":"Energy optimization techniques in cluster interconnects","authors":"Eun Jung Kim, K. H. Yum, G. Link, N. Vijaykrishnan, M. Kandemir, M. J. Irwin, Mazin S. Yousif, C. Das","doi":"10.1145/871506.871620","DOIUrl":"https://doi.org/10.1145/871506.871620","url":null,"abstract":"Designing energy-efficient clusters has recently become an important concern to make these systems economically attractive for many applications. Since the links and switch buffers consume the major portion of the power budget of the cluster, the focus of this paper is to optimize the energy consumption in these two components. To minimize power in the links, we propose a novel dynamic link shutdown (DLS) technique. The DLS technique makes use of an appropriate adaptive routing algorithm to shutdown the links intelligently. We also present an optimized buffer design for reducing leakage energy. Our analysis on different networks using a complete system simulator reveals that the proposed DLS technique can provide optimized performance-energy behavior (up to 40% energy savings with less than 5% performance degradation in the best case) for the cluster interconnects.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122768173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A mixed-clock issue queue design for globally asynchronous, locally synchronous processor cores
Pub Date: 2003-08-25 | DOI: 10.1145/871506.871600
V. Rapaka, Diana Marculescu
Ever-shrinking device sizes and innovative micro-architectural and circuit design techniques have made it possible to build multi-million-transistor systems running at multi-gigahertz speeds. However, such tremendous computational capability comes at a high price in terms of power consumption and of the design effort needed to distribute a global clock signal across the chip. One of the most promising strategies addressing these issues is the globally asynchronous, locally synchronous (GALS) design style, in which multiple domains are governed by different, locally generated clocks. Due to their inherent complexity, superscalar, out-of-order processors are a natural driver application for such a design style. While micro-architectural evaluations of GALS microprocessors have recently become available, no concrete implementations have been analyzed in detail. In this paper we propose a mixed-clock issue queue design for high-end, out-of-order superscalar processors, able to sustain different clock rates for incoming and outgoing traffic. We compare and contrast our implementation with existing synchronous issue queues used stand-alone or in conjunction with mixed-clock FIFOs for inter-domain synchronization.
{"title":"A mixed-clock issue queue design for globally asynchronous, locally synchronous processor cores","authors":"V. Rapaka, Diana Marculescu","doi":"10.1145/871506.871600","DOIUrl":"https://doi.org/10.1145/871506.871600","url":null,"abstract":"Ever shrinking device sizes and innovative micro-architectural and circuit design techniques have made it possible to have multi-million transistor systems running at multi-Gigahertz speeds. However, such a tremendous computational capability comes at a high price in terms of power consumption and design effort in distributing a global clock signal across the chip. One of the most promising strategies that addresses these issues is the globally asynchronous, locally synchronous (GALS) design style where multiple domains are governed by different, locally generated clocks. Due to its inherent complexity, a possible driver application for such a design style is the case of superscalar, out-of-order processors. While micro-architectural evaluations for GALS microprocessors have been made available recently, no concrete implementations have been analyzed in a detailed way. In this paper we propose a mixed-clock issue queue design for high-end, out-of-order superscalar processors, able to sustain different clock rates and speeds for the incoming and out going traffic. We compare and contrast our implementation with existing synchronous versions of issue queues used stand-alone or in conjunction with mixed-clock FIFOs for inter-domain synchronization.","PeriodicalId":355883,"journal":{"name":"Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129630429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}