A Novel Power Optimization Technique for Ultra-Low Power RFICs
A. Shameli, P. Heydari
DOI: 10.1145/1165573.1165639

This paper presents a novel power optimization technique for ultra-low power (ULP) RFICs. A new figure of merit, the gm·fT-to-current ratio (gm·fT/ID), is defined for a MOS transistor, which accounts for both the unity-gain frequency and the current consumption. It is demonstrated both analytically and experimentally that gm·fT/ID reaches its maximum value in the moderate inversion region. Using the proposed method, a power-optimized common-gate low-noise amplifier (LNA) with an active load has been designed and fabricated in a 0.18 μm CMOS process operating at 950 MHz. Measurement results show a noise figure (NF) of 4.9 dB and a small-signal gain of 15.6 dB with a record-breaking power dissipation of only 100 μW.
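To make the figure of merit concrete, here is a minimal LaTeX sketch; expanding fT with the standard quasi-static approximation is my assumption, not a relation taken from the paper.

```latex
\[
  \mathrm{FoM} \;=\; \frac{g_m f_T}{I_D},
  \qquad
  f_T \;\approx\; \frac{g_m}{2\pi\,(C_{gs}+C_{gd})}
  \;\;\Longrightarrow\;\;
  \mathrm{FoM} \;\approx\; \frac{g_m^{2}}{2\pi\,(C_{gs}+C_{gd})\,I_D}.
\]
```

In weak inversion gm/ID is high but fT collapses, while in strong inversion fT is high but gm/ID falls; the product therefore peaks in the moderate inversion region, which is the operating point the paper exploits.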
Energy-Efficient Dynamic Instruction Scheduling Logic through Instruction Grouping
Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura
DOI: 10.1145/1165573.1165585

Dynamic instruction scheduling logic is quite complex and dissipates significant energy in microprocessors that support superscalar and out-of-order execution. We propose a novel microarchitectural technique to reduce the complexity and energy consumption of the dynamic instruction scheduling logic. The proposed method groups several instructions into a single issue unit and reduces the required number of ports and the size of the structures for dispatch, wakeup, select, and issue. This paper describes the microarchitectural mechanisms and shows evaluation results for energy savings and performance. These results reveal that the proposed technique can greatly reduce energy with almost no performance degradation compared to conventional dynamic instruction scheduling logic.
Power Reduction of Multiple Disks Using Dynamic Cache Resizing and Speed Control
Le Cai, Yung-Hsiang Lu
DOI: 10.1145/1165573.1165617

This paper presents an energy-conservation method for multiple disks and their cache memory. Our method periodically resizes the cache memory and controls the rotation speeds under performance constraints. The cache memory stores data from the disks for reuse. Enlarging the cache reduces disk accesses and disk utilization. This allows the disks to reduce their speeds and conserve energy, because disk power consumption grows quadratically with speed. However, the cache memory itself consumes power to retain data. Shrinking the cache can save memory power while increasing disk accesses and degrading performance. Choosing proper cache sizes and rotation speeds can reduce the energy consumption of both memory and disks with satisfactory performance. We model cache resizing and speed setting as an optimization problem with power consumption as the objective to minimize and disk utilization limits as constraints. We compare our method with methods that resize the cache based on request rates. The simulation results show that our method achieves greater energy savings while limiting disk access latency.
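One plausible way to write the optimization problem described in the abstract; the notation below is mine, not the authors'.

```latex
\[
  \min_{c,\;s_1,\dots,s_N}\;\;
  P_{\mathrm{mem}}(c) \;+\; \sum_{i=1}^{N} P_{\mathrm{disk},i}(s_i)
  \quad\text{subject to}\quad
  U_i(c, s_i) \;\le\; U_{\max}, \qquad i = 1,\dots,N,
\]
```

where c is the cache size, s_i the rotation speed of disk i, and U_i its utilization. Because disk power grows roughly quadratically with speed, lowering s_i whenever a larger cache keeps U_i under the bound trades a small increase in memory power for a larger saving in disk power.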
Register File Caching for Energy Efficiency
Hui Zeng, K. Ghose
DOI: 10.1145/1165573.1165633

With the use of faster clocks and larger instruction windows in high-end superscalar processors, the physical register files (RFs) can no longer be accessed in a single cycle. To combat the consequential performance penalty, the RFs employ multiple levels of bypassing. Register file caching, which caches a small subset of the registers in a faster, smaller structure called the register file cache (RFC), has also been proposed as a remedy for this problem. We introduce a relatively simple RFC design that partitions the RFC into two separate components: a FIFO queue holding register values that are used over a short duration following their writeback, and a small set-associative cache holding values that are likely to be used over a longer duration. Results written to the RFC are easily classified into these categories, and the classification bit is also used to predict the nature of the result for the next execution of the same instruction. We show that significant energy savings - about 38% on average - occur in accessing register operands when a 28-entry RFC is used together with a 96-entry RF with no additional bypassing, compared with a base-case design that has 128 registers with a 2-cycle access time and one additional level of bypassing. The performance drop relative to the base case is also negligible (a 0.3% drop).
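A minimal Python sketch of the two-component RFC organization described above; the structure sizes, the PC-indexed predictor, and all class and method names are illustrative assumptions, not the paper's design.

```python
from collections import OrderedDict, deque

class SplitRFC:
    """Register file cache split into a FIFO for short-lived results and a small
    set-associative cache for long-lived results, steered by a per-instruction
    classification/prediction bit (a sketch, not the published microarchitecture)."""

    def __init__(self, fifo_entries=16, num_sets=4, ways=3):
        self.fifo = deque(maxlen=fifo_entries)                 # (reg, value), oldest evicted first
        self.sets = [OrderedDict() for _ in range(num_sets)]   # per-set LRU
        self.ways = ways
        self.long_lived_pred = {}                              # PC -> predicted "long-lived" bit

    def _set_for(self, reg):
        return self.sets[reg % len(self.sets)]

    def write(self, pc, reg, value):
        # Each physical register is written once after renaming, so stale copies
        # are not a concern in this sketch.
        if self.long_lived_pred.get(pc, False):
            s = self._set_for(reg)
            s[reg] = value
            s.move_to_end(reg)
            if len(s) > self.ways:
                s.popitem(last=False)                          # evict the LRU way
        else:
            self.fifo.append((reg, value))

    def read(self, reg):
        # A hit in either structure avoids a slow multi-cycle main-RF access.
        for r, v in reversed(self.fifo):
            if r == reg:
                return v
        s = self._set_for(reg)
        if reg in s:
            s.move_to_end(reg)
            return s[reg]
        return None                                            # miss: fall back to the main RF

    def train(self, pc, was_long_lived):
        # Remember how this instruction's last result behaved to predict the next one.
        self.long_lived_pred[pc] = was_long_lived
```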
Temperature-Aware Floorplanning of Microarchitecture Blocks with IPC-Power Dependence Modeling and Transient Analysis
Vidyasagar Nookala, D. Lilja, S. Sapatnekar
DOI: 10.1145/1165573.1165644

Operating temperatures have become an important concern in high-performance microprocessors. Floorplanning, or block-level placement, offers excellent potential for thermal optimization through better heat spreading between the blocks, but these optimizations can also impact the throughput of a microarchitecture, measured in terms of the number of instructions per cycle (IPC). In nanometer technologies, global buses can have multicycle delays that depend on the positions of the blocks, so a floorplanner must be microarchitecturally aware to ensure that thermal and IPC considerations are appropriately balanced. This paper proposes a methodology for thermally aware microarchitecture floorplanning. The approach models the interactions between the IPC and the temperature distribution and incorporates both factors in the floorplanning cost function. It uses transient modeling, optimizes both the peak and the average temperatures, and employs a design-of-experiments (DOE) based strategy that effectively captures the huge exponential search space with a small number of cycle-accurate simulations. A comparison with a technique based on previous work indicates that the proposed approach yields good reductions in both the average and peak temperatures for a range of SPEC benchmarks.
Substituting Associative Load Queue with Simple Hash Tables in Out-of-Order Microprocessors
Alok Garg, Fernando Castro, Michael C. Huang, D. Chaver, L. Piñuel, M. Prieto
DOI: 10.1145/1165573.1165637

Buffering more in-flight instructions in an out-of-order microprocessor is a straightforward and effective method to help tolerate the long latencies generally associated with off-chip memory accesses. One of the main challenges of buffering a large number of instructions, however, is the implementation of a scalable and efficient mechanism to detect memory access order violations as a result of out-of-order scheduling of load and store instructions. Traditional CAM-based associative queues can be very slow and energy consuming. In this paper, instead of using the traditional age-based load queue to record load addresses, we explicitly record age information in address-indexed hash tables to achieve the same functionality of detecting premature loads. This alternative design eliminates associative searches and significantly reduces the energy consumption of the load queue. With simple techniques to reduce the number of false positives, performance degradation is kept at a minimum.
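A small Python sketch of the idea of replacing the CAM load queue with an address-indexed table of load ages; the bucket count, hashing, and interface names are my assumptions, and hash collisions are exactly the false positives the abstract mentions.

```python
class LoadAgeTable:
    """Detect premature loads without associative search: each bucket remembers the
    youngest age of an executed, still-in-flight load whose address hashes to it."""

    def __init__(self, buckets=64):
        self.buckets = buckets
        self.youngest_load_age = [None] * buckets   # larger age = younger in program order

    def _index(self, addr):
        return (addr >> 2) % self.buckets           # word-granularity hash; collisions -> false positives

    def load_executed(self, addr, age):
        i = self._index(addr)
        if self.youngest_load_age[i] is None or age > self.youngest_load_age[i]:
            self.youngest_load_age[i] = age

    def store_executed(self, addr, age):
        # An older store (smaller age) executing after a younger load in the same
        # bucket flags a possible memory-order violation and triggers recovery.
        i = self._index(addr)
        return self.youngest_load_age[i] is not None and self.youngest_load_age[i] > age

    def flush(self):
        # Cleared on squash/commit boundaries in this simplified sketch.
        self.youngest_load_age = [None] * self.buckets
```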
Variability-Aware Device Optimization under ION and Leakage Current Constraints
J. Jaffari, M. Anis
DOI: 10.1145/1165573.1165601

In this paper, a novel device optimization methodology is presented that is constrained by the total leakage and the ON current of the device. The devised technique locates a maximum-yield rectangular cube in a three-dimensional feasible space composed of the oxide thickness, halo peak doping, and halo characteristic length parameters. The center of this cube is taken as the maximum-yield design point with the highest immunity against variations. Monte Carlo simulations show that the optimized bulk-MOS device for a 45 nm gate length satisfies the ON-current and leakage constraints under variability of up to 30% in the three parameters.
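A rough Python sketch of the search implied by the abstract: find the largest axis-aligned cube in the (oxide thickness, halo peak doping, halo characteristic length) space whose corners all meet the ION and leakage constraints, and take its center as the variation-tolerant design point. The grid search, the corner-only feasibility test (which assumes roughly monotone constraints), and the feasible() stand-in for device simulation are all my assumptions, not the paper's procedure.

```python
import itertools
import numpy as np

def largest_feasible_cube(feasible, lo, hi, steps=20):
    """feasible(p) -> bool stands in for a device simulation of p = (t_ox, N_halo, L_halo).
    lo, hi are length-3 bounds on the normalized parameter space."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    grid = [np.linspace(lo[d], hi[d], steps) for d in range(3)]
    best_half, best_center = 0.0, None
    for center in itertools.product(*grid):
        c = np.array(center)
        # Grow the half-width until some corner of the cube violates a constraint.
        for half in np.linspace(0.0, (hi - lo).min() / 2, steps):
            corners = [c + half * np.array(s) for s in itertools.product((-1, 1), repeat=3)]
            if not all(feasible(p) for p in corners):
                break
            if half > best_half:
                best_half, best_center = half, c.copy()
    return best_half, best_center   # center = design point with the most room for variation
```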
Modeling and Analysis of Leakage Induced Damping Effect in Low Voltage LSIs
Jie Gu, J. Keane, C. Kim
DOI: 10.1145/1165573.1165668

Although there has been extensive research on controlling leakage power, the fact that leaky transistors can act as a damping element for supply noise has long been ignored or unnoticed in the design community. This paper investigates the leakage-induced damping effect that helps suppress supply noise. By developing physics-based impedance models for active and leakage currents, we show that leakage, particularly gate tunneling leakage, provides more damping than strong-inversion current. Simulations were performed in a 32 nm CMOS technology to validate our models under PVT variations and to explore the voltage-dependent behavior of this phenomenon. A design example utilizing leakage-induced damping, namely decoupling capacitance (decap) assignment, is discussed, with results showing a 15.6% saving in decap area.
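To make the damping claim concrete, here is a minimal lumped-element sketch of the intuition, assuming a parallel RLC model of the power delivery network; this is my simplification, not the paper's physics-based impedance model.

```latex
\[
  g_{\mathrm{leak}} \;=\; \left.\frac{\partial I_{\mathrm{leak}}}{\partial V_{DD}}\right|_{V_{DD,0}},
  \qquad
  \zeta \;=\; \frac{g_{\mathrm{active}} + g_{\mathrm{leak}}}{2}\,\sqrt{\frac{L_{\mathrm{pkg}}}{C_{\mathrm{die}}}} .
\]
```

A larger small-signal leakage conductance g_leak raises the damping factor ζ of the package-inductance/die-capacitance resonance and therefore suppresses supply ringing; gate tunneling current is strongly voltage dependent and so contributes a comparatively large conductance, which is consistent with the abstract's observation.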
An Energy-Efficient Virtual Memory System with Flash Memory as the Secondary Storage
Hung-Wei Tseng, Han-Lin Li, Chia-Lin Yang
DOI: 10.1145/1165573.1165675

The traditional virtual memory system has for decades been designed assuming a magnetic disk as the secondary storage. Recently, flash memory has become a popular storage alternative for many portable devices, with continuing improvements in capacity and reliability and much lower power consumption than mechanical hard drives. NAND flash memory is organized in blocks, and each block contains a set of pages. The characteristics of flash memory are quite different from those of a magnetic disk. Therefore, in this paper, we revisit virtual memory system design considering the limitations imposed by flash memory. In particular, we study the effects of a subpaging technique and of storage cache management. In the traditional virtual memory system, a full page is written back to the secondary storage on a page fault. We found that this can result in unnecessary writes, thereby wasting energy. A subpaging technique that partitions a page into subunits, so that only dirty subpages are written to flash memory, benefits energy efficiency. For storage cache management, unlike traditional disk cache management, care must be taken to guarantee that the flash pages of a main memory page are replaced from the cache in sequence. Experimental results show that the average energy reduction of the combined subpaging and caching techniques is 35.6%.
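A minimal Python sketch of the dirty-subpage writeback idea; the page geometry, the Page class, and the flash_write driver hook are assumed for illustration and are not the paper's implementation.

```python
SUBPAGES_PER_PAGE = 8
SUBPAGE_SIZE = 512          # bytes: a 4 KB page split into 8 subunits (an assumption)

class Page:
    def __init__(self, page_no):
        self.page_no = page_no
        self.data = bytearray(SUBPAGES_PER_PAGE * SUBPAGE_SIZE)
        self.dirty = [False] * SUBPAGES_PER_PAGE

    def write(self, offset, payload):
        if not payload:
            return
        self.data[offset:offset + len(payload)] = payload
        first = offset // SUBPAGE_SIZE
        last = (offset + len(payload) - 1) // SUBPAGE_SIZE
        for sub in range(first, last + 1):
            self.dirty[sub] = True          # track dirtiness at subpage granularity

def evict(page, flash_write):
    """flash_write(page_no, subpage_index, data) is an assumed driver hook."""
    written = 0
    for sub, is_dirty in enumerate(page.dirty):
        if is_dirty:
            start = sub * SUBPAGE_SIZE
            flash_write(page.page_no, sub, bytes(page.data[start:start + SUBPAGE_SIZE]))
            page.dirty[sub] = False
            written += 1
    return written   # fewer subpage writes -> fewer flash program operations -> less energy
```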
Energy Optimality and Variability in Subthreshold Design
S. Hanson, Bo Zhai, D. Blaauw, D. Sylvester, A. Bryant, Xinlin Wang
DOI: 10.1145/1165573.1165660

Recent progress in the development of subthreshold circuit design techniques has created the opportunity for dramatic energy reductions in many applications. However, energy efficiency comes at the price of timing and energy variability due to process variations. We explore energy optimality in the subthreshold regime, discuss variability in this region, and highlight the energy and variability characteristics of a real subthreshold design.
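A standard back-of-the-envelope energy model (my sketch, not the authors' formulation) shows why an energy minimum appears in or near the subthreshold regime and why it is sensitive to process variations.

```latex
\[
  E_{\mathrm{op}} \;\approx\;
  \underbrace{C_{\mathrm{eff}}\,V_{DD}^{2}}_{\text{dynamic}}
  \;+\;
  \underbrace{I_{\mathrm{leak}}\,V_{DD}\,t_{\mathrm{delay}}}_{\text{leakage}},
  \qquad
  t_{\mathrm{delay}} \;\propto\;
  \frac{C\,V_{DD}}{I_{0}\,e^{(V_{DD}-V_{T})/(n\,kT/q)}}
  \quad (V_{DD} \lesssim V_{T}).
\]
```

The dynamic term shrinks quadratically as V_DD is lowered, while the leakage term grows because delay increases exponentially below V_T, so the sum has a minimum. The same exponential dependence on V_T is what makes both timing and energy highly sensitive to process variations in this regime, which is the variability cost the abstract highlights.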