A new mismatch-dependent low-power technique is presented for content-addressable memories (CAMs). Using a novel shadow match-line voltage-detecting scheme, the word circuits quickly self-disable their charging paths when a mismatch occurs. Since most CAM words are mismatched in any given search, power consumption is reduced significantly while a high search speed is maintained. Simulation results show that the proposed 256-word × 144-bit ternary CAM, implemented in a 0.13-µm 1.2-V CMOS process, achieves 0.51 fJ/bit/search for the word circuit with a search time of less than 900 ps. This corresponds to a 77% energy-delay-product (EDP) reduction compared with the speed-optimized current-saving scheme.
Title: A New Mismatch-Dependent Low Power Technique with Shadow Match-Line Voltage-Detecting Scheme for CAMs | Authors: Jianwei Zhang, Y. Ye, Bin-Da Liu | ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design | 2006-10-04 | DOI: 10.1145/1165573.1165605 (https://doi.org/10.1145/1165573.1165605)
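The CAM abstract above rests on ternary matching. The Python sketch below is a minimal behavioral model of a ternary CAM search. Each word has a stored value and a per-bit care mask, and every word is compared against the search key. It models only the logic, not the shadow match-line circuit, and all class and function names are hypothetical.

A word counts as mismatched as soon as any cared-about bit differs from the key. The mismatch count is therefore the number of words that could self-disable their charging paths. The `edp_reduction` helper applies the definition EDP = energy × delay to compare a design against a reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TcamWord:
    value: int  # stored bit pattern
    care: int   # per-bit mask: 1 = compare this bit, 0 = don't care (X)


def tcam_search(words, key):
    """Return (indices of matching words, number of mismatched words)."""
    matches = []
    for i, w in enumerate(words):
        # XOR exposes the differing bits; the care mask ignores X positions.
        if ((w.value ^ key) & w.care) == 0:
            matches.append(i)
    return matches, len(words) - len(matches)


def edp_reduction(energy_new, delay_new, energy_ref, delay_ref):
    """Fractional energy-delay-product reduction of a new design vs. a reference."""
    return 1.0 - (energy_new * delay_new) / (energy_ref * delay_ref)


if __name__ == "__main__":
    table = [TcamWord(0b1010, 0b1111), TcamWord(0b1000, 0b1100),
             TcamWord(0b0000, 0b0000), TcamWord(0b0110, 0b1111)]
    print(tcam_search(table, 0b1011))  # ([1, 2], 2)
```

A hardware CAM compares all words in parallel. The loop here is only a sequential model of that comparison.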
Power and power density are now primary design constraints for modern high-performance microprocessors, and up to 70% of the dynamic power consumed can be attributed to the clocking system. As a consequence, clock gating has emerged as both a necessary and an efficient method for significantly reducing dynamic power. Transparent pipelining, a recently proposed fine-grain clock gating technique, has the potential to reduce clock power well beyond what conventional pipestage-level clock gating achieves. Previous studies of transparent pipelining have focused on circuit and implementation issues while neglecting the broader microarchitectural implications. This paper quantifies the microarchitectural opportunities afforded by transparent pipelining in a processor's fetch pipeline. We develop a technique based on stall cycle redistribution that is designed to improve the performance of transparent pipelining on fetch and other high-utilization pipelines. We show that stall cycle redistribution can dramatically reduce the clocking overhead of an aggressively pipelined Cell-like microprocessor.
Title: Stall Cycle Redistribution in a Transparent Fetch Pipeline | Authors: Eric L. Hill, Mikko H. Lipasti | ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design | 2006-10-04 | DOI: 10.1145/1165573.1165583 (https://doi.org/10.1145/1165573.1165583)
The large supply voltage difference between sub-threshold core logic and I/O makes it extremely challenging to convert signals from the core circuit to the I/O circuit. In this paper, we propose two novel circuits, a clock synchronizer and a reduced-swing inverter, for designing dynamic and static level converters for sub-threshold logic. Circuit simulations show that our level converters operate at frequencies above 500 kHz between 20°C and 40°C with a supply voltage of 0.25 V.
Title: Robust Level Converter Design for Sub-threshold Logic | Authors: I. Chang, Jae-Joon Kim, K. Roy | ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design | 2006-10-04 | DOI: 10.1145/1165573.1165579 (https://doi.org/10.1145/1165573.1165579)
This paper presents a low-power Viterbi decoder design based on scarce state transition (SST). We propose an approach that seamlessly integrates path pruning techniques with SST decoding to reduce the average add-compare-select (ACS) computation. The scheme has very low overhead and is practical to implement. We also propose an unevenly partitioned memory architecture for the survivor memory unit to reduce memory access power during the traceback operation. The proposed decoder is implemented in the SMIC 0.18-µm CMOS process. Simulation results show that significant power reduction can be achieved for high-throughput wireless systems such as MB-OFDM ultra-wideband applications.
Title: A Low Power Viterbi Decoder Implementation using Scarce State Transition and Path Pruning Scheme for High Throughput Wireless Applications | Authors: Jie Jin, C. Tsui | ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design | 2006-10-04 | DOI: 10.1145/1165573.1165673 (https://doi.org/10.1145/1165573.1165673)
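The Viterbi abstract mentions ACS computation and path pruning. Both can be illustrated with a textbook hard-decision Viterbi decoder, sketched below in Python. It makes the following assumptions:

- A rate-1/2, constraint-length-3 convolutional code with generators (7, 5) in octal.
- Simple threshold pruning: after each step, states whose path metric exceeds the best metric by more than `prune_threshold` are dropped, and their ACS work is skipped in the next step.

The sketch does not implement the scarce-state-transition pre-decoding or the uneven survivor-memory partitioning described in the abstract. All names are hypothetical.

```python
# Hypothetical example: rate-1/2, K=3 convolutional code, generators (7, 5) octal.
K = 3
GENERATORS = (0b111, 0b101)
N_STATES = 1 << (K - 1)
INF = float("inf")


def _parity(x):
    return bin(x).count("1") & 1


def _branch_outputs(state, bit):
    """Encoder output bits and next state for input `bit` leaving `state`."""
    reg = (bit << (K - 1)) | state          # newest input bit in the MSB
    return [_parity(reg & g) for g in GENERATORS], reg >> 1


def encode(bits):
    state, out = 0, []
    for b in bits:
        symbols, state = _branch_outputs(state, b)
        out.extend(symbols)
    return out


def viterbi_decode(received, prune_threshold=None):
    """Hard-decision Viterbi decoding with optional threshold path pruning."""
    metrics = [0] + [INF] * (N_STATES - 1)  # encoder starts in state 0
    history = []                            # per step: (prev_state, bit) per state
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_metrics = [INF] * N_STATES
        back = [None] * N_STATES
        for s in range(N_STATES):
            if metrics[s] == INF:
                continue                    # pruned or unreachable: no ACS work
            for b in (0, 1):
                symbols, ns = _branch_outputs(s, b)
                m = metrics[s] + sum(x != y for x, y in zip(symbols, r))  # add
                if m < new_metrics[ns]:                                    # compare
                    new_metrics[ns], back[ns] = m, (s, b)                  # select
        if prune_threshold is not None:
            best = min(new_metrics)
            new_metrics = [m if m <= best + prune_threshold else INF
                           for m in new_metrics]
        metrics = new_metrics
        history.append(back)
    # Trace back from the best final state.
    state = min(range(N_STATES), key=lambda s: metrics[s])
    decoded = []
    for back in reversed(history):
        state, bit = back[state]
        decoded.append(bit)
    return decoded[::-1]
```

With `prune_threshold=None` the decoder performs the full maximum-likelihood search. A small threshold skips ACS work for unlikely states, at the cost of possibly discarding the correct path under heavy noise.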
M. Sarrafzadeh, F. Dabiri, R. Jafari, T. Massey, A. Nahapetian
Light-weight embedded systems are gaining popularity thanks to recent advances in fabrication technology, which have produced more powerful tiny processors with greater communication capabilities. These systems pose various scientific challenges for researchers. Perhaps the most significant are energy consumption and reliability, mainly because of the small size of batteries. In this tutorial, we briefly describe low-power, light-weight embedded systems, review several previously conducted power-profiling studies, and present several research challenges that require low power consumption in embedded systems. For each challenge, we highlight how low-power designs may enhance the overall performance of the system. Finally, we present several techniques that minimize power consumption in such systems.
Title: Low Power Light-weight Embedded Systems | Authors: M. Sarrafzadeh, F. Dabiri, R. Jafari, T. Massey, A. Nahapetian | ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design | 2006-10-04 | DOI: 10.1145/1165573.1165623 (https://doi.org/10.1145/1165573.1165623)
Over the past few technology generations, limitations of fabrication processes have made accurate design-time power estimates a daunting challenge. Static leakage current, which comprises a significant fraction of total power because of large on-chip caches, depends exponentially on widely varying physical parameters such as gate length, gate oxide thickness, and dopant ion concentration. In large structures such as on-chip caches, this may mean that one portion of a cache consumes an order of magnitude more static power than other, equivalently sized regions. In this climate, egalitarian management of physical resources is clearly untenable. In this paper, we analyze the effects of within-die and die-to-die leakage variation in on-chip caches. We then propose way prioritization, a manufacturing-variation-aware scheme that minimizes cache leakage energy. Our results show that significant average power reductions are possible without undue hardware complexity or performance compromise.
Title: Process Variation Aware Cache Leakage Management | Authors: Ke Meng, R. Joseph | ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design | 2006-10-04 | DOI: 10.1145/1165573.1165636 (https://doi.org/10.1145/1165573.1165636)
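The abstract on process-variation-aware cache leakage does not spell out how way prioritization works. The Python sketch below shows one plausible reading, under these assumptions:

- Per-way leakage is known, for example measured at test time, in arbitrary integer units.
- When only `active_ways` ways need to stay powered, the least-leaky ones are kept and the rest are gated.

This is an illustration of variation-aware selection in general, not the authors' scheme. The function names are hypothetical.

```python
def prioritize_ways(way_leakage, active_ways):
    """Hypothetical: indices of the `active_ways` least-leaky ways to keep powered."""
    # Rank ways by measured leakage; variation makes these differ per die.
    order = sorted(range(len(way_leakage)), key=lambda w: way_leakage[w])
    return sorted(order[:active_ways])


def active_leakage(way_leakage, ways):
    """Total leakage of the ways that remain powered."""
    return sum(way_leakage[w] for w in ways)


if __name__ == "__main__":
    leakage = [10, 50, 12, 8]               # hypothetical per-way leakage units
    kept = prioritize_ways(leakage, 2)
    print(kept, active_leakage(leakage, kept))        # [0, 3] 18
    print(active_leakage(leakage, [0, 1]))            # variation-unaware choice: 60
```

The comparison at the end illustrates the abstract's point. Treating all ways as equal ("egalitarian" management) can keep a leaky way powered when a cheaper one was available.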
The H.264 video coding standard can achieve considerably higher coding efficiency than previous video coding standards. The keys to this efficiency are the two prediction modes (intra and inter) that H.264 provides. Unfortunately, these modes result in considerably higher encoder complexity, which adversely affects speed and power, both of which are significant for the mobile multimedia applications the standard targets. It is therefore highly important to design architectures that minimize the speed and power overhead of the prediction modes. In this paper, we present a new algorithm, together with the logic transformations that enable it. The algorithm can replace the standard sum of absolute differences (SAD) approach in the two main prediction modes and yields a power-efficient hardware implementation without perceivable degradation in coding efficiency or video quality.
Title: Power Reduction in an H.264 Encoder Through Algorithmic and Logic Transformations | Authors: M. Koziri, G. Stamoulis, I. Katsavounidis | ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design | 2006-10-04 | DOI: 10.1145/1165573.1165598 (https://doi.org/10.1145/1165573.1165598)
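For reference, here is the SAD cost that the H.264 paper's algorithm is meant to replace. For an N×N block, SAD sums |cur(x, y) - ref(x + dx, y + dy)| over the block, and the encoder minimizes it over candidate displacements (dx, dy).

The Python sketch below is a plain full-search motion-estimation loop using SAD. It assumes frames are stored as lists of lists of luma samples. It shows the baseline computation only, not the authors' replacement algorithm, and the function names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def block(frame, x, y, n):
    """n x n block whose top-left corner is at column x, row y."""
    return [row[x:x + n] for row in frame[y:y + n]]


def full_search(cur, ref, x, y, n, search_range):
    """Best (dx, dy, cost) for the block at (x, y) within +/- search_range."""
    h, w = len(ref), len(ref[0])
    target = block(cur, x, y, n)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            cx, cy = x + dx, y + dy
            # Only consider candidates that lie fully inside the reference frame.
            if 0 <= cx <= w - n and 0 <= cy <= h - n:
                cost = sad(target, block(ref, cx, cy, n))
                if best is None or cost < best[2]:
                    best = (dx, dy, cost)
    return best
```

The nested loops evaluate (2r + 1)² candidates per block, each costing N² absolute differences. That arithmetic is why SAD dominates encoder power and is a natural target for the algorithmic and logic transformations the abstract describes.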
Today's superscalar microprocessors use large, heavily ported physical register files (RFs) to increase instruction throughput. The high complexity and power dissipation of such RFs stem mainly from the need to hold every result for many cycles after it is generated. We observed that a significant fraction (about 45%) of result values are delivered to their consumers via the bypass network (consumed "on the fly") and are never read from their destination registers. In this paper, we first formulate conditions for identifying such transient values and describe their microarchitectural implementation. We then propose a technique that avoids writing these transient values back into the RF. With 64-entry integer and floating-point register files, our technique achieves an 11% performance improvement and a 29% reduction in RF energy consumption compared with a baseline machine with the same number of registers. Furthermore, for the same performance target, the selective writeback scheme reduces RF energy consumption by 38% compared with the baseline machine.
Title: Selective Writeback: Exploiting Transient Values for Energy-Efficiency and Performance | Authors: D. Balkan, J. Sharkey, D. Ponomarev, K. Ghose | ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design | 2006-10-04 | DOI: 10.1145/1165573.1165584 (https://doi.org/10.1145/1165573.1165584)
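The selective writeback abstract does not list its exact conditions, so the Python sketch below uses a deliberately simplified, hypothetical test for a "transient" value. A value qualifies when both of these hold:

- Every consumer read it within `bypass_window` cycles of production, that is, through the bypass network.
- The destination architectural register has already been redefined by a later instruction, so no future consumer can look it up.

A real design must also handle branch mispredictions and precise exceptions, which this model ignores. The class and function names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ResultValue:
    produced_at: int                                  # cycle the result is produced
    consumer_reads: list = field(default_factory=list)  # cycles consumers read it
    dest_redefined: bool = False                      # later instr. renames same arch reg


def is_transient(v, bypass_window):
    """Hypothetical simplified check: can this value skip RF writeback?"""
    if not v.dest_redefined:
        return False  # a future consumer could still read the register
    # All reads must fall inside the bypass window; a value with no readers
    # at all (dead value) also qualifies under this simplified rule.
    return all(0 <= c - v.produced_at < bypass_window for c in v.consumer_reads)


def skipped_writeback_fraction(values, bypass_window):
    """Fraction of results whose writeback this model would avoid."""
    if not values:
        return 0.0
    return sum(is_transient(v, bypass_window) for v in values) / len(values)
```

For example, a value produced at cycle 10 and read at cycles 11 and 12 counts as transient with a 3-cycle window. If one consumer instead reads it at cycle 20, the value must be written back.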
P. Lu, N. Cao, L. Sigal, P. Woltgens, R. Robertazzi, D. Heidel
We have previously reported (Pong-Fei Lu et al., 2004) a low-swing latch (LSL) with a superior performance-power tradeoff compared with the conventional pass-gate master-slave latch. In this paper, we present hardware results for the proposed LSL with pulsed clock waveforms. The motivation is to combine low-voltage swing with pulsed signals to further reduce overall system power in high-frequency microprocessors. We designed a 65-bit accumulator loop experiment to mimic a microprocessor pipeline stage. The local clock buffer features a mode switch that toggles between two-phase (c1/c2) master-slave clocking and one-phase pulsed (c2 only) clocking. Our data show that 15-25% system power savings can be achieved in pulsed mode compared with non-pulsed mode. The power contribution of individual components is also presented.
Title: A Pulsed Low-Voltage Swing Latch for Reduced Power Dissipation in High-Frequency Microprocessors | Authors: P. Lu, N. Cao, L. Sigal, P. Woltgens, R. Robertazzi, D. Heidel | ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design | 2006-10-04 | DOI: 10.1145/1165573.1165593 (https://doi.org/10.1145/1165573.1165593)
As transistors continue to scale into the nanometer regime, device leakage currents are becoming the dominant cause of power dissipation in nanometer caches, making it essential to model these leakage effects properly. Moreover, typical microprocessor caches are pipelined to keep up with the speed of the processor, so the effects of pipelining overhead must also be accounted for. In this paper, we present a detailed study of pipelined nanometer caches, with energy/power breakdowns showing where and how power is dissipated within a nanometer cache. We explore a three-dimensional pipelined cache design space covering cache size (16 kB to 512 kB), cache associativity (direct-mapped to 16-way), and process technology (90 nm, 65 nm, 45 nm, and 32 nm). Among our findings, we show that cache bitline leakage is increasingly becoming the dominant cause of power dissipation in nanometer technology nodes. We show that subthreshold leakage is the main cause of static power dissipation and that gate leakage, surprisingly, is not a significant contributor to total cache power, even for 32 nm caches. We also show that cache pipelining overhead must be accounted for, because the power dissipated by the pipeline elements is a significant part of total cache power.
Title: Energy/Power Breakdown of Pipelined Nanometer Caches (90nm/65nm/45nm/32nm) | Authors: Samuel Rodríguez, B. Jacob | ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design | 2006-10-04 | DOI: 10.1145/1165573.1165581 (https://doi.org/10.1145/1165573.1165581)
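The three-dimensional design space in the nanometer-cache abstract can be enumerated directly. The Python sketch below makes two assumptions that the abstract does not state:

- Sizes and associativities are sampled in power-of-two steps.
- The line size is a hypothetical 64 bytes.

The number of sets follows from the standard relation sets = capacity / (line size × associativity). The names are illustrative only.

```python
from itertools import product

# Assumed sampling of the design space (power-of-two steps).
SIZES_KB = [16, 32, 64, 128, 256, 512]
ASSOCIATIVITIES = [1, 2, 4, 8, 16]        # 1 = direct-mapped
TECH_NM = [90, 65, 45, 32]
LINE_BYTES = 64                           # hypothetical cache line size


def num_sets(size_kb, ways, line_bytes=LINE_BYTES):
    """Number of sets = capacity / (line size x associativity)."""
    return (size_kb * 1024) // (line_bytes * ways)


def design_space():
    """Every (size, associativity, technology) point, with its set count."""
    return [
        {"size_kb": s, "ways": w, "tech_nm": t, "sets": num_sets(s, w)}
        for s, w, t in product(SIZES_KB, ASSOCIATIVITIES, TECH_NM)
    ]
```

Under these assumptions the space has 6 × 5 × 4 = 120 configurations. Each one would need its own energy/power breakdown.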