
Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758): Latest Publications

Mitigating inductive noise in SMT processors
W. El-Essawy, D. Albonesi
Simultaneous multi-threading, although effective in increasing processor throughput, exacerbates the inductive noise problem such that more expensive electronic solutions are required even with the use of previously proposed microarchitectural approaches. We use detailed microarchitectural simulation together with the Pentium 4 power delivery model to demonstrate the impact of SMT on inductive noise, and to identify thread-specific microarchitectural reasons for high noise occurrences. We make the key observation that the presence of multiple threads actually provides an opportunity to mitigate the cyclical current fluctuations that cause noise, and propose the use of a prior performance enhancement technique to achieve this purpose.
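To make the key observation concrete, here is a toy sketch (not the authors' technique; per-thread current numbers are purely hypothetical) of why co-scheduling two threads so that their high-activity phases fall out of phase flattens the total supply current whose cyclical swing excites inductive (di/dt) noise:

```python
# Toy model: each thread alternates between a high-current burst and a quiet phase
# every 4 cycles. Staggering the two threads by half a period cancels the swing.
def thread_current(t):
    return 10.0 if (t // 4) % 2 == 0 else 2.0   # hypothetical amps per cycle

def total_current(phase_shift, cycles=32):
    return [thread_current(t) + thread_current(t + phase_shift) for t in range(cycles)]

aligned   = total_current(phase_shift=0)   # bursts coincide
staggered = total_current(phase_shift=4)   # bursts interleaved

print(max(aligned) - min(aligned))      # 16.0 -> large cycle-to-cycle swing
print(max(staggered) - min(staggered))  # 0.0  -> fluctuation cancelled
```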
Citations: 25
Balanced energy optimization
J. Cornish
Summary form only given. Energy efficiency is now the number one issue for many applications, determining weight and cost, and constraining system performance. Many techniques have been developed to minimize the dynamic and static power consumed by digital designs without any impact on functionality. To achieve further savings it is necessary to employ methods that do constrain functionality in some way. The designer must then balance increased energy efficiency with the functional implications of those techniques. In communications systems, non-zero error rates are accommodated and corrected in order to reduce power. In digital designs it is also possible to accept and correct errors generated when worst-case timing paths exceed the clock interval. This allows the design to be operated beyond the worst-case point at a reduced voltage to save energy. The increased energy efficiency must then be balanced against a decrease in determinism and the addition of error detection and correction structures. Processing scalability can also be employed to increase energy efficiency for workloads that vary dynamically. In single-processor systems this can be achieved using voltage and frequency scaling, and in multi-processor systems it can be supplemented with adaptive shutdown of unused processors. Scalability does imply a loss of system responsiveness when workloads transition from low to high levels, and this must be balanced against the increased energy efficiency achieved. Power efficiency can also be increased by optimising a processor for the application it is intended to run. By analyzing the algorithms to be executed, it is possible to create a processor tailored to its workload. This loss of generality and flexibility must be balanced against the increased energy efficiency of a customized implementation. This talk describes work which ARM and its partners are doing to balance energy efficiency with functionality to create optimized designs.
Citations: 6
Design and implementation of correlating caches
A. Mallik, M. Wildrick, G. Memik
We introduce a new cache architecture that can be used to increase performance and reduce energy consumption in Network Processors. This new architecture is based on the observation that there is a strong correlation between different memory accesses. In other words, if load X and load Y are two consecutively executed load instructions, the offset between the source addresses of these instructions usually remains constant across iterations. We utilize this information by building a correlating cache architecture. This architecture consists of a Dynamic Correlation Extractor, a Correlation History Table, and a Correlation Buffer. We first show simulation results investigating the frequency of correlating loads. Then, we evaluate our architecture using SimpleScalar/ARM. For a set of representative applications, the correlating cache architecture is able to reduce data access time by as much as 52.7% and by 36.1% on average, while reducing the energy consumption of the caches by as much as 49.2% and by 25.7% on average.
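The correlation idea can be sketched in a few lines; the class below is an illustrative software model, not the paper's hardware organization, and the PCs and addresses are made-up values.

```python
# Minimal model of offset correlation between two consecutively executed loads:
# record the address offset seen for a (prev_pc, cur_pc) pair, then reuse it to
# predict (and prefetch) the second load's address in later iterations.
class CorrelationHistoryTable:
    def __init__(self):
        self.table = {}   # (prev_pc, cur_pc) -> last observed address offset

    def observe(self, prev_pc, cur_pc, prev_addr, cur_addr):
        self.table[(prev_pc, cur_pc)] = cur_addr - prev_addr

    def predict(self, prev_pc, cur_pc, prev_addr):
        offset = self.table.get((prev_pc, cur_pc))
        return None if offset is None else prev_addr + offset

cht = CorrelationHistoryTable()
cht.observe(0x40, 0x44, 0x1000, 0x1010)          # iteration 1: offset is 0x10
print(hex(cht.predict(0x40, 0x44, 0x2000)))      # iteration 2: predicts 0x2010
```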
Citations: 3
Integrated adaptive DC/DC conversion with adaptive pulse-train technique for low-ripple fast-response regulation
Chuang Zhang, D. Ma, A. Srivastava
Dynamic voltage scaling (DVS) is a very effective low-power design technique in modern digital IC systems. An on-chip adaptive DC/DC converter, which provides an adjustable output voltage, is a key component in implementing a DVS-enabled system. This paper presents a new adaptive DC/DC converter design, which adopts a delay-line controller for voltage regulation. With the proposed adaptive pulse-train technique, ripple voltages are reduced by 50%, while the converter still maintains a satisfactory transient response. With a supply voltage of 3.3 V, the output of the converter is well regulated from 1.7 to 3.0 V. Power consumption of the controller is below 100 µW. A maximum efficiency of 92% is achieved at an output power of 125 mW. The chip area is 0.8 × 1.2 mm² in a 1.5 µm standard CMOS process.
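As a back-of-the-envelope check of how the reported figures fit together (illustrative arithmetic only, not taken from the paper):

```python
# At the reported peak-efficiency point, input power, conversion loss, and the
# controller's share of that loss follow directly from P_in = P_out / efficiency.
p_out  = 125e-3            # W, output power
eta    = 0.92              # maximum efficiency
p_in   = p_out / eta       # ~135.9 mW drawn from the 3.3 V supply
p_loss = p_in - p_out      # ~10.9 mW total conversion loss
p_ctrl = 100e-6            # W, controller budget (< 100 uW)
print(f"input {p_in*1e3:.1f} mW, loss {p_loss*1e3:.1f} mW, "
      f"controller share {p_ctrl/p_loss:.1%}")   # controller is ~1% of the loss
```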
Citations: 24
Reducing pipeline energy demands with local DVS and dynamic retiming
Seokwoo Lee, Shidhartha Das, Toan Pham, T. Austin, D. Blaauw, T. Mudge
The quadratic relationship between voltage and energy has made dynamic voltage scaling (DVS) one of the most powerful techniques to reduce system power demands. Recently, techniques such as Razor DVS, voltage overscaling, and intelligent energy management have emerged as approaches to further reduce voltage by eliminating costly voltage margins inserted into traditional designs to ensure always-correct operation. The degree to which a global voltage controller can shave voltage margins is limited by imbalances in pipeline stage latency. Since all pipeline stages share the same voltage, the stage exercising the longest critical path will define the overall voltage of the system, even if other stages could potentially run at lower voltages. In this paper, we evaluate two local tuning mechanisms in the context of Razor DVS: a local voltage controller scheme that allows each pipeline stage its own voltage level, and a lower-cost dynamic retiming scheme that incorporates per-stage clock delay elements to allow longer-latency pipeline stages to "borrow" time from shorter-latency stages. Using simulation, we draw two key insights from our study. First, mitigating pipeline stage imbalances yields additional DVS energy savings. A Razor pipeline design with dynamic retiming finds an additional 12% energy savings over global voltage control (resulting in overall energy savings of more than 28% compared to fully-margined DVS). Second, we demonstrate that imbalances arise not only from design factors, but also from run-time characteristics. As the program (or program phase) changes, we see different logic paths in multiple stages exercised frequently, necessitating dynamic fine-tuning of local control. This result suggests that even well-balanced pipelines could benefit from dynamic retiming.
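The quadratic voltage-energy relationship the paper relies on is easy to quantify; the voltages below are assumed for illustration, and the paper's >28% figure is a measured result rather than a value derived from them.

```python
# First-order dynamic-energy model E ~ C * V^2: shaving the safety margin from an
# assumed 1.2 V down to 1.0 V at the same frequency saves roughly 30% energy.
def energy_ratio(v_new, v_old):
    return (v_new / v_old) ** 2

v_margined = 1.2   # V, worst-case-margined supply (assumed)
v_shaved   = 1.0   # V, margin shaved via error detection and correction (assumed)
print(f"energy saved: {1 - energy_ratio(v_shaved, v_margined):.1%}")   # ~30.6%
```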
Citations: 36
Technology exploration for adaptive power and frequency scaling in 90nm CMOS
M. Meijer, F. Pessolano, J. P. D. Gyvez
In this paper we examine the expectations and limitations of design technologies such as adaptive voltage scaling (AVS) and adaptive body biasing (ABB) in a modern deep sub-micron process. To serve this purpose, a set of ring oscillators was fabricated in a 90nm triple-well CMOS technology. The analysis presented here is based on two ring oscillators running at 822 MHz and 93 MHz, respectively. Measurement results indicate that it is possible to reach 13.8× power savings with 3.4× frequency downscaling using AVS, ±11% power and ±8% frequency tuning at nominal conditions using ABB only, and 22× power savings with 5× frequency downscaling by combining AVS and ABB, as well as a 22× leakage reduction.
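A quick sanity check of the AVS figure against the first-order dynamic-power model P ∝ f·V² (a simplification that ignores leakage and short-circuit power):

```python
# If power drops 13.8x while frequency drops 3.4x, the implied supply-voltage
# reduction is sqrt(13.8 / 3.4) ~ 2x under the P ~ f * V^2 approximation.
power_saving      = 13.8
freq_downscale    = 3.4
voltage_downscale = (power_saving / freq_downscale) ** 0.5
print(f"implied supply-voltage reduction: {voltage_downscale:.2f}x")   # ~2.01x
```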
Citations: 24
Post-layout leakage power minimization based on distributed sleep transistor insertion
P. Babighian, L. Benini, A. Macii, E. Macii
This paper introduces a new approach to sub-threshold leakage power reduction in CMOS circuits. Our technique is based on automatic insertion of sleep transistors to cut the sub-threshold current when CMOS gates are in stand-by mode. Area and speed overhead caused by sleep transistor insertion are tightly controlled thanks to: (i) a post-layout incremental modification step that inserts sleep transistors into an existing row-based layout; (ii) an innovative algorithm that selects the subset of cells that can be gated for maximal leakage power reduction, while meeting user-provided constraints on area and delay increase. The presented technique is highly effective and fully compatible with industrial back-end flows, as demonstrated by post-layout analysis on several benchmarks placed and routed with state-of-the-art commercial tools for physical design.
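The cell-selection step can be pictured as a constrained greedy choice; the sketch below is an illustrative simplification (it treats delay penalties as additive and ignores the layout legality issues the paper handles), not the authors' algorithm.

```python
# Greedily gate the cells with the best leakage-saved-per-sleep-transistor-area
# ratio while staying within user-provided area and delay budgets.
def select_cells(cells, area_budget, delay_budget):
    """cells: list of dicts with 'leakage', 'sleep_area', and 'delay_penalty'."""
    chosen, area_used, delay_used = [], 0.0, 0.0
    ranked = sorted(cells, key=lambda c: c["leakage"] / c["sleep_area"], reverse=True)
    for c in ranked:
        if (area_used + c["sleep_area"] <= area_budget and
                delay_used + c["delay_penalty"] <= delay_budget):
            chosen.append(c)
            area_used  += c["sleep_area"]
            delay_used += c["delay_penalty"]
    return chosen
```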
Citations: 44
Energy-aware demand paging on NAND flash-based embedded storages
Chanik Park, Jeong-Uk Kang, Seon-Yeong Park, Jinsoo Kim
The ever-increasing requirement for high-performance and huge-capacity memories in emerging embedded applications has led to the widespread adoption of SDRAM and NAND flash memory as main and secondary memories, respectively. In particular, the use of the energy-consuming memory, SDRAM, has become burdensome in battery-powered embedded systems. Although demand paging can be used to mitigate the increasing requirement on main memory size, its applicability must be carefully evaluated, since NAND flash memory has asymmetric operation characteristics in terms of performance and energy consumption. In this paper, we present an energy-aware demand paging technique to lower the energy consumption of embedded systems, considering the characteristics of interactive embedded applications with large memory footprints. We also propose a flash memory-aware page replacement policy that can reduce the number of write and erase operations in NAND flash memory. With real-life workloads, we show that the system-wide energy-delay product can be reduced by 15-30% compared to the traditional shadowing architecture.
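One way to picture a flash-aware replacement policy is a clean-first variant of LRU, which avoids the asymmetric cost of NAND writes by evicting pages that need no write-back; this is a sketch under that assumption, not necessarily the exact policy proposed in the paper.

```python
from collections import OrderedDict

class FlashAwareLRU:
    """LRU page cache that prefers evicting clean pages to avoid NAND programs/erases."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()          # page -> dirty flag, oldest first

    def access(self, page, write=False):
        dirty = self.pages.pop(page, False) or write
        self.pages[page] = dirty            # move to most-recently-used position
        if len(self.pages) > self.capacity:
            self._evict()

    def _evict(self):
        # Oldest clean page first: dropping it costs nothing on flash.
        victim = next((p for p, dirty in self.pages.items() if not dirty), None)
        if victim is None:
            self.pages.popitem(last=False)  # all dirty: plain LRU, pays a write-back
        else:
            del self.pages[victim]
```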
Citations: 90
Managing standby and active mode leakage power in deep sub-micron design
L. Clark, Rakesh J. Patel, T. Beatty
Scaling has allowed transistor counts per die to rise, but it also increases leakage at an exponential rate, making power a primary constraint in all integrated circuit designs. Future designs must address emerging leakage components due to direct band-to-band tunneling, through MOSFET oxides and at steep junction doping gradients. In this paper, we describe circuit design techniques for managing leakage power, both during standby and for limiting the leakage power contribution during active operation. The efficacy, design effort, and process ramifications of different approaches are examined. The schemes are primarily aimed at hand-held devices such as cell phones, since the need for low power is most acute in these markets due to limited battery capacity.
Citations: 28
Location cache: a low-power L2 cache system
Rui Min, W. Jone, Yimin Hu
While set-associative caches incur fewer misses than direct-mapped caches, they typically have slower hit times and higher power consumption when multiple tag and data banks are probed in parallel. This paper presents the location cache structure, which significantly reduces the power consumption of large set-associative caches. We propose to use a small cache, called the location cache, to store the location of future cache references. If there is a hit in the location cache, the supported cache is accessed as a direct-mapped cache. Otherwise, the supported cache is referenced as a conventional set-associative cache. The worst-case access latency of the location cache system is the same as that of a conventional cache. The location cache is virtually indexed so that operations on it can be performed in parallel with the TLB address translation. These advantages make it ideal for L2 cache systems, where traditional way-prediction strategies perform poorly. We used the CACTI cache model to evaluate the power consumption and access latency of the proposed cache architecture. The SimpleScalar CPU simulator was used to produce the final results. It is shown that the proposed location cache architecture is power-efficient. In the simulated cache configurations, up to 47% of cache accessing energy and 25% of average cache access latency can be reduced.
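The access flow the abstract describes can be summarized in a short sketch; the structures and the two probe callbacks are illustrative assumptions, not the paper's implementation.

```python
class LocationCache:
    """Remembers in which way of the set-associative L2 a block was last found."""
    def __init__(self):
        self.way_of = {}                     # block address -> way index

    def lookup(self, addr):
        return self.way_of.get(addr)         # None means "location unknown"

    def update(self, addr, way):
        self.way_of[addr] = way

def l2_access(addr, loc_cache, probe_one_way, probe_all_ways):
    way = loc_cache.lookup(addr)
    if way is not None:
        # Location hit: probe a single tag/data bank, i.e. direct-mapped energy cost.
        return probe_one_way(addr, way)
    # Location miss: conventional set-associative access probing all ways in parallel.
    hit_way, data = probe_all_ways(addr)
    loc_cache.update(addr, hit_way)
    return data
```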
Citations: 38