
ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005: Latest Articles

Systematic power reduction and performance analysis of mismatch limited ADC designs
P. Scholtens, D. Smola, M. Vertregt
This paper focuses on several methods to reduce power consumption in mismatch-limited ADC designs, such as flash and folding architectures. Migrating existing designs to a next submicron technology helps to reduce the power consumption significantly. It is shown that decreasing bandwidth and sample rate creates a more than linear reduction of the power consumption. Both of these methods are addressed in this paper. The balance between the power consumption of the analog and digital circuitry is also examined. An existing 6-bit 1.6 GS/s ADC in 0.18 µm CMOS is transferred to a 0.12 µm technology. The sampling rate is reduced to 260 MS/s and the measured ERBW to 124 MHz, while running at only 32 mW. As the bandwidth is downscaled 5×, the power consumption is reduced by 10×, which results in an improved conversion efficiency. As the design topology is unaltered, the implemented design sets a reference for evaluation of any low-power technique.
DOI: https://doi.org/10.1145/1077603.1077622 · Published 2005-08-08
Citations: 13
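As a rough worked check of the scaling claim in the ADC abstract above (bandwidth down 5×, power down 10×), the sketch below compares power per unit of bandwidth before and after the port. The original power and bandwidth are not stated in the abstract; they are inferred here from the reported ratios, and "milliwatts per MHz" is a simplified stand-in that may differ from the paper's own conversion-efficiency metric.

```python
# Back-of-the-envelope check. Values marked "inferred" are assumptions
# derived from the abstract's ratios, not numbers reported in the abstract.
new_power_mw = 32.0       # reported
new_erbw_mhz = 124.0      # reported
power_ratio = 10.0        # reported: power reduced by 10x
bandwidth_ratio = 5.0     # reported: bandwidth downscaled 5x

old_power_mw = new_power_mw * power_ratio        # inferred
old_erbw_mhz = new_erbw_mhz * bandwidth_ratio    # inferred


def power_per_mhz(power_mw: float, erbw_mhz: float) -> float:
    """Simplified efficiency metric: milliwatts per MHz of bandwidth."""
    return power_mw / erbw_mhz


old_eff = power_per_mhz(old_power_mw, old_erbw_mhz)
new_eff = power_per_mhz(new_power_mw, new_erbw_mhz)

# Algebraically this is power_ratio / bandwidth_ratio.
improvement = old_eff / new_eff
print(f"old {old_eff:.3f} mW/MHz, new {new_eff:.3f} mW/MHz, "
      f"improvement {improvement:.1f}x")
```

Under this simplified metric, a power reduction that outpaces the bandwidth reduction shows up directly as a ratio greater than one.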
Energy-efficient and high-performance instruction fetch using a block-aware ISA
Ahmad Zmily, C. Kozyrakis
The front-end in superscalar processors must deliver high application performance in an energy-effective manner. Impediments such as multi-cycle instruction accesses, instruction-cache misses, and mispredictions reduce performance by 48% and increase energy consumption by 21%. This paper presents a block-aware instruction set architecture (BLISS) that defines basic block descriptors in addition to the actual instructions in a program. BLISS allows for a decoupled front-end that reduces the time and energy spent on misspeculated instructions. It also allows for accurate instruction prefetching and energy efficient instruction access. A BLISS-based front-end leads to 14% IPC, 16% total energy, and 83% energy-delay-squared product improvements for wide-issue processors.
DOI: https://doi.org/10.1145/1077603.1077614 · Published 2005-08-08
Citations: 8
Peak temperature control and leakage reduction during binding in high level synthesis
R. Mukherjee, S. Memik, G. Memik
Temperature is becoming a first-rate design criterion in ASICs due to its negative impact on leakage power, reliability, performance, and packaging cost. Incorporating awareness of such lower-level physical phenomena in high-level synthesis algorithms helps to achieve better designs. In this work, we developed a temperature-aware binding algorithm. Switching power of a module correlates with its operating temperature. The goal of our binding algorithm is to distribute the activity evenly across functional units. This approach avoids steep temperature differences between modules on a chip and, hence, the occurrence of hot spots. Starting with a switching-optimal binding solution, our algorithm iteratively minimizes the maximum temperature reached by the hottest functional unit. Our algorithm does not change the number of resources used in the original binding. We have used HotSpot, a temperature modeling tool, to simulate the temperature of a number of ASIC designs. Our binding algorithm reduces the temperature reached by the hottest resource by 12.21 °C on average. Reducing the peak temperature has a positive impact on leakage as well. Our binding technique improves leakage power by 11.89% and overall power by 3.32% on average at the 130 nm technology node compared to a switching-optimal binding.
DOI: https://doi.org/10.1145/1077603.1077663 · Published 2005-08-08
Citations: 21
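The binding abstract above states the goal of spreading activity evenly across functional units. The following is a minimal sketch of that idea only, not the authors' algorithm: it uses a plain greedy heuristic (most active operation onto the least loaded unit), treats summed switching activity as a crude proxy for temperature, and ignores scheduling conflicts and the HotSpot-driven iteration described in the abstract. The operation names and activity values are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def balance_binding(op_activity: Dict[str, float],
                    num_units: int) -> Tuple[Dict[str, int], List[float]]:
    """Greedily bind operations to units so per-unit activity stays even.

    Illustrative only: ignores scheduling conflicts and uses summed
    switching activity as a stand-in for temperature.
    """
    heap = [(0.0, unit) for unit in range(num_units)]  # (load, unit id)
    heapq.heapify(heap)
    binding: Dict[str, int] = {}
    # Visit operations from most to least active.
    for op, act in sorted(op_activity.items(), key=lambda kv: -kv[1]):
        load, unit = heapq.heappop(heap)   # least loaded unit so far
        binding[op] = unit
        heapq.heappush(heap, (load + act, unit))
    loads = [0.0] * num_units
    for op, unit in binding.items():
        loads[unit] += op_activity[op]
    return binding, loads


# Hypothetical switching activities for five operations.
ops = {"mul1": 0.9, "mul2": 0.8, "add1": 0.3, "add2": 0.2, "add3": 0.1}
binding, loads = balance_binding(ops, num_units=2)
print(binding, loads)
```

Like the approach in the abstract, this keeps the number of units fixed and only changes which operation runs where; a real flow would evaluate each candidate binding with a thermal model rather than with summed activity.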
Instruction packing: reducing power and delay of the dynamic scheduling logic
J. Sharkey, D. Ponomarev, K. Ghose, O. Ergin
The instruction scheduling logic used in modern superscalar microprocessors often relies on associative searching of the issue queue entries to dynamically wake up instructions for execution. Traditional designs use one issue queue entry for each instruction, regardless of the actual number of operands actively used in the wakeup process. In this paper we propose instruction packing - a novel microarchitectural technique that reduces both the delay and the power consumption of the issue queue by sharing the associative part of an issue queue entry between two instructions, each with at most one non-ready register source operand at the time of dispatch. Our results show that instruction packing provides a 39% reduction of the whole issue queue power and a 21.6% reduction in the wakeup delay with as little as 0.4% IPC degradation on average across the simulated SPEC benchmarks.
DOI: https://doi.org/10.1145/1077603.1077613 · Published 2005-08-08
Citations: 28
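The packing rule in the instruction-packing abstract above (two instructions may share an entry when each has at most one non-ready source operand at dispatch) can be illustrated with a small counting sketch. This is a simplified model under stated assumptions: any two packable instructions are allowed to pair up, and the dispatch stream is hypothetical. The real hardware constraints are not described in the abstract.

```python
from typing import List


def entries_needed(nonready_counts: List[int]) -> int:
    """Count issue-queue entries when packable instructions share entries.

    An instruction is treated as packable if it has at most one non-ready
    source operand at dispatch; two packable instructions share one entry.
    Letting any two packable instructions pair up is a simplification.
    """
    packable = sum(1 for n in nonready_counts if n <= 1)
    full = len(nonready_counts) - packable
    # An unpaired packable instruction still occupies one entry.
    return full + (packable + 1) // 2


# Hypothetical number of non-ready source operands per dispatched instruction.
dispatch = [0, 1, 2, 1, 1, 2, 0]
print(entries_needed(dispatch), "entries instead of", len(dispatch))
```

Fewer occupied entries for the same instruction window is what allows a smaller associative structure, which is where the abstract's power and delay savings come from.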
Dataflow analysis for energy-efficient scratch-pad memory management
Guangyu Chen, M. Kandemir
Scratch-pad memories (SPMs) are a serious alternative to conventional cache memories in embedded computing since they allow software to manage data flowing from and into memory components, resulting in predictable behavior at runtime. Prior studies considered compiler-directed SPM management using both static and dynamic approaches. Most of the proposed approaches to data SPM management assume that the application code is structured as regular loop nests with little or no control flow within the loops. This assumption makes data SPM management relatively easy to implement, but it limits the applicability of those approaches to codes that involve conditional execution and complex control flows. To address this problem, this paper proposes a novel data SPM management strategy based on dataflow analysis. This analysis operates on a representation that reflects the conditional execution flow of the application and, consequently, it is applicable to a large class of embedded applications, including those with complex control flows.
DOI: https://doi.org/10.1145/1077603.1077682 · Published 2005-08-08
Citations: 4
Improving energy efficiency by making DRAM less randomly accessed
Hai Huang, K. Shin, C. Lefurgy, T. Keller
Existing techniques manage power for the main memory by passively monitoring the memory traffic and, based on it, predicting when to power down and into which low-power state to transition. However, passively monitoring the memory traffic can be far from effective, as idle periods between consecutive memory accesses are often too short for existing power-management techniques to take full advantage of the deeper power-saving states implemented in modern DRAM architectures. In this paper, the authors propose a new technique that actively reshapes the memory traffic to coalesce short idle periods - which were previously unusable for power management - into longer ones, thus enabling existing techniques to effectively exploit idleness in the memory.
DOI: https://doi.org/10.1145/1077603.1077696 · Published 2005-08-08
Citations: 105
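To see why coalescing idle periods matters in the DRAM abstract above, the toy model below compares many short idle gaps against one long gap of the same total length. The break-even threshold, the transition overhead, and the gap lengths are all hypothetical values in arbitrary time units; the model is an assumption for illustration and is not taken from the paper.

```python
from typing import List


def low_power_time(idle_gaps: List[int], break_even: int, overhead: int) -> int:
    """Time spent in a low-power state under a simple threshold policy.

    A gap is exploited only if it is at least `break_even` long; each
    exploited gap loses `overhead` time units to entering and leaving the
    low-power state. Both parameters are hypothetical.
    """
    return sum(gap - overhead for gap in idle_gaps if gap >= break_even)


scattered = [10] * 10            # ten short idle gaps
coalesced = [sum(scattered)]     # the same idle time as one long gap

print(low_power_time(scattered, break_even=30, overhead=20))
print(low_power_time(coalesced, break_even=30, overhead=20))
```

The total idle time is identical in both cases; only its distribution changes, which is the quantity the proposed technique reshapes.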
A simple mechanism to adapt leakage-control policies to temperature
S. Kaxiras, Polychronis Xekalakis, G. Keramidas
Leakage power reduction in cache memories continues to be a critical area of research because of the promise of a significant pay-off. Various techniques have been developed so far that can be broadly categorized into state-preserving (e.g., drowsy caches) and non-state-preserving (e.g., cache decay). Decay saves more leakage but also incurs dynamic power overhead in the form of induced misses. Previous work has shown that, depending on the leakage vs. dynamic power trade-off, one or the other technique can be better. Several factors, such as cache architecture, technology parameters, and temperature, affect this trade-off. Our work proposes the first mechanism - to the best of our knowledge - that takes temperature into account in adjusting the leakage-control policy at run time. At very low temperatures, leakage is relatively weak, so the need to tightly control it is not as important as the need to minimize extra dynamic power (e.g., decay-induced misses) or performance loss. We use a hybrid decay+drowsy policy where the main benefit comes from decaying cache lines while the drowsy mode is used to save leakage in long decay intervals. To adapt the decay mode to temperature, we propose a simple triggering mechanism that is based on the principles of decaying 4T thermal sensors and, as such, is tied to temperature. The hotter the cache is, the faster cache lines are decayed, since it is beneficial to do so with very high leakage currents. Conversely, when the cache temperature is low, our mechanism defers putting cache lines in decay mode to avoid dynamic power overhead but still saves a significant amount of leakage using the drowsy mode. Our study shows that across a wide range of temperatures, the simple adaptability of our proposal yields consistently better results than either the decay mode or the drowsy mode alone, improving over the best by as much as 33%.
DOI: https://doi.org/10.1145/1077603.1077617 · Published 2005-08-08
Citations: 16
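The hybrid decay+drowsy policy in the leakage-control abstract above can be pictured with a small software model. This is a sketch under loud assumptions: the paper ties the decay trigger to a 4T thermal-sensor circuit, whereas the exponential formula, the reference temperature, the halving step, and the cycle counts below are all hypothetical values chosen only to show the qualitative behavior (hotter cache, shorter decay interval).

```python
def decay_interval(temp_c: float,
                   base_interval: int = 8192,
                   ref_temp_c: float = 85.0,
                   halving_step_c: float = 10.0) -> float:
    """Idle cycles before a line is decayed (hypothetical model).

    The interval halves for every `halving_step_c` degrees above the
    reference temperature and doubles for every step below it.
    """
    return base_interval * 2.0 ** ((ref_temp_c - temp_c) / halving_step_c)


def line_state(idle_cycles: int, temp_c: float, drowsy_after: int = 512) -> str:
    """Classify a cache line under a hybrid decay+drowsy policy."""
    if idle_cycles >= decay_interval(temp_c):
        return "decayed"   # state lost, leakage cut the most
    if idle_cycles >= drowsy_after:
        return "drowsy"    # state kept, leakage reduced while waiting to decay
    return "active"


for temp in (65.0, 85.0):
    print(temp, decay_interval(temp), line_state(10_000, temp))
```

At the lower temperature the same idle line stays drowsy instead of decaying, which mirrors the abstract's point about deferring decay when leakage is weak.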
Runtime identification of microprocessor energy saving opportunities
W. Bircher, M. Valluri, J. Law, L. John
High power consumption and low energy efficiency have become significant impediments to future performance improvements in modern microprocessors. This paper contributes to the solution of these problems by presenting linear regression models for power consumption and a detailed study of energy efficiency in a modern out-of-order superscalar microprocessor. These simple (2-input) yet accurate (2.6% error) models provide a valuable tool for identifying opportunities to apply power-saving techniques such as clock throttling and dynamic voltage scaling (DVS). Also, future work in improving energy efficiency is motivated by a detailed analysis of SPEC CPU 2000 workloads. The vast majority of workloads are found to yield very low energy efficiency due to the frequency of level two (L2) cache misses and misspeculated instructions.
DOI: https://doi.org/10.1145/1077603.1077668 · Published 2005-08-08
Citations: 112
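The runtime power-modeling abstract above describes 2-input linear regression models. The sketch below fits a generic model of the form power = b0 + b1·x1 + b2·x2 by ordinary least squares in plain Python. The abstract does not say which two inputs the authors use, so x1 and x2 are hypothetical event rates, and the samples are synthetic values generated from known coefficients rather than measurements.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_power_model(x1: Sequence[float], x2: Sequence[float],
                    power: Sequence[float]) -> List[float]:
    """Least-squares fit of power = b0 + b1*x1 + b2*x2 via normal equations."""
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atp = [sum(r[i] * p for r, p in zip(rows, power)) for i in range(3)]
    return solve(ata, atp)


# Synthetic samples: two hypothetical event rates and the resulting power.
x1 = [0.5, 1.0, 1.5, 2.0, 0.8]
x2 = [0.1, 0.4, 0.2, 0.9, 0.6]
power = [20.0 + 10.0 * a + 5.0 * b for a, b in zip(x1, x2)]

coeffs = fit_power_model(x1, x2, power)
print([round(c, 6) for c in coeffs])
```

Once fitted, a model of this shape is cheap enough to evaluate at run time from two counter readings, which is what makes it usable for triggering techniques such as clock throttling or DVS.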
Replacing global wires with an on-chip network: a power analysis
Seongmoo Heo, K. Asanović
This paper explores the power implications of replacing global chip wires with an on-chip network. The authors optimized the network links by varying repeater spacing, link pipelining, and voltage scaling to significantly reduce the energy needed to send a bit across the chip. They developed an analytic model of large chip designs with an on-chip two-dimensional mesh network and estimated the power savings possible in a 70 nm process for two different design points: a circuit-switched ASIC or FPGA design, and a dynamic packet-switched tiled architecture. For circuit-switched networks, achievable power savings are 35-50% for a mesh with 1 mm links. The packet-switched designs use multiplexing and signal encoding to reduce the number of link wires required, but the router overhead limits peak wire power savings to around 20% with optimal tile sizes of around 2 mm.
DOI: https://doi.org/10.1145/1077603.1077692 · Published 2005-08-08
Citations: 68
Modeling and analysis of total leakage currents in nanoscale double gate devices and circuits
S. Mukhopadhyay, Keunwoo Kim, C. Chuang, K. Roy
In this paper we model (numerically and analytically) and analyze sub-threshold, gate-to-channel tunneling, and edge direct tunneling leakage in double gate (DG) devices. We compare the leakage of different DG structures, namely, a doped body symmetric device with polysilicon gates, an intrinsic body symmetric device with metal gates, and an intrinsic body asymmetric device with different front and back gate materials. It is observed that the use of (near-mid-gap) metal gates and intrinsic body devices significantly reduces both the total leakage and its sensitivity to parametric variations in DG circuits.
DOI: https://doi.org/10.1145/1077603.1077608 · Published 2005-08-08
Citations: 9