
2007 25th International Conference on Computer Design: Latest Papers

Voltage drop reduction for on-chip power delivery considering leakage current variations
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601883
Jeffrey Fan, N. Mi, S. Tan
In this paper, we propose a novel voltage drop reduction technique for on-chip power delivery networks of VLSI systems in the presence of variational leakage current sources. The new method inserts decoupling capacitors (decaps) into the power grid networks to reduce voltage fluctuation. The optimization is based on a sensitivity-based conjugate gradient method and a sequence-of-linear-programming approach. Unlike existing power grid noise reduction methods, the new approach is the first to consider, during decap optimization, the impact of inter-die and intra-die variations in leakage current sources caused by unavoidable process variability. Leakage currents, although typically static in nature, still add to the total voltage drop, so dynamic voltage drop reduction must account for leakage-induced voltage variations. The proposed algorithm exploits the relatively constant variations across different decap configurations of power grid circuits to speed up the statistical optimization process. Decaps can be inserted in such a way that the resulting circuits have a much higher probability of meeting the voltage drop constraints in the presence of leakage current variations. Experimental results demonstrate the effectiveness of the proposed approach and show that the new method achieves a 100X to 1,000X speedup over the Monte Carlo based statistical decap optimization method.
Pages: 78-83
Citations: 6
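The abstract above measures its speedup against a Monte Carlo baseline. As a rough illustration of what such a baseline estimates, the sketch below samples leakage currents and counts how often a single node meets its voltage drop limit. Everything here is an assumption for illustration: the one-node model `drop = r_eff * (i_dynamic + i_leak)`, the log-normal leakage distribution, and all names and parameters are hypothetical and are not taken from the paper.

```python
import random


def monte_carlo_yield(r_eff, i_dynamic, leak_mu, leak_sigma, v_limit,
                      samples=10_000, seed=0):
    """Estimate the fraction of samples whose voltage drop meets v_limit.

    Toy single-node model (an assumption, not the paper's grid model):
    drop = r_eff * (i_dynamic + i_leak), with i_leak drawn from a
    log-normal distribution to mimic process-induced leakage variation.
    """
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    passed = 0
    for _ in range(samples):
        i_leak = rng.lognormvariate(leak_mu, leak_sigma)
        drop = r_eff * (i_dynamic + i_leak)
        if drop <= v_limit:
            passed += 1
    return passed / samples
```

Each sample in a real flow would require a full power grid simulation, which is why sampling-based optimization is slow and why a faster statistical method is attractive.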
VIZOR: Virtually zero margin adaptive RF for ultra low power wireless communication
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601956
R. Senguttuvan, Shreyas Sen, A. Chatterjee
Modern wireless transceiver systems are often overdesigned to meet the requirements of low bit error rate values at high data rates under worst-case channel operating conditions (interference, noise, multi-path effects). This results in circuits being designed with "sufficient" margins, leading to lower efficiency and high power consumption. In this paper, we develop an adaptive power management strategy for RF systems that optimally trades off power against performance for the RF front end to maintain operation at or below a specified maximum bit error rate (BER) across temporally changing operating conditions. As the communication channel degrades, more power is consumed by the RF front end, and vice versa. Since the maximum bit error rate specification is not violated, minimum voice or video quality through the wireless channel is always guaranteed.
Pages: 580-586
Citations: 25
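The adaptation idea in the VIZOR abstract above (spend only as much front-end power as the current channel needs to stay within the BER limit) can be sketched as a simple selection rule. This is a minimal sketch under stated assumptions: the function name, the discrete power levels, and the BER-per-level table are hypothetical, and the paper's actual control mechanism is not described in the abstract.

```python
def lowest_sufficient_power(ber_by_power, ber_max):
    """Pick the lowest power setting whose BER is at or below ber_max.

    ber_by_power maps a power setting (any comparable number) to the BER
    observed at that setting under the current channel (hypothetical data).
    If no setting meets the limit, fall back to the highest power setting.
    """
    feasible = [power for power, ber in ber_by_power.items() if ber <= ber_max]
    return min(feasible) if feasible else max(ber_by_power)
```

With a good channel the rule settles on a low setting; as the channel degrades, the same rule moves to a higher setting, matching the behaviour the abstract describes.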
CAP: Criticality analysis for power-efficient speculative multithreading
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601932
James Tuck, Wei Liu, J. Torrellas
While speculative multithreading (SM) on a chip multiprocessor (CMP) has the ability to speed up hard-to-parallelize applications, the power inefficiency of aggressive speculation is a concern. To improve SM's power efficiency, we note that not all the tasks that are running in an SM environment are equally critical. To leverage this insight, this paper develops a novel, widely applicable task-criticality model for SM. It also proposes CAP, a novel architecture that builds a task-criticality graph dynamically and uses it to make scheduling decisions in an SM CMP. Experiments with SPECint, SPECfp, and Olden applications show that, in a CMP with one fast core and three slow ones, the E×D² with CAP is, on average, 91-95% of that without. Moreover, it is only 77-91% of the E×D² of a CMP with four fast cores and no CAP. Overall, we argue that scheduling for task criticality is beneficial.
Pages: 409-416
Citations: 8
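The CAP abstract above reports its results as an energy-delay-squared product. For readers unfamiliar with the metric, the sketch below computes it and the ratio between two configurations. The function names and the sample numbers are illustrative assumptions, not measurements from the paper.

```python
def energy_delay_squared(energy_joules, delay_seconds):
    """Energy-delay-squared product: energy multiplied by delay squared."""
    return energy_joules * delay_seconds ** 2


def relative_ed2(candidate, baseline):
    """Ratio of the candidate's E*D^2 to the baseline's (below 1.0 is better).

    Each argument is an (energy, delay) pair in the same units.
    """
    return energy_delay_squared(*candidate) / energy_delay_squared(*baseline)


# Hypothetical example: 20% less energy at a 5% longer run time.
ratio = relative_ed2((0.8, 1.05), (1.0, 1.0))
```

Because delay is squared, the metric penalizes slowdowns more heavily than it rewards energy savings, so a ratio below 1.0 indicates that the energy saved outweighs the performance lost.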
Power-aware mapping for reconfigurable NoC architectures
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601933
M. Modarressi, H. Sarbazi-Azad
A core mapping method for reconfigurable network-on-chip (NoC) architectures is presented in this paper. In most existing methods, mapping is carried out based on the traffic characteristics of a single application. However, modern complex systems-on-chip implement and integrate several different applications, which mapping methods should take into account. In the proposed method, reconfiguration (achieved by embedding programmable switches between the routers of a mesh-based NoC) allows us to dynamically change the network topology in order to adapt it to the running application and optimize the power and performance metrics. The presented network architecture can be configured as an application-specific topology while still retaining the benefits of regular NoC topologies, such as modularity and predictable electrical properties. The experimental results show that this method can effectively adapt the NoC to the running application and improve the power consumption and performance of the system.
Pages: 417-422
Citations: 32
Hybrid resistor/FET-logic demultiplexer architecture design for hybrid CMOS/nanodevice circuits
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601955
Shu Li, Tong Zhang
Hybrid nanoelectronics are emerging as one viable option to sustain Moore's Law after the CMOS scaling limit is reached. One main design challenge in hybrid nanoelectronics is the interface (called the demux) between the highly dense nanowires in nanodevice crossbars and the relatively coarse microwires in the CMOS domain. Prior work on demux design uses a single type of device to realize the demultiplexing function, but hardly provides a satisfactory solution. This work proposes to combine resistors with FETs to implement the demux, leading to the so-called hybrid resistor/FET-logic demux. Such a hybrid demux architecture lets the two types of devices complement each other to improve the overall demux design effectiveness. Furthermore, the effects of resistor conductance variability are analyzed and evaluated based on computer simulations.
Pages: 574-579
Citations: 2
Improving cache efficiency via resizing + remapping
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601879
Subramanian Ramaswamy, S. Yalamanchili
In this paper we propose techniques to dynamically downsize or upsize a cache, accompanied by cache set/line shutdown, to produce efficient caches. Unlike previous approaches, resizing is accompanied by a non-uniform remapping of memory into the resized cache, thus avoiding misses to sets/lines that are shut off. The paper first analyzes the causes of energy inefficiency, revealing a simple model for improving efficiency. Based on this model we propose the concept of "folding": memory regions mapping to disjoint cache resources are combined to share cache sets, producing a new placement function. Folding enables powering down cache sets at the expense of possibly increasing conflict misses. Effective folding heuristics can substantially increase energy efficiency at the expense of an acceptable increase in execution time. We target the L2 cache because of its larger size and greater energy consumption. Our techniques increase cache energy efficiency by 20% and reduce the EDP (energy-delay product) by up to 45%, with an IPC degradation of less than 4%. The results also indicate an opportunity for improving cache efficiency further via cooperative compiler interactions.
Pages: 47-54
Citations: 18
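The "folding" idea in the cache abstract above amounts to changing the placement function so that addresses which used to index a powered-down set are redirected to a set that stays on. The sketch below shows one way to express that. It is only an illustration: the modulo set index, the `fold_map` dictionary, and the default cache geometry are assumptions, and the paper's folding heuristics are not described in the abstract.

```python
def set_index(addr, line_bytes=64, num_sets=1024):
    """Conventional placement: block address modulo the number of sets."""
    return (addr // line_bytes) % num_sets


def folded_index(addr, fold_map, line_bytes=64, num_sets=1024):
    """Placement with folding.

    fold_map (hypothetical) maps a powered-down set index to the active set
    that now also holds its lines. Sets not in the map are unchanged.
    """
    idx = set_index(addr, line_bytes, num_sets)
    return fold_map.get(idx, idx)
```

For example, with `fold_map = {3: 1}`, set 3 can be shut off because its addresses now land in set 1. The two regions compete for the same ways, which is the possible increase in conflict misses the abstract mentions.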
Distributed voting for fault-tolerant nanoscale systems
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601954
A. Namazi, M. Nourani
In this paper, we propose a distributed voting strategy to design a robust N-modular redundancy (NMR) system. We show that, using inexpensive current-based drivers and buffers, we can completely eliminate the centralized voter unit and perform the majority voting among N modules in a distributed fashion. Our strategy achieves the high reliability that is vital for future nano systems, in which a high defect rate is expected. Experimental results are also reported to verify the concept, clarify the design procedure, and measure the system's reliability.
Pages: 568-573
Citations: 7
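For reference, the function that any majority voter computes over N redundant module outputs can be sketched in a few lines. This is a plain software model of the voting function only. It does not model the distributed, current-based circuit the abstract above proposes, and the function name is a hypothetical choice.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of the modules.

    Raises ValueError when no value is held by more than half of the
    modules, since the result is then ambiguous.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no strict majority among module outputs")
    return value
```

With N = 3, one faulty module is outvoted by the other two. With N = 5, up to two faulty modules can be tolerated.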
Reducing leakage power in peripheral circuits of L2 caches
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601907
H. Homayoun, A. Veidenbaum
Leakage power has grown significantly and is a major challenge in microprocessor design. Leakage is the dominant power component in second-level (L2) caches. This paper presents two architectural techniques to utilize leakage reduction circuits in L2 caches. They primarily target the leakage in the peripheral circuitry of an L2 cache and as such have to be able to cope with longer delays. One technique exploits the fact that processor activity decreases significantly after an L2 cache miss occurs and saves power during L2 miss service time. Two algorithms, a static one and an adaptive one, are proposed for deciding when to apply this leakage reduction technique. Another technique attempts to keep the peripheral circuits in a lower-power state most of the time. The results for SPEC2K benchmarks show that the first technique can achieve an 18 to 22% reduction in L2 power consumption on average (and up to 63%), depending on the decision algorithm. The second technique can save 25% on average (and up to 80%). This comes with a negligible 1 to 2% performance impact on average, depending on the technique used.
Pages: 230-237
Citations: 11
Energy-aware co-processor selection for embedded processors on FPGAs
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601895
A. H. Gholamipour, E. Bozorgzadeh, Sudarshan Banerjee
In this paper, we present the co-processor selection problem for minimum energy consumption in HW/SW co-design on FPGAs with dual power mode. We provide a theoretical analysis of the problem under no constraint, a resource constraint, and a timing constraint. We prove that the problem is NP-hard in each case and provide a generalized ILP formulation. We compared the energy achieved by our approach with that of other approaches that do not consider both static and dynamic power during optimization, and we show that we can reduce energy by 63% in some cases.
Pages: 158-163
Citations: 5
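The resource-constrained variant of the selection problem in the abstract above can be illustrated with a tiny exhaustive search. This is a toy sketch, not the paper's ILP formulation: the cost model (each candidate has an area and a net energy change) and all names and numbers are hypothetical, and exhaustive search is only practical for a handful of candidates, which is consistent with the problem being NP-hard.

```python
from itertools import combinations


def select_coprocessors(candidates, area_budget):
    """Choose the subset of co-processors that lowers energy the most.

    candidates maps a name to (area, energy_delta), where a negative
    energy_delta means the co-processor saves energy overall.
    Returns (chosen_names, total_energy_delta); the empty selection with
    a delta of 0.0 is returned if nothing feasible saves energy.
    """
    best_names, best_delta = (), 0.0
    names = sorted(candidates)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            area = sum(candidates[n][0] for n in subset)
            delta = sum(candidates[n][1] for n in subset)
            if area <= area_budget and delta < best_delta:
                best_names, best_delta = subset, delta
    return best_names, best_delta


# Hypothetical candidates: name -> (area units, net energy change).
example = {"fft": (40, -5.0), "fir": (30, -3.0), "aes": (50, -6.0)}
choice = select_coprocessors(example, area_budget=80)
```

In this example the two largest individual savers ("aes" and "fft") do not fit together within the budget, so the search settles on "aes" plus "fir".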
Two-level data prefetching
Pub Date : 2007-10-01 DOI: 10.1109/ICCD.2007.4601908
Fei Gao, Hanyu Cui, S. Sair
Data prefetching has been shown to be an effective tool in hiding part of the latency associated with cache misses in modern processors. Traditionally, data prefetchers fetch data into a small prefetch buffer near the L1 for low latency, or into the L2 cache for greater coverage and less cache pollution. However, with the L1-L2 cache speed gap growing, significant performance gains can be obtained if the data prefetcher can operate as aggressively as an L2-level prefetcher but with the fast hit times of an L1-level prefetcher. In this paper, we propose a prefetching framework where an L1-level prefetcher and an L2-level prefetcher work cooperatively to reduce the average access time more than either one alone can. We evaluate several design alternatives suited to perform synergistically under different workloads. From the insight we gather from this analysis, we propose a confidence-based adaptive prefetcher that can improve prefetch efficiency significantly with judicious use of available bus bandwidth. Our results show that for certain prefetcher combinations, two-level prefetching can achieve the cumulative speedup attained from either prefetcher alone. Furthermore, when compared to other two-level prefetching models, the adaptive design provides similar speedups with appreciably less bus traffic.
Pages: 238-244
Citations: 1