
ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design: Latest Papers

Dynamic Thermal Clock Skew Compensation using Tunable Delay Buffers
A. Chakraborty, K. Duraisami, A. Sathanur, P. Sithambaram, L. Benini, A. Macii, E. Macii, M. Poncino
The thermal gradients existing in high-performance circuits may significantly affect their timing behavior, in particular by increasing the skew of the clock net and/or altering hold/setup constraints, possibly causing the circuit to operate incorrectly. Knowledge of the spatial distribution of temperature can be used to design a clock network that compensates for such thermal non-uniformities. However, re-design of the clock network is effective only if the temperature distribution is stationary, i.e., does not change over time. In this work, we specifically address the problem of dynamically modifying the clock tree so that it can compensate for temporal variations of temperature. This is achieved by transforming the buffers that are inserted during clock network generation into tunable delay elements. Temperature-induced delay variations are then compensated by applying the proper tuning to the tunable buffers; the tuning is computed off-line and stored in a tuning table inserted in the design. We propose an algorithm to minimize the number of inserted tunable buffers, as well as their tunable range (which directly relates to complexity). Results show that clock skew is kept within original bounds with minimum area and power penalty. The maximum increase in power is 23.2%, with most benchmarks exhibiting less than a 5% increase in power.
DOI: https://doi.org/10.1145/1165573.1165612 (published 2006-10-04)
Citations: 78
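The tuning-table idea in the abstract above (Chakraborty et al.) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the linear temperature-to-delay model, the coefficient, the profile names, and one tunable buffer per sink. The paper's actual algorithm, which minimizes the number of tunable buffers and their range, is not reproduced.

```python
# Hypothetical sketch: the names, the linear delay model, and all numbers are
# illustrative assumptions, not taken from the paper.

def arrival_times(nominal_ps, temps_c, ref_temp_c=25.0, coeff_per_c=0.002):
    """Clock arrival time per sink, assuming delay grows linearly with temperature."""
    return [d * (1.0 + coeff_per_c * (t - ref_temp_c))
            for d, t in zip(nominal_ps, temps_c)]

def skew(arrivals):
    """Clock skew: spread between the latest and earliest arrival."""
    return max(arrivals) - min(arrivals)

def build_tuning_table(nominal_ps, thermal_profiles):
    """Off-line step: for each named thermal profile, store the extra delay each
    tunable buffer must add so that every sink matches the slowest one."""
    table = {}
    for name, temps in thermal_profiles.items():
        arr = arrival_times(nominal_ps, temps)
        latest = max(arr)
        table[name] = [latest - a for a in arr]
    return table

def compensated_skew(nominal_ps, temps_c, tuning):
    """Skew after adding the stored tuning delays to each sink's arrival time."""
    arr = arrival_times(nominal_ps, temps_c)
    return skew([a + t for a, t in zip(arr, tuning)])

nominal = [500.0, 500.0, 500.0]            # balanced tree at the reference temperature
profiles = {"uniform": [25.0, 25.0, 25.0],
            "hot_corner": [25.0, 60.0, 95.0]}
table = build_tuning_table(nominal, profiles)    # computed once, off-line

temps_now = profiles["hot_corner"]         # at run time, a profile is selected
print("uncompensated skew (ps):", skew(arrival_times(nominal, temps_now)))
print("compensated skew (ps):", compensated_skew(nominal, temps_now, table["hot_corner"]))
```

The run-time work in this sketch is only a table lookup, which mirrors the abstract's point that the tuning values are computed off-line.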
A Low-Power Active Substrate-Noise Decoupling Circuit with Feedforward Compensation for Mixed-Signal SoCs
Song Guo, Hoi Lee
This paper presents a low-power active-decoupling circuit using a feedforward-compensation technique for SoC substrate-noise reduction. The proposed feedforward technique not only generates a left-half-plane zero for the decoupling circuit, achieving stability and wider bandwidth under low-power conditions, but also increases the dynamic current during transients. As a result, substrate-noise suppression is significantly improved for larger-amplitude and higher-frequency noise sources. In a standard 0.13 µm CMOS process, simulation results show that the proposed feedforward technique enhances the bandwidth and dynamic current of the decoupling circuit by 2 and 79 times, respectively, without additional static power consumption. The decoupling circuit thus improves the crosstalk noise suppression from 3.6 to 6.6 times with a 1 GHz noise-source amplitude increasing from 100 mV to 500 mV.
DOI: https://doi.org/10.1145/1165573.1165649 (published 2006-10-04)
Citations: 2
Analysis of Super Cut-off Transistors for Ultralow Power Digital Logic Circuits
A. Raychowdhury, Xuanyao Fong, Qikai Chen, K. Roy
Super cut-off devices with sub-60 mV/decade subthreshold swings have recently been demonstrated and are being extensively studied. This paper presents a feasibility analysis of such tunneling devices for ultralow power subthreshold logic. Analysis shows that this device can deliver 800 times higher performance (at iso-IOFF) compared to a MOSFET. The possible use of this device as a sleep transistor in conjunction with the regular Si MOSFET shows a 2000 times average improvement in leakage power compared to Si MOSFETs.
DOI: https://doi.org/10.1145/1165573.1165577 (published 2006-10-04)
Citations: 13
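For context on the "sub-60 mV/decade" figure in the abstract above: the subthreshold swing of a conventional MOSFET is bounded below by ln(10)·kT/q, which is just under 60 mV/decade at room temperature. The short sketch below evaluates that bound. The function names and the 30 mV/decade comparison value are illustrative assumptions, not data from the paper.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23        # SI-defined value
ELEMENTARY_CHARGE_C = 1.602176634e-19   # SI-defined value

def thermionic_swing_mv_per_decade(temp_k):
    """Ideal MOSFET subthreshold swing limit, ln(10) * kT / q, in mV/decade."""
    return math.log(10) * BOLTZMANN_J_PER_K * temp_k / ELEMENTARY_CHARGE_C * 1e3

def gate_swing_mv(swing_mv_per_decade, decades):
    """Gate-voltage change needed to move the current by `decades` orders of magnitude."""
    return swing_mv_per_decade * decades

limit = thermionic_swing_mv_per_decade(300.0)
print(f"MOSFET limit at 300 K: {limit:.1f} mV/decade")
print(f"4 decades at the limit: {gate_swing_mv(limit, 4):.0f} mV")
# 30 mV/decade is a hypothetical steep-slope value chosen only for comparison.
print(f"4 decades at 30 mV/decade: {gate_swing_mv(30.0, 4):.0f} mV")
```

A device with a steeper slope needs a smaller gate-voltage swing for the same on/off current ratio, which is why such devices are attractive for low-voltage logic.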
A Novel Approach for Variation Aware Power Minimization during Gate Sizing
V. Mahalingam, N. Ranganathan, J. Harlow
The increasing dominance of process variations in nanometer designs poses significant challenges for circuit design and optimization. Variations in parameters such as channel length and gate oxide thickness impact circuit delay and power. In this paper, we propose a new gate sizing algorithm using fuzzy mathematical programming (FMP) in which the uncertainty due to process variations is modeled using fuzzy numbers. The variations in gate delay, which is a function of gate sizes and the fan-outs of the gate, are represented using triangular fuzzy numbers with linear membership functions. The variation-aware gate sizing problem is formulated as a fuzzy mathematical program to perform a delay-constrained power minimization in the presence of variations. Initially, a deterministic optimization is performed by fixing the fuzzy parameters to the worst-case and average-case values; the results are used to convert the fuzzy optimization problem into a crisp non-linear problem, which is then solved using a non-linear optimization solver. The above model, with delay and power as constraints, maximizes the robustness, i.e., the variation resistance of the circuit, and thus the yield. The proposed approach was tested on ISCAS'85 benchmarks and the results were validated for timing yield using Monte-Carlo simulations. The fuzzy approach yields significantly better results than a stochastic-programming-based gate sizing approach, with a comparable runtime.
DOI: https://doi.org/10.1145/1165573.1165614 (published 2006-10-04)
Citations: 11
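The "triangular fuzzy numbers with linear membership functions" mentioned in the abstract above have a standard textbook definition, sketched below. The gate-delay values (90, 100, 120 ps) are hypothetical and the function name is chosen here for illustration. The sketch covers only the membership function, not the paper's optimization formulation.

```python
def triangular_membership(x, low, peak, high):
    """Membership grade of x in the triangular fuzzy number (low, peak, high).

    The grade rises linearly from 0 at `low` to 1 at `peak`,
    then falls linearly back to 0 at `high`.
    """
    if not (low < peak < high):
        raise ValueError("expected low < peak < high")
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)

# Hypothetical gate delay: most plausible value 100 ps, support from 90 to 120 ps.
for delay_ps in (90, 95, 100, 110, 120):
    print(delay_ps, triangular_membership(delay_ps, 90, 100, 120))
```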
Power Phase Variation in a Commercial Server Workload
W. Bircher, L. John
Many techniques have been developed for adaptive power management of computing systems. These techniques rely on the presence of varying power phases to detect opportunities for adaptation. However, little information is available regarding the extent of power phases in real systems. This paper illustrates available power phases ranging from 1 millisecond to 1 second using a commercial workload running on enterprise-class hardware. Data is obtained using a server instrumented for power measurement at the subsystem level. The analysis shows that chipset, memory and disk subsystems have the most homogeneous phase behavior, with greater than 71% of samples within phases of 100 milliseconds or shorter. In contrast, CPU and I/O subsystems have much more variation, with only 26% of samples within phases of 10 milliseconds or shorter.
DOI: https://doi.org/10.1145/1165573.1165656 (published 2006-10-04)
Citations: 6
Reducing Idle Mode Power in Software Defined Radio Terminals
Hyunseok Lee, T. Mudge, C. Chakrabarti
In this paper, we propose a processor which is optimized for idle mode operation of a software defined radio (SDR) terminal. Since an SDR terminal spends most of its time in the idle mode, reducing the power consumption in this mode directly translates to longer terminal standby time. Workload analysis of idle mode operations of contemporary standards showed that these are dominated by FIR filtering, which can be easily parallelized. This analysis was used in the design of the idle mode processor. The key architectural components are an SIMD unit for the parallel computations that dominate the workload, a conventional scalar unit for the sequential computations, and a control unit which supports efficient data memory access and loop control. The idle mode processor was modeled in Verilog and synthesized using standard cells in 0.13 micron technology. It consumes about 9 mW at 1.08 V.
DOI: https://doi.org/10.1145/1165573.1165597 (published 2006-10-04)
Citations: 4
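The abstract above notes that FIR filtering "can be easily parallelized". A plain direct-form FIR filter shows why: each output sample is an independent dot product of the taps with a window of inputs, so the multiply-accumulate operations map naturally onto SIMD lanes. The sketch below is a generic scalar reference, not the paper's processor or its instruction set.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * samples[n - k].

    Inputs before the start of `samples` are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        # The products in this inner loop are independent of one another,
        # and so are the outputs for different n; both are SIMD-friendly.
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out

# Two-tap moving average as a small usage example.
print(fir_filter([1, 2, 3, 4], [0.5, 0.5]))
```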
Energy-efficient Motion Estimation using Error-Tolerance
G. Varatkar, Naresh R Shanbhag
Presented is an energy-efficient motion estimation architecture using error-tolerance. The technique employs overscaling of the supply voltage (voltage overscaling (VOS)) to reduce power at the expense of timing errors, which are then corrected using algorithmic noise-tolerance (ANT) techniques. Referred to as input subsampled replica ANT (ISR-ANT), the proposed technique incorporates an input subsampled replica of the main sum of absolute difference (MSAD) block for obtaining the motion vectors in the presence of errors induced by VOS. Simulations show that the proposed technique can save up to 60% power over an optimal error-free present-day system in a 130 nm CMOS technology. Power savings increase to 79% in a 45 nm predictive process technology.
DOI: https://doi.org/10.1145/1165573.1165599 (published 2006-10-04)
Citations: 75
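To make the terms in the abstract above concrete, the sketch below computes a sum of absolute differences (SAD) over a block, a cheaper estimate from a subsampled copy of the same inputs, and a simple selection rule that falls back to the estimate when the main result looks erroneous. The selection rule, the threshold, the scaling by the subsampling factor, and the block values are all assumptions made for illustration. The abstract does not give the paper's exact decision logic.

```python
def sad(block_a, block_b, step=1):
    """Sum of absolute differences over two equal-size 2-D blocks.

    step > 1 subsamples rows and columns, giving a cheaper estimate.
    """
    total = 0
    for r in range(0, len(block_a), step):
        for c in range(0, len(block_a[r]), step):
            total += abs(block_a[r][c] - block_b[r][c])
    return total

def ant_select(main_value, replica_estimate, threshold):
    """Keep the main result unless it disagrees with the replica by more than threshold."""
    if abs(main_value - replica_estimate) > threshold:
        return replica_estimate
    return main_value

cur = [[10, 12, 14, 16], [11, 13, 15, 17], [10, 12, 14, 16], [11, 13, 15, 17]]
ref = [[9, 12, 15, 16], [11, 14, 15, 15], [10, 11, 14, 18], [12, 13, 14, 17]]

main = sad(cur, ref)                    # full-precision result
replica = sad(cur, ref, step=2) * 4     # 1/4 of the pixels, scaled back up
corrupted = main + 1024                 # stand-in for a large timing error

print(ant_select(main, replica, threshold=16))
print(ant_select(corrupted, replica, threshold=16))
```

The intuition is that timing errors under voltage overscaling tend to hit the most significant bits and therefore produce large deviations, which a coarse but reliable estimate can detect.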
Reducing Cache Traffic and Energy with Macro Data Load
Lei Jin, Sangyeun Cho
This paper presents a study on macro data load, an efficient mechanism to enhance loaded value reuse. A macro data load brings into the processor the maximum-width data value that the cache port allows, saves it in an internal structure, and facilitates reuse by later loads. A comprehensive limit study using a generalized memory value reuse table (MVRT) shows the significantly increased reuse opportunities provided by macro data load. We also describe a modified load-store queue design as an implementation of the proposed concept. Our quantitative study shows that over 35% of L1 cache accesses in the SPEC2k integer and MiBench programs can be eliminated, resulting in a related energy reduction of 24% and 35% on average, respectively.
DOI: https://doi.org/10.1145/1165573.1165608 (published 2006-10-04)
Citations: 15
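The macro data load mechanism described in the abstract above can be pictured with a toy model: every load that misses a small internal buffer fetches one aligned, full-width chunk, and later narrow loads to the same chunk are served without touching the cache. The class name, the 8-byte width, the 4-entry FIFO buffer, and the omission of stores are all simplifying assumptions made here. This is not the paper's MVRT or its modified load-store queue.

```python
class MacroLoadBuffer:
    """Toy model: each buffer miss fetches one aligned, full-width chunk."""

    def __init__(self, memory, width_bytes=8, entries=4):
        self.memory = memory        # bytes-like store standing in for the L1 cache
        self.width = width_bytes
        self.entries = entries
        self.chunks = {}            # aligned base address -> bytes of that chunk
        self.cache_accesses = 0

    def load(self, addr, size):
        base = addr - (addr % self.width)
        if addr + size > base + self.width:
            raise ValueError("access crosses a chunk boundary")
        if base not in self.chunks:
            if len(self.chunks) >= self.entries:
                # dicts keep insertion order, so this evicts the oldest chunk (FIFO).
                self.chunks.pop(next(iter(self.chunks)))
            self.chunks[base] = bytes(self.memory[base:base + self.width])
            self.cache_accesses += 1
        offset = addr - base
        return self.chunks[base][offset:offset + size]

memory = bytes(range(32))
buf = MacroLoadBuffer(memory)
addresses = [0, 1, 2, 3, 8, 4]
for addr in addresses:
    buf.load(addr, 1)
print(f"{buf.cache_accesses} cache accesses for {len(addresses)} loads")
```

A real design would also have to keep buffered values coherent with stores, which this sketch deliberately leaves out.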
Independent Front-end and Back-end Dynamic Voltage Scaling for a GALS Microarchitecture
G. Magklis, P. Chaparro, José González, Antonio González
In recent years, globally asynchronous locally synchronous (GALS) designs and dynamic voltage scaling (DVS) have emerged as some of the most popular approaches to address ever-increasing microprocessor energy consumption. In this work, we propose two on-line algorithms for adjusting, dynamically and independently, the voltage and frequency of the front-end and back-end domains of a novel two-domain microprocessor. We evaluate our mechanisms for both internal and external voltage regulators, and we present optimal dynamic voltage scaling results for the proposed microarchitecture. Our schemes achieve an average improvement of 12% in the energy-delay metric when using internal voltage regulators.
DOI: https://doi.org/10.1145/1165573.1165586 (published 2006-10-04)
Citations: 29
Synchronization-Driven Dynamic Speed Scaling for MPSoCs
M. Loghi, M. Poncino, L. Benini
Equalizing the ratios between workloads and speeds of processing elements provides the optimal speed allocation. Based on that principle, this work describes a dynamic speed setting policy for multiprocessor systems-on-chip (MPSoCs) that relies on the estimation of processor idle times specifically due to synchronization. The policy provides two advantages. First, it does not rely on any assumption about the communication pattern of the application executed by the system. Second, it is purely architectural; it automatically detects changes in the system workload and sets processor speeds accordingly by means of a custom hardware block. Results on a parallel MPEG video decoding application show an EDP saving above 55%, averaged over several datasets, corresponding to an energy saving above 50% and a performance penalty below 8%.
DOI: https://doi.org/10.1145/1165573.1165655 (published 2006-10-04)
Citations: 5
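The principle stated at the start of the abstract above, equalizing the workload-to-speed ratio across processing elements, can be shown with a few lines of arithmetic. In the sketch below the most heavily loaded element runs at full speed and the others are slowed so that all finish together. The function name, the normalized speed scale, and the workload figures are illustrative assumptions, and the paper's hardware-based idle-time estimation is not modeled.

```python
def equalized_speeds(workloads, max_speed=1.0):
    """Speeds proportional to workload, so workload / speed is the same for
    every processing element; the most loaded element runs at max_speed."""
    heaviest = max(workloads)
    if heaviest <= 0:
        raise ValueError("need at least one positive workload")
    return [max_speed * w / heaviest for w in workloads]

workloads = [40.0, 100.0, 70.0]     # hypothetical work units per synchronization interval
speeds = equalized_speeds(workloads)
finish_times = [w / s for w, s in zip(workloads, speeds)]

print("speeds:", speeds)
print("finish times:", finish_times)
```

With equal finish times no element waits idle at the synchronization point, so the slack that would otherwise be spent idling is traded for lower speed.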