2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED): Latest Papers

Having your cake and eating it too: Energy savings without performance loss through resource sharing driven power management
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273523
Jae-Yeon Won, Paul V. Gratz, S. Shakkottai, Jiang Hu
Typically in computer systems, performance must be traded off to achieve energy savings or, conversely, performance gains come with significant energy overhead. Here, we present a novel approach that can achieve synergistic energy savings and performance gains in chip multiprocessors (CMPs). Our key observation is that per-core dynamic voltage/frequency scaling (DVFS) can be used as a client regulation mechanism for shared resources on-die. Based on this observation, we propose a new DVFS technique inspired by TCP Vegas, a congestion control protocol from the IP-networking domain. Full system simulations on PARSEC benchmarks show that our technique reduces total CMP energy dissipation by over 40% with a small performance improvement.
Citations: 3
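The Vegas analogy in the abstract above can be illustrated with a small sketch. This is not the authors' controller: the function name, the thresholds `alpha` and `beta`, the frequency step, and the use of shared-resource latency as the "round-trip time" are all assumptions made for illustration. TCP Vegas compares the rate it would expect on an uncongested path with the rate it actually achieves, and grows or shrinks its window accordingly; here the core frequency plays the role of the window.

```python
def vegas_dvfs_step(freq_ghz, base_latency_ns, observed_latency_ns,
                    alpha=0.1, beta=0.3, step_ghz=0.1,
                    f_min=1.0, f_max=3.0):
    """One Vegas-style frequency update for a single core (illustrative only)."""
    # Request rate the core would see if the shared resource were uncongested.
    expected = freq_ghz / base_latency_ns
    # Request rate it actually sees at the currently observed latency.
    actual = freq_ghz / observed_latency_ns
    # Vegas "diff": estimated backlog this core keeps in the shared resource.
    backlog = (expected - actual) * base_latency_ns
    if backlog < alpha:
        freq_ghz += step_ghz      # headroom available: speed up
    elif backlog > beta:
        freq_ghz -= step_ghz      # shared resource congested: slow down
    return min(f_max, max(f_min, freq_ghz))
```

Calling the function once per control interval with a fresh latency measurement raises the frequency while the shared resource is uncongested and lowers it when latency roughly doubles; between the two thresholds the frequency is left unchanged.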
Transient voltage noise in charge-recycled power delivery networks for many-layer 3D-IC
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273506
Runjie Zhang, K. Mazumdar, B. Meyer, Ke Wang, K. Skadron, M. Stan
Aside from the benefits it brings, 3D-IC technology inevitably exacerbates the difficulty of power delivery with volumetrically increasing power consumption. Recent work managed to “recycle” current within the 3D stack by linking the different layers' supply/ground nets into a series connection. This charge-recycled (also known as voltage-stacked, or V-S) scheme provides a scalable solution for 3D-IC's power delivery because it supports an arbitrary number of layers with a constant off-chip current demand. Although prior work has studied the circuit implementation of a V-S power delivery network (PDN) and its current-reduction benefits, a whole-system evaluation of V-S PDNs' transient voltage noise and a noise comparison between the V-S PDN and the traditional PDN are missing. In this paper, we build a system-level model to examine voltage-stacked 3D-ICs' transient noise and explore the impact of different PDN design parameters and workload behaviors. Our results show that compared with the traditional PDN scheme, V-S provides stronger isolation for cross-layer noise interference, which in turn grants higher performance benefits for run-time noise mitigation techniques, such as dynamic margin adaptation. We observe that, compared with traditional PDNs, V-S PDNs provide up to 60% lower transient noise in the worst-case scenario. Furthermore, we show that V-S PDNs significantly reduce the packaging cost, because their noise is almost insensitive to the package impedance (e.g., a 300% impedance increase only raises worst-case noise by less than 0.3% Vdd).
Citations: 11
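The claim in the abstract above that a voltage-stacked scheme keeps off-chip current constant as layers are added follows from simple series/parallel arithmetic. The sketch below is an idealized illustration, assuming every layer draws the same current and ignoring regulators and inter-layer imbalance; the function names are hypothetical.

```python
def off_chip_current(n_layers, layer_current_a, stacked):
    """Ideal off-chip current (A), assuming all layers draw the same current."""
    # Series (stacked) layers share one current; parallel layers add theirs up.
    return layer_current_a if stacked else n_layers * layer_current_a

def supply_voltage(n_layers, vdd, stacked):
    """Off-chip supply voltage (V) needed to give each layer vdd."""
    return n_layers * vdd if stacked else vdd

def delivered_power(n_layers, layer_current_a, vdd, stacked):
    """Total delivered power (W); identical for both schemes in this ideal model."""
    return (off_chip_current(n_layers, layer_current_a, stacked)
            * supply_voltage(n_layers, vdd, stacked))
```

For eight layers at 10 A each, the conventional network must carry 80 A off-chip at Vdd, while the stacked network carries 10 A at 8 Vdd for the same delivered power.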
Leveraging emerging nonvolatile memory in high-level synthesis with loop transformations
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273491
Shuangchen Li, Ang Li, Yuan Zhe, Yongpan Liu, Peng Li, Guangyu Sun, Yu Wang, Huazhong Yang, Yuan Xie
To mitigate the “Power Wall” challenges for both mobile devices and data centers, accelerator-rich architectures with a normally-off mode have been intensively studied recently. Power/energy optimization in high-level synthesis for accelerator design is critical for such accelerator-rich architectures. Emerging nonvolatile memory (NVM) offers many benefits such as ultra-low leakage power, high density, and instant power-on/off, and is therefore a promising alternative for hardware accelerator design to achieve further power reduction. However, such NVM suffers from large write energy and latency, which brings new challenges for buffer allocation in custom accelerator design. This paper presents the first framework that optimizes NVM allocation in high-level synthesis for custom accelerator design, considering loop transformations. It solves the loop transformation, buffer allocation, and buffer type selection to minimize the memory power consumption under area, bandwidth, and performance constraints. This paper formulates the optimization problem and solves it with a problem-specific simulated annealing solution. Experiments demonstrate a 32% extra power reduction compared with the previous method without optimizing loop transformations.
Citations: 4
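To make the simulated annealing step in the abstract above concrete, here is a toy sketch that covers only buffer type selection (SRAM versus NVM). It is not the paper's formulation: the cost constants, the `anneal` function, and its parameters are hypothetical, and loop transformations and the area, bandwidth, and performance constraints are left out.

```python
import math
import random

# Hypothetical cost model in arbitrary units (not taken from the paper).
LEAK = {"SRAM": 1.0, "NVM": 0.05}     # leakage power per KB of buffer
WRITE = {"SRAM": 0.01, "NVM": 0.20}   # energy per write access

def memory_power(buffers, types):
    """buffers: list of (size_kb, writes); types: list of 'SRAM' or 'NVM'."""
    return sum(LEAK[t] * size + WRITE[t] * writes
               for (size, writes), t in zip(buffers, types))

def anneal(buffers, iters=2000, t0=5.0, cooling=0.995, seed=0):
    """Return the best type assignment seen and its cost."""
    rng = random.Random(seed)
    current = ["SRAM"] * len(buffers)
    cur_cost = memory_power(buffers, current)
    best, best_cost = list(current), cur_cost
    temp = t0
    for _ in range(iters):
        cand = list(current)
        i = rng.randrange(len(buffers))
        cand[i] = "NVM" if cand[i] == "SRAM" else "SRAM"  # flip one buffer
        cost = memory_power(buffers, cand)
        # Accept improvements always, worse moves with Boltzmann probability.
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = list(cand), cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost
```

Under this cost model, large and rarely written buffers favor NVM because leakage dominates, while small and frequently written buffers favor SRAM because NVM write energy dominates.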
A power-aware digital feedforward neural network platform with backpropagation driven approximate synapses
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273495
J. Kung, Duckhwan Kim, S. Mukhopadhyay
This paper proposes a power-aware digital feedforward neural network platform that utilizes the backpropagation algorithm during training to enable an energy-quality trade-off. Given a quality constraint, the proposed approach identifies a set of synaptic weights for approximation in a neural network. The approach selects synapses with small impact on output error, estimated by the backpropagation algorithm, for approximation. The approximations are achieved by coupled software (reduced bit-width) and hardware (approximate multiplication in the processing engine) design approaches. The full-chip design in 130nm CMOS shows that, compared to a baseline accurate design, the proposed approach reduces system power by ~38% with 0.4% lower recognition accuracy in a classification problem.
Citations: 33
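The selection idea in the abstract above (approximate the synapses that matter least to the output error) can be sketched in a few lines. This is an illustration under assumptions, not the paper's procedure: the gradient magnitude |dE/dw| is used here as the sensitivity proxy, and the function names, the 50% fraction, and the 4-bit and 8-bit widths are hypothetical.

```python
def quantize(w, bits, w_max=1.0):
    """Round w to a signed fixed-point grid with the given number of bits."""
    levels = 2 ** (bits - 1) - 1
    w = max(-w_max, min(w_max, w))          # clip to the representable range
    return round(w / w_max * levels) / levels * w_max

def approximate_low_impact(weights, grads, fraction=0.5, low_bits=4, high_bits=8):
    """Quantize the `fraction` of weights with the smallest |dE/dw| more coarsely."""
    # Rank synapses by estimated impact on the output error, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(grads[i]))
    coarse = set(order[:int(len(weights) * fraction)])
    return [quantize(w, low_bits if i in coarse else high_bits)
            for i, w in enumerate(weights)]
```

Weights with large gradients keep the finer 8-bit grid, so the quantization error is concentrated where the backpropagated sensitivity says it hurts least.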
PowerTrain: A learning-based calibration of McPAT power models
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273512
Wooseok Lee, Youngchun Kim, Jee Ho Ryoo, Dam Sunwoo, A. Gerstlauer, L. John
As research on improving energy efficiency becomes prevalent, the necessity of a tool to accurately estimate power is increasing. Among various tools proposed, McPAT has gained some popularity due to its easy-to-use analytical power models. However, McPAT's prediction has several limitations. Although under- or over-estimated power from unmodeled and mis-modeled parts offset each other, it still incorporates errors in each block. Moreover, the lack of awareness of the implementation details exacerbates the prediction inaccuracies. To alleviate this problem, we propose a new methodology to train McPAT towards precise processor power prediction using power measurements from real hardware. This calibration enables McPAT's power to fit the target processor power. Once we adjusted the power consumption of each block to best match those in the target processor, our trained McPAT delivered more precise power estimation. We calibrated the outputs of McPAT against a Cortex-A15 within a Samsung Exynos 5422 SoC. We observe that our methodology successfully reduces the errors, particularly for workloads with fluctuating power behaviors. The results show that the mean percentage error and the mean percentage absolute error of the calibrated power against real hardware are 2.04 percent and 4.37 percent, respectively.
Citations: 29
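The two error metrics quoted in the abstract above differ in whether over- and under-estimates are allowed to cancel. The sketch below shows the usual definitions; the sign convention (predicted minus measured) and the function names are assumptions, since the listing does not give the paper's exact formulas.

```python
def mean_percentage_error(predicted, measured):
    """Signed error in percent; over- and under-estimates cancel out."""
    return 100.0 * sum((p - m) / m for p, m in zip(predicted, measured)) / len(measured)

def mean_absolute_percentage_error(predicted, measured):
    """Unsigned error in percent; every deviation counts."""
    return 100.0 * sum(abs(p - m) / m for p, m in zip(predicted, measured)) / len(measured)
```

With predictions of 1.1 W and 0.9 W against two measurements of 1.0 W, the signed metric is about 0% while the absolute metric is about 10%, which is why the absolute figure is the stricter one to report.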
Analysis of adaptive clocking technique for resonant supply voltage noise mitigation
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273502
P. Whatmough, Shidhartha Das, David M. Bull
Resonant supply voltage noise is emerging as a serious limitation for power efficiency in SoCs for mobile products. Increasing supply currents coupled with stagnant package inductance are leading to significant AC supply impedance, which necessitates increasing supply voltage margins, impacting power efficiency. Adaptive clocking offers a potentially promising approach to reduce voltage margins, by stretching the clock period to match datapath delays. However, the adaptation bandwidth and clock distribution latencies required can be very demanding. We present an analysis of the potential benefits from adaptive clocking based on measurements of supply voltage noise in a dual-core ARM Cortex-A57 cluster in a mobile SoC. By modeling an adaptive clocking system on the measured supply voltage noise dataset, we demonstrate that an adaptation latency of 1.5ns may offer a VMIN improvement of around 30mV, and a latency of 1ns an improvement of 50mV. Benefits are workload dependent and ultimately limited by insurmountable synchronization and clock distribution latency.
Citations: 7
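Why adaptation latency matters in the abstract above can be seen with a deliberately simple model. This is not the authors' model: it assumes the clock period always matches the supply voltage as it was `latency_samples` ago, so the margin that still has to be provisioned is the largest voltage drop that can develop within that window. The function name and the sample trace are hypothetical.

```python
def residual_margin(trace_mv, latency_samples):
    """Worst-case voltage drop (mV) that develops before the clock can react."""
    worst = 0.0
    for t in range(len(trace_mv)):
        # Voltage the clock has already adapted to, one latency window ago.
        seen = trace_mv[max(0, t - latency_samples)]
        # Any further drop since then is not covered by clock stretching.
        worst = max(worst, seen - trace_mv[t])
    return worst
```

On a trace such as 1000, 990, 970, 960, 980, 1000 mV, a latency of one sample leaves 20 mV uncovered and a latency of two samples leaves 30 mV, so a faster adaptation loop needs less static margin.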
A heuristic machine learning-based algorithm for power and thermal management of heterogeneous MPSoCs
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273529
Arman Iranfar, S. Shahsavani, M. Kamal, A. Afzali-Kusha
In this work, we propose a power and thermal management algorithm based on machine learning to control the thermal stresses and power consumption of heterogeneous MPSoCs. The objectives of the proposed algorithm are to increase performance and to decrease the spatial and temporal temperature gradients, along with thermal cycling, under power and temperature constraints. Our proposed power and thermal management method is based on a heuristic approach that speeds up the convergence of the machine learning algorithm, which makes it applicable to general-purpose processors. Adopting Q-Learning as the machine learning algorithm, the heuristic approach helps limit the learning space by suggesting the most appropriate actions to the agent in each decision epoch. The heuristic algorithm employs the current and previous states of the learning algorithm, as well as the temperature stress and power consumption of each core, to determine the appropriate action for each core independently. The proposed algorithm is evaluated on 4-core, 8-core and 16-core homogeneous and heterogeneous MPSoCs for some benchmarks in the Splash2 benchmark package. The results reveal faster convergence of the machine learning algorithm and a greater reduction in thermal stress.
Citations: 20
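The combination described in the abstract above, Q-Learning plus a heuristic that narrows the action set, can be sketched as follows. The Q-Learning update is the standard one; the heuristic `allowed_actions`, its thresholds, and the three action names are hypothetical stand-ins, since the listing does not describe the paper's actual rules.

```python
import random

def q_update(q, state, action, reward, next_state, actions, lr=0.1, gamma=0.9):
    """Standard tabular Q-Learning update; q maps (state, action) to a value."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + lr * (reward + gamma * best_next - old)

def allowed_actions(temp_c, power_w, temp_limit=80.0, power_limit=5.0):
    """Hypothetical heuristic: never raise frequency at or above a limit."""
    if temp_c >= temp_limit or power_w >= power_limit:
        return ["down", "hold"]
    return ["down", "hold", "up"]

def choose_action(q, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to the heuristic's action subset."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

Because the agent only explores actions the heuristic permits, it wastes fewer decision epochs on choices that would violate a constraint, which is one plausible reading of how convergence is sped up.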
Optimizing Boolean embedding matrix for compressive sensing in RRAM crossbar
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273483
Yuhao Wang, Xin Li, Hao Yu, Leibin Ni, Wei Yang, Chuliang Weng, Junfeng Zhao
The emerging resistive random-access-memory (RRAM) crossbar provides an intrinsic fabric for matrix-vector multiplication, which can be leveraged as power-efficient linear embedding hardware for data analytics such as compressive sensing. As the matrix elements are represented by the resistance of RRAM cells, the limited RRAM programming resolution imposes constraints on the embedding matrix. A random Boolean embedding can be efficiently mapped to the RRAM crossbar but suffers from poor performance. Learning-based embedding matrices can deliver optimized performance but are continuous-valued, which prevents them from being mapped to the RRAM crossbar structure directly. In this paper, we propose an algorithm that can find an optimal Boolean embedding matrix for a given learned real-valued embedding matrix, so that it can be effectively mapped to the RRAM crossbar structure while high performance is preserved. The numerical experiments demonstrate that the proposed optimized Boolean embedding can reduce the embedding distortion by 2.7x and the image recovery error by 2.5x compared to the random Boolean embedding, both mapped onto the RRAM crossbar. In addition, the optimized Boolean embedding on the RRAM crossbar exhibits 10x faster speed, 17x better energy efficiency, and three orders of magnitude smaller area with a slight accuracy penalty, when compared to the optimized real-valued embedding on a CMOS ASIC platform.
Citations: 9
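The random Boolean baseline mentioned in the abstract above is easy to picture in code: with entries restricted to 0 and 1, each output of the embedding is just the sum of the inputs its row selects, which is what makes it a natural fit for a crossbar. The sketch below shows only this baseline; the function names are hypothetical and the paper's optimization from a learned real-valued matrix is not reproduced.

```python
import random

def random_boolean_matrix(rows, cols, seed=0):
    """Random embedding matrix with entries in {0, 1}."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]

def embed(matrix, x):
    """y = A x for Boolean A: each output sums the inputs its row selects."""
    return [sum(xj for a, xj in zip(row, x) if a) for row in matrix]
```

For example, embedding the vector (2, 3, 5) with rows (1, 0, 1) and (0, 1, 1) gives (7, 8), compressing three inputs into two measurements.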
Making sense of thermoelectrics for processor thermal management and energy harvesting
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273486
S. Jayakumar, S. Reda
A thermoelectric (TE) device can be used as a heat pump that consumes electric power to cool a processor chip, or it can be used as a heat engine that generates electricity from the heat dissipated during processor operation. To better understand the use of TE devices, we develop a fully instrumented processor-based system with controllable TE devices. We first examine the use of TE devices for energy harvesting. We identify a pitfall in previous works that can lead to wrong conclusions for TEG use by demonstrating that TEGs increase the processor's leakage power which offsets their harvested power. For thermoelectric cooling (TEC), we elucidate the intricate relationships between the processor power, thermoelectric power, and fan power. We propose a dynamic thermal management scheme (DTM) that maximizes performance under thermal constraints and given total power budgets by controlling the processor's dynamic frequency and voltage scaling (DVFS), TEC current, and fan speed. For the evaluated thermal constraints, our results demonstrate good improvements to performance at the cost of additional cooling power compared to standard DVFS+fan DTM techniques.
Citations: 15
Tackling voltage emergencies in NoC through timing error resilience
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273498
Rajesh J. S., D. Ancajas, Koushik Chakraborty, Sanghamitra Roy
Aggressive technology scaling exacerbates the problem of voltage emergencies in emerging MPSoC systems. Networks-on-chip (NoCs), the de facto standard for connecting on-chip components in forthcoming devices, play a central role in providing robust and reliable communication. In this work, we propose DrNoC (droop-resilient network-on-chip), two microarchitectural techniques that mitigate voltage-emergency-induced timing errors in NoCs and preserve error-free communication throughout the network. DrNoC employs frequency downscaling and a pipeline error-recovery mechanism to reclaim corrupted flits in the router. Compared to the recently proposed NSFTR fault-tolerant technique, DrNoC offers a 27% improvement in energy-delay efficiency.
Citations: 4