Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273523
Jae-Yeon Won, Paul V. Gratz, S. Shakkottai, Jiang Hu
Typically in computer systems, performance must be traded off to achieve energy savings or, conversely, performance gains come with significant energy overhead. Here, we present a novel approach that can achieve synergistic energy savings and performance gain in chip multiprocessors (CMPs). Our key observation is that per-core dynamic voltage/frequency scaling (DVFS) can be used as a client regulation mechanism for shared resources on-die. Based on this observation, we propose a new DVFS technique inspired by TCP Vegas, a congestion control protocol from the IP-networking domain. Full system simulations on PARSEC benchmarks show that our technique reduces total CMP energy dissipation by over 40% with a small performance improvement.
Title: Having your cake and eating it too: Energy savings without performance loss through resource sharing driven power management
Venue: 2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
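The abstract above gives no algorithmic detail, so the following is only a minimal sketch of how a TCP Vegas-style rule might regulate a core's frequency. TCP Vegas compares the round-trip time it observes against the best round-trip time it has seen and backs off when the gap suggests queuing. Here the core frequency plays the role of the congestion window and shared-resource latency plays the role of round-trip time. The function name, the thresholds `alpha` and `beta`, the step size, and the frequency range are all assumptions made for illustration, not values from the paper.

```python
def vegas_dvfs_step(freq_ghz, base_latency, observed_latency,
                    alpha=0.1, beta=0.3, step_ghz=0.1,
                    f_min=0.8, f_max=2.0):
    """Return the next core frequency using a Vegas-like rule (illustrative only)."""
    # The uncongested latency can never exceed what we observe.
    observed_latency = max(observed_latency, base_latency)
    # Normalized Vegas "diff": fraction of the observed latency due to queuing.
    congestion = 1.0 - base_latency / observed_latency
    if congestion < alpha:
        # Shared resource is underused: let this client issue requests faster.
        freq_ghz += step_ghz
    elif congestion > beta:
        # Shared resource is congested: slow this client down and save energy.
        freq_ghz -= step_ghz
    # Between alpha and beta the frequency is held.
    return min(f_max, max(f_min, freq_ghz))


if __name__ == "__main__":
    f = 1.5
    for latency in (100, 110, 180, 220, 120):
        f = vegas_dvfs_step(f, base_latency=100, observed_latency=latency)
        print(f"latency={latency} -> freq={f:.1f} GHz")
```

The point of the sketch is the control shape: a core that would only be waiting on a congested shared resource is slowed down, which is how energy can drop without a matching performance loss.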
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273506
Runjie Zhang, K. Mazumdar, B. Meyer, Ke Wang, K. Skadron, M. Stan
Aside from the benefits it brings, 3D-IC technology inevitably exacerbates the difficulty of power delivery with volumetrically increasing power consumption. Recent work managed to “recycle” current within the 3D stack by linking the different layers' supply/ground nets into a series connection. This charge-recycled (also known as voltage-stacked, or V-S) scheme provides a scalable solution for 3D-IC's power delivery because it supports an arbitrary number of layers with a constant off-chip current demand. Although prior work has studied the circuit implementation of a V-S power delivery network (PDN) and its current-reduction benefits, a whole-system evaluation of V-S PDNs' transient voltage noise and a noise comparison between the V-S PDN and the traditional PDN are missing. In this paper, we build a system-level model to examine voltage-stacked 3D-ICs' transient noise and explore the impact of different PDN design parameters and workload behaviors. Our results show that compared with the traditional PDN scheme, V-S provides stronger isolation for cross-layer noise interference, which in turn grants higher performance benefits for run-time noise mitigation techniques, such as dynamic margin adaptation. We observe that, compared with traditional PDNs, V-S PDNs provide up to 60% lower transient noise in the worst-case scenario. Furthermore, we show that V-S PDNs significantly reduce the packaging cost, because their noise is almost insensitive to the package impedance (e.g., a 300% impedance increase only raises worst-case noise by less than 0.3% Vdd).
Title: Transient voltage noise in charge-recycled power delivery networks for many-layer 3D-IC
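The claim that voltage stacking keeps off-chip current constant regardless of layer count follows from simple arithmetic, sketched below under the idealized assumption that every layer draws the same power at the same per-layer voltage. The function name and the numbers are illustrative and do not come from the paper.

```python
def offchip_current(layers, power_per_layer_w, vdd, stacked):
    """Off-chip supply current in amperes for an idealized, balanced 3D stack."""
    if stacked:
        # Layers are in series: the supply is layers * vdd and the same
        # current flows through every layer, so current does not scale.
        return power_per_layer_w / vdd
    # Conventional PDN: all layers share one vdd rail in parallel,
    # so the currents of the layers add up.
    return layers * power_per_layer_w / vdd


if __name__ == "__main__":
    for n in (2, 4, 8):
        regular = offchip_current(n, 10.0, 1.0, stacked=False)
        vs = offchip_current(n, 10.0, 1.0, stacked=True)
        print(f"{n} layers: regular {regular:.0f} A, voltage-stacked {vs:.0f} A")
```

Real stacks are not perfectly balanced, which is why the transient noise behavior studied in the abstract matters; the sketch covers only the steady-state current argument.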
To mitigate the “Power Wall” challenges for both mobile devices and data centers, accelerator-rich architecture with normally-off mode has been intensively studied recently. Power/energy optimization in high-level synthesis for accelerator design is critical for such accelerator-rich architecture. The emerging nonvolatile memory (NVM) offers many benefits such as ultra-low leakage power, high density, and instant power-on/off, and is therefore a promising alternative for hardware accelerator design to achieve further power reduction. However, such NVM suffers from large write energy and latency, which brings new challenges for buffer allocation in custom accelerator design. This paper presents the first framework that optimizes NVM allocation in high-level synthesis for custom accelerator design, considering loop transformations. It solves the loop transformation, buffer allocation, and buffer type selection problems to minimize memory power consumption under area, bandwidth, and performance constraints. This paper formulates the optimization problem and solves it with a problem-specific simulated annealing solution. Experiments demonstrate 32% extra power reduction compared with the previous method without optimizing loop transformations.
Title: Leveraging emerging nonvolatile memory in high-level synthesis with loop transformations
Authors: Shuangchen Li, Ang Li, Yuan Zhe, Yongpan Liu, Peng Li, Guangyu Sun, Yu Wang, Huazhong Yang, Yuan Xie
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273491
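As a rough illustration of the annealing-based search mentioned in the abstract above, the sketch below applies generic simulated annealing to only one part of the problem, buffer type selection. Loop transformations and the area, bandwidth, and performance constraints are left out. The power model (leakage-dominated SRAM versus write-dominated NVM) and all of its constants are invented for the example and are not taken from the paper.

```python
import math
import random


def buffer_power(kind, writes_per_s):
    """Toy power model in arbitrary units (made-up constants)."""
    if kind == "SRAM":
        return 5.0 + 0.001 * writes_per_s   # high leakage, cheap writes
    return 0.5 + 0.01 * writes_per_s        # NVM: low leakage, costly writes


def total_power(assignment, write_rates):
    return sum(buffer_power(k, w) for k, w in zip(assignment, write_rates))


def anneal(write_rates, steps=2000, t0=5.0, cooling=0.995, seed=0):
    """Generic simulated annealing over per-buffer SRAM/NVM choices."""
    rng = random.Random(seed)
    current = ["SRAM"] * len(write_rates)
    cur_cost = total_power(current, write_rates)
    best, best_cost = list(current), cur_cost
    t = t0
    for _ in range(steps):
        # Neighbor move: flip the memory type of one random buffer.
        i = rng.randrange(len(write_rates))
        cand = list(current)
        cand[i] = "NVM" if cand[i] == "SRAM" else "SRAM"
        cost = total_power(cand, write_rates)
        delta = cost - cur_cost
        # Always accept improvements; accept regressions with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cur_cost = cand, cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        t *= cooling
    return best, best_cost


if __name__ == "__main__":
    print(anneal([10, 100, 2000, 5000]))
```

In this toy model rarely written buffers end up in NVM and write-heavy buffers stay in SRAM, which mirrors the trade-off the abstract describes between NVM's low leakage and its high write energy.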
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273495
J. Kung, Duckhwan Kim, S. Mukhopadhyay
This paper proposes a power-aware digital feedforward neural network platform that utilizes the backpropagation algorithm during training to enable an energy-quality trade-off. Given a quality constraint, the proposed approach identifies a set of synaptic weights for approximation in a neural network. The approach selects synapses with small impact on output error, as estimated by the backpropagation algorithm, for approximation. The approximations are achieved by a coupled software (reduced bit-width) and hardware (approximate multiplication in the processing engine) design approach. The full-chip design in 130nm CMOS shows that, compared to a baseline accurate design, the proposed approach reduces system power by ~38% with 0.4% lower recognition accuracy in a classification problem.
Title: A power-aware digital feedforward neural network platform with backpropagation driven approximate synapses
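The selection step described in the abstract above can be sketched as follows: rank synapses by an error-sensitivity estimate and store only the least sensitive ones at reduced bit-width. This is a simplified illustration. It uses a fixed fraction of synapses as a stand-in for the paper's quality constraint, takes the gradient magnitudes as given rather than computing them by backpropagation, and leaves out the approximate multiplier hardware. All names and parameters are hypothetical.

```python
def quantize(weight, bits, w_max=1.0):
    """Uniform symmetric quantization of one weight to a signed `bits`-bit grid."""
    levels = 2 ** (bits - 1) - 1
    clipped = max(-w_max, min(w_max, weight))
    return round(clipped / w_max * levels) * w_max / levels


def approximate_synapses(weights, grads, fraction, low_bits=4):
    """Quantize the `fraction` of synapses with the smallest |dE/dw|."""
    # Small gradient magnitude is used as a proxy for small impact on output error.
    ranked = sorted(range(len(weights)), key=lambda i: abs(grads[i]))
    chosen = set(ranked[:int(len(weights) * fraction)])
    approx = [quantize(w, low_bits) if i in chosen else w
              for i, w in enumerate(weights)]
    return approx, chosen


if __name__ == "__main__":
    weights = [0.5, -0.33, 0.9, 0.12]
    grads = [0.8, 0.01, 0.5, 0.02]   # assumed to come from backpropagation
    print(approximate_synapses(weights, grads, fraction=0.5))
```

A real flow would grow the approximated set only while the measured accuracy stays within the quality constraint.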
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273512
Wooseok Lee, Youngchun Kim, Jee Ho Ryoo, Dam Sunwoo, A. Gerstlauer, L. John
As research on improving energy efficiency becomes prevalent, the need for a tool that accurately estimates power is increasing. Among the various tools proposed, McPAT has gained some popularity due to its easy-to-use analytical power models. However, McPAT's predictions have several limitations. Although under- and over-estimated power from unmodeled and mis-modeled parts may offset each other, errors remain in each block. Moreover, the lack of awareness of implementation details exacerbates the prediction inaccuracies. To alleviate this problem, we propose a new methodology to train McPAT towards precise processor power prediction using power measurements from real hardware. This calibration enables McPAT's power to fit the target processor power. Once we adjusted the power consumption of each block to best match that of the target processor, our trained McPAT delivered more precise power estimation. We calibrated the outputs of McPAT against a Cortex-A15 within a Samsung Exynos 5422 SoC. We observe that our methodology successfully reduces the errors, particularly for workloads with fluctuating power behaviors. The results show that the mean percentage error and the mean percentage absolute error of the calibrated power against real hardware are 2.04 percent and 4.37 percent, respectively.
Title: PowerTrain: A learning-based calibration of McPAT power models
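The two error figures quoted in the abstract above differ in whether signed errors are allowed to cancel. The sketch below uses the common textbook definitions, which may not match the paper's exact formulation, and the sample numbers are invented.

```python
def mean_percentage_error(estimated, measured):
    """Signed error in percent; over- and under-estimates cancel each other."""
    return 100.0 * sum((e - m) / m for e, m in zip(estimated, measured)) / len(measured)


def mean_absolute_percentage_error(estimated, measured):
    """Unsigned error in percent; every deviation counts, whatever its sign."""
    return 100.0 * sum(abs(e - m) / m for e, m in zip(estimated, measured)) / len(measured)


if __name__ == "__main__":
    measured = [1.0, 2.0, 4.0]     # watts, hypothetical hardware readings
    estimated = [1.1, 1.8, 4.0]    # watts, hypothetical model outputs
    print(mean_percentage_error(estimated, measured))
    print(mean_absolute_percentage_error(estimated, measured))
```

In this made-up example the signed error is essentially zero while the absolute error is about 6.7 percent. This is the same offsetting effect the abstract warns about for uncalibrated per-block estimates.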
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273502
P. Whatmough, Shidhartha Das, David M. Bull
Resonant supply voltage noise is emerging as a serious limitation for power efficiency in SoCs for mobile products. Increasing supply currents coupled with stagnant package inductance are leading to significant AC supply impedance, which necessitates increasing supply voltage margins, impacting power efficiency. Adaptive clocking offers a potentially promising approach to reduce voltage margins by stretching the clock period to match datapath delays. However, the adaptation bandwidth and clock distribution latencies required can be very demanding. We present an analysis of the potential benefits of adaptive clocking based on measurements of supply voltage noise in a dual-core ARM Cortex-A57 cluster in a mobile SoC. By modeling an adaptive clocking system on the measured supply voltage noise dataset, we demonstrate that an adaptation latency of 1.5ns may offer a VMIN improvement of around 30mV, and a latency of 1ns an improvement of 50mV. Benefits are workload dependent and ultimately limited by insurmountable synchronization and clock distribution latency.
Title: Analysis of adaptive clocking technique for resonant supply voltage noise mitigation
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273529
Arman Iranfar, S. Shahsavani, M. Kamal, A. Afzali-Kusha
In this work, we propose a power and thermal management algorithm based on machine learning to control the thermal stresses and power consumption of heterogeneous MPSoCs. The objectives of the proposed algorithm are to increase performance and to decrease the spatial and temporal temperature gradients, along with thermal cycling, under power and temperature constraints. Our proposed power and thermal management method is based on a heuristic approach that speeds up the convergence of the machine learning algorithm, which makes it applicable to general purpose processors. Adopting Q-Learning as the machine learning algorithm, the heuristic approach helps limit the learning space by suggesting the most appropriate actions to the agent in each decision epoch. The heuristic algorithm employs the current and previous states of the machine learning algorithm, as well as the amount of temperature stress and power consumption of each core, to determine the appropriate action for each core independently. The proposed algorithm is evaluated on 4-core, 8-core and 16-core homogeneous and heterogeneous MPSoCs for some benchmarks in the Splash2 benchmark package. The results reveal faster convergence of the machine learning algorithm and a greater reduction in thermal stress.
Title: A heuristic machine learning-based algorithm for power and thermal management of heterogeneous MPSoCs
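The idea of a heuristic that narrows the action set of a Q-Learning agent, as outlined in the abstract above, can be sketched as follows. The update rule is standard tabular Q-Learning. The heuristic `allowed_actions`, the state and action names, the thresholds, and the reward are hypothetical placeholders, since the abstract does not specify them.

```python
import random
from collections import defaultdict


def allowed_actions(temp_c, power_w, temp_limit_c=85.0, power_limit_w=5.0):
    """Hypothetical heuristic: shrink the action set based on core temperature and power."""
    if temp_c >= temp_limit_c or power_w >= power_limit_w:
        return ["freq_down"]               # constraint violated: only throttling makes sense
    if temp_c < 0.8 * temp_limit_c:
        return ["freq_up", "hold"]         # plenty of headroom: do not bother slowing down
    return ["hold", "freq_down"]           # near the limit: do not speed up


def choose_action(q, state, allowed, epsilon, rng):
    """Epsilon-greedy choice restricted to the heuristic's suggestions."""
    if rng.random() < epsilon:
        return rng.choice(allowed)
    return max(allowed, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_allowed, lr=0.5, gamma=0.9):
    """Standard tabular Q-Learning update over the restricted next-action set."""
    best_next = max(q[(next_state, a)] for a in next_allowed)
    q[(state, action)] += lr * (reward + gamma * best_next - q[(state, action)])


if __name__ == "__main__":
    q = defaultdict(float)
    rng = random.Random(0)
    acts = allowed_actions(temp_c=75.0, power_w=3.0)
    a = choose_action(q, "warm", acts, epsilon=0.1, rng=rng)
    q_update(q, "warm", a, reward=1.0, next_state="cool",
             next_allowed=allowed_actions(50.0, 2.0))
    print(a, dict(q))
```

Restricting each decision to two or fewer candidate actions leaves fewer state-action pairs to explore, which is one plausible reading of why the abstract reports faster convergence.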
The emerging resistive random-access-memory (RRAM) crossbar provides an intrinsic fabric for matrix-vector multiplication, which can be leveraged as power efficient linear embedding hardware for data analytics such as compressive sensing. Because the matrix elements are represented by the resistance of RRAM cells, the limited RRAM programming resolution imposes constraints on the embedding matrix. A random Boolean embedding can be efficiently mapped to the RRAM crossbar but suffers from poor performance. Learning-based embedding matrices can deliver optimized performance but are continuous-valued, which prevents them from being mapped to the RRAM crossbar structure directly. In this paper, we propose an algorithm that can find an optimal Boolean embedding matrix for a given learned real-valued embedding matrix, so that it can be effectively mapped to the RRAM crossbar structure while high performance is preserved. The numerical experiments demonstrate that the proposed optimized Boolean embedding can reduce the embedding distortion by 2.7x and the image recovery error by 2.5x compared to the random Boolean embedding, both mapped on the RRAM crossbar. In addition, the optimized Boolean embedding on the RRAM crossbar exhibits 10x faster speed, 17x better energy efficiency, and three orders of magnitude smaller area with a slight accuracy penalty, when compared to the optimized real-valued embedding on a CMOS ASIC platform.
Title: Optimizing Boolean embedding matrix for compressive sensing in RRAM crossbar
Authors: Yuhao Wang, Xin Li, Hao Yu, Leibin Ni, Wei Yang, Chuliang Weng, Junfeng Zhao
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273483
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273486
S. Jayakumar, S. Reda
A thermoelectric (TE) device can be used as a heat pump that consumes electric power to cool a processor chip, or it can be used as a heat engine that generates electricity from the heat dissipated during processor operation. To better understand the use of TE devices, we develop a fully instrumented processor-based system with controllable TE devices. We first examine the use of TE devices for energy harvesting. We identify a pitfall in previous works that can lead to wrong conclusions for TEG use by demonstrating that TEGs increase the processor's leakage power which offsets their harvested power. For thermoelectric cooling (TEC), we elucidate the intricate relationships between the processor power, thermoelectric power, and fan power. We propose a dynamic thermal management scheme (DTM) that maximizes performance under thermal constraints and given total power budgets by controlling the processor's dynamic frequency and voltage scaling (DVFS), TEC current, and fan speed. For the evaluated thermal constraints, our results demonstrate good improvements to performance at the cost of additional cooling power compared to standard DVFS+fan DTM techniques.
Title: Making sense of thermoelectrics for processor thermal management and energy harvesting
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273498
S. RajeshJ., D. Ancajas, Koushik Chakraborty, Sanghamitra Roy
Aggressive technology scaling exacerbates the problem of voltage emergencies in emerging MPSoC systems. Networks-on-chip (NoCs), the de-facto standard for connecting on-chip components in forthcoming devices, play a central role in providing robust and reliable communication. In this work, we propose DrNoC (droop resilient network-on-chip): two microarchitectural techniques to mitigate voltage emergency-induced timing errors in NoCs and preserve error-free communication throughout the network. DrNoC employs frequency downscaling and a pipeline error-recovery mechanism to reclaim corrupted flits in the router. Compared to the recently proposed NSFTR fault-tolerant technique, DrNoC offers a 27% improvement in energy-delay efficiency.
Title: Tackling voltage emergencies in NoC through timing error resilience