
2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED): Latest Publications

CGSharing: Efficient content sharing in GPU-based cloud gaming
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273509
Xiangyu Wu, Y. Xia, Naifeng Jing, Xiaoyao Liang
With the rapid development of GPU server technology, cloud gaming has become popular in recent years. Unlike traditional desktop gaming, where graphics rendering is performed locally on the user's personal graphics card, cloud gaming runs multiple games in the data center to support many users at the same time, and most of the rendering jobs are done in a remote GPU cluster. The rendered frames are streamed to users' devices such as notebooks, tablets and cell phones. For cloud gaming to be economically viable, the operator must make full use of expensive hardware resources such as graphics cards, and state-of-the-art technology tries to render multiple game instances on the same GPU. In this paper, we first identify that today's cloud gaming rendering contains many redundant and duplicated contexts/workloads that waste a large amount of memory bandwidth and system energy. We in turn propose novel system architecture enhancements to effectively share content across the game instances of different users in the cloud gaming center.
Citations: 3
COAST: Correlated material assisted STT MRAMs for optimized read operation
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273481
Ahmedullah Aziz, N. Shukla, S. Datta, S. Gupta
We present a novel technique for optimizing the read operation of spin-transfer torque (STT) MRAMs by employing a correlated material in conjunction with a magnetic tunnel junction (MTJ). The design of the proposed memory cell is based on exploiting the orders-of-magnitude difference in the resistance of the two phases of the correlated material (CM) and triggering operation-driven phase transitions in the CM by judiciously co-optimizing devices and the memory cell. During read, the CM operates in the metallic and insulating phases when the MTJ is in the low resistance and high resistance states, respectively. This leads to superior distinguishability, read efficiency and stability. During write, the CM operates in the metallic phase, which minimizes the impact of the CM resistance on the write speed. Our analysis shows that the CM amplifies the cell tunneling magneto-resistance from 107% (for the standard STT MRAM) to 1878% (for the proposed cell), leading to a 68% higher sense margin. In addition, a 45% enhancement in the read disturb margin and a 36% reduction in the cell read power are achieved. At the same time, the write asymmetry associated with different state transitions is mildly mitigated, leading to a 9% reduction in the write power. This comes at a negligible cost of 4% longer write time. We also discuss the layout implications of our technique and propose the sharing of the CM amongst multiple cells. As a result of the sharing, the proposed technique incurs no area penalty.
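The TMR amplification described in this abstract can be illustrated with simple series-resistance arithmetic. The sketch below assumes the CM sits in series with the MTJ during read and uses made-up resistance values (`R_P`, `R_CM_METAL` and `R_CM_INSUL` are hypothetical, chosen only for illustration). It is not the authors' device model and is not meant to reproduce their 1878% figure.

```python
def tmr(r_low, r_high):
    """Tunneling magneto-resistance ratio: (R_high - R_low) / R_low."""
    return (r_high - r_low) / r_low

# Hypothetical resistances in ohms (illustrative assumptions, not from the paper).
R_P = 3000.0               # MTJ in the parallel (low-resistance) state
R_AP = R_P * (1 + 1.07)    # anti-parallel state, set to give the 107% baseline TMR
R_CM_METAL = 100.0         # CM in its metallic phase
R_CM_INSUL = 50000.0       # CM in its insulating phase

# Standard cell: only the MTJ resistance changes between the two states.
standalone = tmr(R_P, R_AP)

# Assumed series cell: the CM is metallic when the MTJ is in the low state
# and insulating when the MTJ is in the high state, widening the gap.
cell = tmr(R_P + R_CM_METAL, R_AP + R_CM_INSUL)

print(f"MTJ alone: {standalone:.0%}, MTJ + CM in series: {cell:.0%}")
```

The larger the ratio between the insulating and metallic CM resistances, the larger the effective cell TMR under this series assumption.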
Citations: 12
Energy efficient scheduling for web search on heterogeneous microservers
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273510
Sankalp Jain, Harshad Navale, Ümit Y. Ogras, S. Garg
Heterogeneous multi-core processors, such as the ARM big.LITTLE architecture, are becoming increasingly popular due to power and thermal constraints. In this paper, we address the use of low-power heterogeneous multi-cores as microservers, using web search as a motivating application. In particular, we propose a new family of scheduling policies for heterogeneous microservers to optimize for performance metrics such as mean response time and service level agreements, while guaranteeing thermally-safe operation. Thorough experimental evaluations on a big.LITTLE platform demonstrate that naive performance-oriented scheduling policies quickly result in thermal instability, while the proposed policies not only reduce peak temperature but also achieve a 4.8× reduction in processing time and a 5.6× increase in energy efficiency compared to baseline scheduling policies.
Citations: 9
An optimal power supply and body bias voltage for a ultra low power micro-controller with silicon on thin box MOSFET
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273515
Hayate Okuhara, K. Kitamori, Yu Fujita, K. Usami, H. Amano
Body bias control is an efficient means of balancing the trade-off between leakage power and performance, especially for chips with silicon on thin buried oxide (SOTB), a type of FD-SOI technology. In this work, a method for finding the optimal combination of supply voltage and body bias voltage for the core and memory is proposed and applied to a real micro-controller chip using SOTB CMOS technology. By obtaining several coefficients of the equations for leakage power, switching power and operational frequency from real chip measurements, the optimized voltage setting can be obtained for the target operational frequency. The power lost to optimization error is at most 12.6%, and the method saves up to 73.1% of power compared with the cases where only the body bias voltage is optimized. This method can be applied to the latest FD-SOI technologies.
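The optimization described in this abstract (pick the supply and body bias voltages that minimize leakage plus switching power at a target frequency) can be sketched as a small grid search. Everything in the block below is an assumption for illustration: the model forms, the coefficients (`C_EFF`, `I0`, `K_LEAK`, `VTH0`, `GAMMA`, `K_FREQ`, `ALPHA`), the sign convention (positive `vbb` means forward bias), and the voltage grids. In the paper the coefficients are fitted from chip measurements, and the actual equations may differ.

```python
import math

# Hypothetical model coefficients (assumptions, not measured values).
C_EFF = 1e-10    # effective switched capacitance, farads
I0 = 1e-6        # leakage current at zero body bias, amperes
K_LEAK = 8.0     # exponential leakage sensitivity to body bias, 1/V
VTH0 = 0.4       # threshold voltage at zero body bias, volts
GAMMA = 0.1      # threshold shift per volt of body bias
K_FREQ = 2e8     # frequency scale factor
ALPHA = 1.5      # alpha-power-law exponent

def max_freq(vdd, vbb):
    """Highest frequency (Hz) the assumed model supports at this setting."""
    overdrive = vdd - (VTH0 - GAMMA * vbb)
    if overdrive <= 0:
        return 0.0
    return K_FREQ * overdrive ** ALPHA / vdd

def total_power(vdd, vbb, freq):
    """Switching power plus leakage power (watts) in the assumed model."""
    switching = C_EFF * vdd ** 2 * freq
    leakage = I0 * math.exp(K_LEAK * vbb) * vdd
    return switching + leakage

def best_setting(target_freq, vdd_grid, vbb_grid):
    """Return (power, vdd, vbb) with minimum power that meets target_freq."""
    best = None
    for vdd in vdd_grid:
        for vbb in vbb_grid:
            if max_freq(vdd, vbb) < target_freq:
                continue  # this setting is too slow for the target
            power = total_power(vdd, vbb, target_freq)
            if best is None or power < best[0]:
                best = (power, vdd, vbb)
    return best

VDD_GRID = [round(0.40 + 0.05 * i, 2) for i in range(17)]  # 0.40 V .. 1.20 V
VBB_GRID = [round(-1.0 + 0.1 * i, 1) for i in range(16)]   # -1.0 V .. 0.5 V

result = best_setting(50e6, VDD_GRID, VBB_GRID)
print(result)
```

A grid search is used here only because it is easy to verify; with fitted closed-form equations, an analytical or gradient-based search would be an equally plausible choice.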
Citations: 18
DRVS: Power-efficient reliability management through Dynamic Redundancy and Voltage Scaling under variations
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273518
M. Salehi, Mohammad Khavari Tavana, Semeen Rehman, F. Kriebel, M. Shafique, A. Ejlali, J. Henkel
Many-core processors facilitate coarse-grained reliability by exploiting available cores for redundant multithreading. However, ensuring high reliability with reduced power consumption necessitates joint consideration of variations in the vulnerability, performance and power properties of software as well as the underlying hardware. In this paper, we propose a power-efficient reliability management system for many-core processors. It exploits various basic redundancy techniques (such as dual and triple modular redundancy) operating at different voltage-frequency levels, each offering distinct reliability, performance and power properties. Our system performs Dynamic Redundancy and Voltage Scaling (DRVS) considering process variations in hardware, and diversities in software vulnerability and execution time properties. Experiments show that the DRVS system provides significant reliability improvements while reducing power consumption by up to 60% compared to state-of-the-art techniques.
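The joint choice of redundancy mode and voltage-frequency level can be sketched as a small constrained search. The block below is only an illustration under stated assumptions: the three `LEVELS` and their fault rates are invented, power is approximated as copies × V² × f in arbitrary units, only simplex and TMR are modeled (dual modular redundancy is omitted because its recovery behavior is not described in the abstract), and process variation is ignored. The TMR expression is the standard majority-voting formula with a perfect voter.

```python
import math

# Hypothetical V-f levels: (voltage in V, frequency in GHz, fault rate per second).
LEVELS = [(0.8, 1.0, 1e-3), (1.0, 1.5, 1e-4), (1.2, 2.0, 1e-5)]
COPIES = {"simplex": 1, "TMR": 3}

def copy_reliability(gcycles, freq_ghz, fault_rate):
    """Probability that one copy finishes fault-free (exponential fault model)."""
    exec_time = gcycles / freq_ghz
    return math.exp(-fault_rate * exec_time)

def mode_reliability(mode, r):
    """TMR succeeds if at least two of three copies are correct."""
    if mode == "TMR":
        return 3 * r ** 2 - 2 * r ** 3
    return r

def power(mode, volt, freq_ghz):
    """Dynamic power in arbitrary units, proportional to copies * V^2 * f."""
    return COPIES[mode] * volt ** 2 * freq_ghz

def choose(gcycles, deadline_s, target_reliability):
    """Return the lowest-power (power, mode, volt, freq) meeting both constraints."""
    candidates = []
    for volt, freq, rate in LEVELS:
        if gcycles / freq > deadline_s:
            continue  # too slow to meet the deadline
        r = copy_reliability(gcycles, freq, rate)
        for mode in COPIES:
            if mode_reliability(mode, r) >= target_reliability:
                candidates.append((power(mode, volt, freq), mode, volt, freq))
    return min(candidates) if candidates else None

print(choose(1.0, 2.0, 0.9995))
```

With these invented numbers, a single copy at the middle level is cheaper than three copies at the lowest voltage; different fault rates would shift that balance, which is the trade-off the abstract describes.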
Citations: 46
Power management for mobile games on asymmetric multi-cores
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273521
A. Pathania, Santiago Pagani, M. Shafique, J. Henkel
Gaming on mobile platforms is highly power hungry and rapidly drains the limited-capacity battery. In multi-threaded gaming, each thread has different processing requirements, and even a single slow thread may lead to Quality of Service (QoS) violations. Further, modern mobile platforms are equipped with asymmetric multi-core processors, whose cores exhibit diverse power and performance properties. These asymmetric cores, along with different Dynamic Power Management (DPM) techniques, enable a high degree of power efficiency in mobile gaming. The default Linux power manager (i.e., the “Governor”) for asymmetric multi-cores is power-inefficient for mobile games because, being oblivious to QoS, it over-allocates resources to threads. The state-of-the-art Governor for mobile gaming does not account for multi-threaded gaming workloads, which are mainstream in mobile gaming. In this work, we present a power-performance characterization of multi-threaded mobile games by executing them on a real-world mobile platform with an asymmetric multi-core. This analysis is leveraged to propose a QoS-aware Governor running a lightweight online heuristic that holistically accounts for thread-to-core mapping and DPM. This solution, when integrated into the platform's Operating System (OS), improves power efficiency by 12% on average.
Citations: 24
Reducing display power consumption for real-time video calls on mobile devices
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273528
Mengbai Xiao, Yao Liu, Lei Guo, Songqing Chen
The display subsystem of a mobile device usually consumes 38%-68% [1] of the total battery power in video streaming. Therefore, a few schemes have been designed to reduce the display power consumption. The basic idea is to dim the backlight level while properly compensating the pixel luminance to maintain image fidelity. The luminance compensation and proper backlight level calculation are computation intensive and demand per-frame luminance information. For these reasons, existing schemes only work for video-on-demand, where each frame (and thus the luminance information) is available in advance. In addition, they demand additional computing resources. Otherwise, if the computation is conducted on the mobile device, the power consumed by such computation can easily offset the power savings from dimming the backlight. In this work, we set out to investigate power saving for real-time video calls on mobile devices. Unlike video-on-demand, real-time video calls are highly delay sensitive and the frame luminance information is not known in advance. Moreover, video calls often involve multiple streaming sources from multiple (≥2) participants, making the problem more difficult. Because there are few background changes and the frame rate is usually low in video calls, we design a Greedy Display Power saving scheme, called LCD-GDP, which utilizes the commonly available GPU on mobile devices without demanding additional support. Our design is implemented on WebRTC, a popular real-time web browser based video call standard. Experiments show that our scheme can save up to 33% power consumption in video calls without affecting the video call quality.
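The basic idea named in this abstract (dim the backlight, then brighten the pixels to compensate) can be sketched on a single frame of 8-bit luminance values. This is a generic illustration, not the LCD-GDP algorithm: the function names, the `clip_fraction` parameter and the assumption that perceived luminance is proportional to backlight level times pixel value are all introduced here for illustration, and the frame is assumed to be non-empty.

```python
def choose_backlight(pixels, clip_fraction=0.0, max_level=255):
    """Smallest backlight scale b (0 < b <= 1) such that at most
    clip_fraction of the pixels saturate after compensation."""
    ordered = sorted(pixels)
    # Number of darkest pixels that must remain representable without clipping.
    keep = int(len(ordered) * (1.0 - clip_fraction))
    keep = max(1, min(keep, len(ordered)))
    peak = ordered[keep - 1]
    return max(peak, 1) / max_level

def compensate(pixels, b, max_level=255):
    """Scale pixel values by 1/b so that b * pixel stays close to the original."""
    return [min(max_level, round(p / b)) for p in pixels]

frame = [10, 50, 100, 127]          # a dark frame: no pixel is above 127
b = choose_backlight(frame)         # backlight can drop to about half
boosted = compensate(frame, b)
perceived = [b * c for c in boosted]
print(b, boosted, perceived)
```

Allowing a small `clip_fraction` trades a few saturated highlights for a lower backlight level. The sorting step here is per frame, which hints at why the abstract calls the computation intensive for real-time use.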
Citations: 5
A compact low-power eDRAM-based NoC buffer
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273500
Cheng Li, P. Ampadu
Whereas buffers significantly impact Network-on-Chip (NoC) performance, they also account for up to 75% and nearly 50% of NoC router area and power, respectively. Traditionally, SRAM has been used as an area- and power-efficient implementation of the router buffer. However, motivated by the smaller size and lower-power potential of planar embedded DRAM (eDRAM), we implement the router buffer using a 3T NMOS eDRAM for improved power and area efficiency. We demonstrate that the lifetime of flits stalled in the NoC router buffer is much shorter than the retention time of currently available eDRAM. This observation allows us to make the appropriate trade-off in size and sense-amplifier complexity to meet requirements of power and performance. A low-overhead need-based refresh mechanism is further explored. With a conservative buffer design using 65nm CMOS technology, our method reduces buffer area by up to 52% and power by 43%, while maintaining performance similar to an SRAM-based buffer. In a NoC router with 128-bit channel width, we achieve 26% and 11% reductions of total router area and power, respectively. We conclude that an eDRAM-based buffer is a power- and area-efficient alternative to an SRAM-based buffer for NoC router design.
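The observation that flits usually leave the buffer long before the eDRAM cells decay suggests refreshing only the rare entries that stall. The sketch below is one possible reading of "need-based refresh", not the authors' circuit: the class `EdramFlitBuffer`, the constants `RETENTION_CYCLES` and `REFRESH_MARGIN`, and the per-cycle `tick` scan are all hypothetical.

```python
from collections import deque

RETENTION_CYCLES = 1000  # hypothetical eDRAM retention time, in router cycles
REFRESH_MARGIN = 100     # hypothetical safety margin before data would decay

class EdramFlitBuffer:
    def __init__(self):
        self.slots = deque()  # each entry: [flit, cycle of last write or refresh]
        self.refreshes = 0

    def write(self, flit, now):
        self.slots.append([flit, now])

    def tick(self, now):
        """Refresh only entries that have stalled long enough to be at risk."""
        for slot in self.slots:
            if now - slot[1] >= RETENTION_CYCLES - REFRESH_MARGIN:
                slot[1] = now          # a refresh rewrites the cell
                self.refreshes += 1

    def read(self, now):
        flit, last_written = self.slots.popleft()
        if now - last_written >= RETENTION_CYCLES:
            raise RuntimeError("flit outlived retention time without refresh")
        return flit

# A flit that leaves quickly never triggers a refresh.
short = EdramFlitBuffer()
short.write("head", 0)
for cycle in range(1, 50):
    short.tick(cycle)
short_flit = short.read(50)

# A flit stalled for 2000 cycles is refreshed only when it nears the limit.
stalled = EdramFlitBuffer()
stalled.write("head", 0)
for cycle in range(1, 2000):
    stalled.tick(cycle)
stalled_flit = stalled.read(2000)
print(short.refreshes, stalled.refreshes)
```

A periodic refresh of every entry would spend energy on flits that are about to leave anyway; tracking age per entry avoids that at the cost of a small counter or timestamp per slot.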
Citations: 19
Bank stealing for conflict mitigation in GPGPU Register File
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273490
Naifeng Jing, Shuang Chen, Shunning Jiang, Li Jiang, Chao Li, Xiaoyao Liang
Modern General Purpose Graphics Processing Units (GPGPUs) demand a large Register File (RF), which is typically organized into multiple banks to support the massive parallelism. Although heavy banking benefits RF throughput, its associated area and energy costs with diminishing performance gains greatly limit future RF scaling. In this paper, we propose an improved RF design with a bank stealing technique, which enables a high RF throughput with compact area. By deeply investigating the GPGPU microarchitecture, we identify the deficiency in the state-of-the-art RF designs as the bank conflict problem, while the majority of conflicts can be eliminated by leveraging the fact that the highly-banked RF is often under-utilized. This is especially true in GPGPUs, where multiple ready warps are available at the scheduling stage with their operands to be wisely coordinated. Our lightweight bank stealing technique can opportunistically fill the idle banks for better operand service, and the average GPGPU performance can be improved under a smaller energy budget with significant area saving, which makes it promising for sustainable RF scaling.
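The bank conflict problem and the under-utilization that bank stealing exploits can both be seen in a toy model. The sketch below assumes single-ported banks and a simple `register index mod NUM_BANKS` mapping; both are illustrative assumptions, and the code only counts conflicts and idle banks. It does not implement the stealing mechanism itself, whose details are not given in the abstract.

```python
from collections import Counter

NUM_BANKS = 4  # hypothetical bank count

def bank_of(reg):
    """Assumed mapping: register index modulo the number of banks."""
    return reg % NUM_BANKS

def read_cycles(operands):
    """Cycles to fetch all operands when each bank serves one read per cycle."""
    per_bank = Counter(bank_of(r) for r in operands)
    return max(per_bank.values())

def idle_banks(operands):
    """Banks that receive no request while the instruction's operands are read."""
    return set(range(NUM_BANKS)) - {bank_of(r) for r in operands}

conflicting = [0, 4, 8]   # all three operands map to bank 0
spread = [0, 1, 2]        # one operand per bank

print(read_cycles(conflicting), idle_banks(conflicting))
print(read_cycles(spread), idle_banks(spread))
```

In the conflicting case one bank is serialized for three cycles while the other three sit idle, which is the kind of unused capacity the abstract proposes to fill opportunistically.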
Citations: 11
Reconfigurable three dimensional photovoltaic panel architecture for solar-powered time extension
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273526
Donghwa Shin, N. Chang, Yanzhi Wang, Massoud Pedram
Photovoltaic (PV) power generation systems are usually accompanied by a battery to bridge the gap between generation and load demand. Solar tracking is also used to enhance power stability and increase the amount of energy collected from the Sun. However, batteries and tracking devices significantly increase the system cost, and they are subject to wear and tear, which makes maintenance-free installation challenging. In this work, we optimize the design of a twofold three-dimensional PV panel for solar-powered systems. With the proposed three-dimensional arrangement, we extend the solar-powered time of a target application that is powered only by solar power. Experimental results show that the proposed architecture and control method extend the service time of the target system by up to 23% compared to a non-reconfigurable flat panel with the same PV panel area.
Citations: 2