
2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED): Latest Papers

A compact low-power eDRAM-based NoC buffer
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273500
Cheng Li, P. Ampadu
Whereas buffers significantly impact Network-on-Chip (NoC) performance, they also account for up to 75% and nearly 50% of NoC router area and power, respectively. Traditionally, SRAM has been used as an area- and power-efficient implementation of the router buffer. However, motivated by the smaller size and lower-power potential of planar embedded DRAM (eDRAM), we implement the router buffer using a 3T NMOS eDRAM for improved power and area efficiency. We demonstrate that the lifetime of flits stalled in the NoC router buffer is much shorter than the retention time of currently available eDRAM. This observation allows us to make the appropriate trade-off in size and sense-amplifier complexity to meet power and performance requirements. A low-overhead need-based refresh mechanism is further explored. With a conservative buffer design using 65nm CMOS technology, our method reduces buffer area by up to 52% and power by 43%, while maintaining performance similar to an SRAM-based buffer. In a NoC router with 128-bit channel width, we achieve 26% and 11% reductions in total router area and power, respectively. We conclude that an eDRAM-based buffer is a power- and area-efficient alternative to an SRAM-based buffer for NoC router design.
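The abstract does not spell out how the need-based refresh works, so the following is only a minimal sketch of the general idea: refresh a buffer slot only when a stalled flit has aged close to the retention limit. The `Slot` type, the function name, and all cycle counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    written_at: int     # cycle at which the flit was written or last refreshed
    valid: bool = True  # False once the flit has left the buffer


def slots_needing_refresh(slots, now, retention_cycles, guard_cycles):
    """Return indices of valid slots whose data age is within guard_cycles of expiry."""
    deadline = retention_cycles - guard_cycles
    return [i for i, s in enumerate(slots)
            if s.valid and now - s.written_at >= deadline]


# Hypothetical numbers: retention of 1100 cycles, refresh 200 cycles before expiry.
slots = [Slot(written_at=0), Slot(written_at=900), Slot(written_at=0, valid=False)]
due = slots_needing_refresh(slots, now=1000, retention_cycles=1100, guard_cycles=200)
print(due)
```

Because most flits leave the buffer long before the deadline, a check of this kind would rarely select any slot, which is the intuition behind calling the refresh overhead low.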
Citations: 19
A simulation framework for rapid prototyping and evaluation of thermal mitigation techniques in many-core architectures
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273485
Tanguy Sassolas, C. Sandionigi, Alexandre Guerre, Julien Mottin, P. Vivet, H. Boussetta, N. Peltier
Modern SoCs are characterized by increasing power density and consequently increasing temperature, which directly impacts the performance, reliability, and cost of a device through its packaging. Thermal issues need to be predicted and mitigated as early as possible in the design flow, when the optimization opportunities are the greatest. In this paper, we present an efficient framework for the design of dynamic thermal mitigation schemes based on a high-level SystemC virtual prototype tightly coupled with efficient power and thermal simulation tools. We demonstrate the benefit of our approach through silicon comparison with the SThorm 64-core architecture and provide simulation speed results, making it a sound solution for the design of thermal mitigation early in the flow.
Citations: 2
Dynamic power management for many-core platforms in the dark silicon era: A multi-objective control approach
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273517
A. Rahmani, M. Haghbayan, A. Kanduri, Awet Yemane Weldezion, P. Liljeberg, J. Plosila, A. Jantsch, H. Tenhunen
Power management of NoC-based many-core systems with runtime application mapping becomes more challenging in the dark silicon era. It necessitates a multi-objective control approach that considers an upper limit on total power consumption, the dynamic behaviour of workloads, processing element utilization, per-core power consumption, and load on the network-on-chip. In this paper, we propose a multi-objective dynamic power management method that simultaneously considers all of these parameters. Fine-grained voltage and frequency scaling, including near-threshold operation, and per-core power gating are utilized to optimize performance. In addition, a disturbance rejecter is designed that proactively scales down activity in running applications when a new application commences execution, to prevent sharp power budget violations. Simulations of dynamic workloads and mixed time-critical application profiles show that our method is effective in honoring the power budget while considerably boosting system throughput and reducing power budget violations, compared to state-of-the-art power management policies.
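As a rough illustration of the disturbance-rejection idea described above, the sketch below throttles running cores proportionally before a new application starts, so that the projected total stays within the budget. This is a simple stand-in, not the paper's controller; the function name and all wattages are hypothetical.

```python
def throttle_for_arrival(running_power_w, incoming_power_w, budget_w):
    """Scale running cores' power so running + incoming fits within budget_w.

    Proportional scaling is an assumption made for illustration only.
    """
    headroom = budget_w - incoming_power_w
    total = sum(running_power_w)
    if total == 0 or total <= headroom:
        return list(running_power_w)    # nothing to throttle
    scale = max(headroom, 0.0) / total  # shrink every core by the same factor
    return [p * scale for p in running_power_w]


# Two running cores at 2 W and 3 W, a 1 W application arriving, 5 W budget.
scaled = throttle_for_arrival([2.0, 3.0], incoming_power_w=1.0, budget_w=5.0)
print(scaled, sum(scaled) + 1.0)
```

A real controller would map the reduced per-core power targets onto discrete voltage-frequency levels or power-gating decisions; that mapping is omitted here.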
Citations: 48
Adaptive sprinting: How to get the most out of Phase Change based passive cooling
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273487
Fulya Kaplan, A. Coskun
CMOS scaling trends lead to elevated on-chip temperatures, which substantially limit the performance of today's processors. To improve thermal efficiency, Phase Change Materials (PCMs) have recently been used as passive cooling solutions. PCMs store a large amount of heat at near-constant temperature during phase change, allowing strategies such as computational sprinting. While existing sprinting methods allow short performance boosts, there is significant unexplored potential in improving performance on systems with PCM-enhanced cooling. To this end, this paper proposes a novel runtime management policy driven by observations that are not captured by prior techniques: (i) PCM melts non-uniformly due to spatially heterogeneous on-chip heat distribution; (ii) power consumption during sprinting is highly application dependent, and assuming a fixed sprinting power leads to lower thermal efficiency; (iii) if we monitor the remaining PCM energy at various locations, we can utilize the PCM heat storage capability much more efficiently. The proposed Adaptive Sprinting policy exploits these observations to extend sprinting duration for increased performance gains. Our policy monitors the remaining PCM energy corresponding to each core at runtime and, using this information, decides on the number, the location, and the voltage-frequency (V/f) setting of the sprinting cores. Experimental evaluation including a detailed phase change thermal model demonstrates 29% performance improvement, 22% energy savings, and 43% energy delay product (EDP) reduction on average, compared to prior strategies.
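The selection step of such a policy can be sketched as follows, under the simplifying assumption that a core may sprint for as long as its local PCM can absorb the extra heat. The function, the per-core energy figures, and the single excess-power value are hypothetical; the paper's policy also chooses V/f settings, which this sketch leaves out.

```python
def pick_sprint_cores(pcm_energy_j, excess_power_w, min_duration_s):
    """Return (core, sustainable_seconds) pairs, longest sustainable sprint first.

    pcm_energy_j maps core id -> remaining PCM heat-storage capacity in joules.
    excess_power_w is the extra heat a sprinting core generates (assumed equal for all cores).
    """
    candidates = []
    for core, energy in pcm_energy_j.items():
        duration = energy / excess_power_w
        if duration >= min_duration_s:
            candidates.append((core, duration))
    return sorted(candidates, key=lambda c: c[1], reverse=True)


# Core 1 has nearly exhausted its PCM, so it is excluded from sprinting.
remaining = {0: 4.0, 1: 1.0, 2: 6.0}
chosen = pick_sprint_cores(remaining, excess_power_w=2.0, min_duration_s=1.0)
print(chosen)
```

Ranking by remaining energy reflects observation (i) in the abstract: because the PCM melts unevenly, the best sprint candidates change from one location to another over time.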
Citations: 3
Design and optimization of a reconfigurable power delivery network for large-area, DVS-enabled OLED displays
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273507
Woojoo Lee, Yanzhi Wang, Donghwa Shin, Shahin Nazarian, Massoud Pedram
Dynamic voltage scaling (DVS) has proven effective in minimizing the power consumption of OLED displays while causing only minimal image distortion. This technique has been extended to perform zone-specific DVS by dividing the panel area into zones and applying independent DVS to each zone based on the displayed content. The latter technique has not been applied to large-area OLED displays, in part because of the high overhead of a dedicated DC-DC converter for each zone and the low conversion efficiency when the load current of each converter lies outside the desirable range. To address this issue, this work proposes a reconfigurable power delivery network architecture, comprising a small number of DC-DC converters, a switch network, and an online controller, to realize fine-grained (zone-specific) DVS in large-area OLED display panels. The proposed framework consistently achieves high power conversion efficiency and significant energy savings while preserving image quality. Experimental results demonstrate that up to 36% power savings can be achieved in a 65" 4K ultra-high-definition OLED display by using the proposed framework.
Citations: 3
Reference-circuit analysis for high-bandwidth spin transfer torque random access memory
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273541
Byungkyu Song, T. Na, Seong-ook Jung, Jung Pill Kim, Seung H. Kang
A global reference circuit (RC), in which one RC is shared among many sensing circuits (SCs), is being considered for high-bandwidth STT-RAMs because of its low power consumption and small area. However, using the global RC for high-bandwidth STT-RAMs causes a droop effect and a coupling noise effect, leading to significant performance degradation. Thus, the validity of using the global RC should be assessed. In this paper, the local RC and various global RCs are introduced and compared in terms of area, sensing time, and power consumption. By classifying the merits and demerits of the various RCs, we present the following requirements for a proper RC for high-bandwidth STT-RAMs: 1) small area, 2) no performance degradation, 3) low power consumption, and 4) process-variation-tolerant reference signal generation.
Citations: 4
Hardware-software interaction for run-time power optimization: A case study of embedded Linux on multicore smartphones
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273508
Anup Das, M. J. Walker, Andreas Hansson, B. Al-Hashimi, G. Merrett
Applications running on smartphones interact with the hardware and the system software differently, resulting in widely varying power consumption and hence thermal profiles. Typically, these smartphone platforms expose some hardware power control features to users, controlled through software governors such as cpufreq for dynamic voltage-frequency scaling (DVFS) and cpuquiet for dynamic core selection (DCS). Operating systems on these platforms manage these governors conservatively, independently of the application's performance requirements. To address this, we propose an alternative approach, which uses reinforcement learning to explore the trade-off between the power saving opportunities offered by DVFS and DCS and the application's performance at run-time. The objective is to reduce power consumption, taking into consideration dynamic power, leakage power, and the inter-dependency between temperature and power. The reinforcement learning-based control is validated as a case study on an ARM A15-based NVIDIA Tegra smartphone through its implementation as a run-time manager (RTM). This RTM interfaces with different hardware performance counters and the embedded Linux operating system through (1) the cpuquiet API to select cores at run-time; and (2) the cpufreq API to scale the frequency of active cores. Experiments with mobile and high-performance applications demonstrate that the proposed approach achieves an average 22% (7-40%) power reduction compared to existing techniques.
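The abstract names reinforcement learning without specifying the algorithm, so the sketch below uses tabular Q-learning purely as an assumed example. Actions are hypothetical (active cores, frequency in MHz) pairs, states are hypothetical load labels, and the reward is a placeholder; a real run-time manager would apply the chosen action through the cpuquiet and cpufreq interfaces, which this sketch does not touch.

```python
import random


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update; q maps (state, action) -> estimated value."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def choose_action(q, state, actions, epsilon, rng):
    """Epsilon-greedy selection: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Hypothetical action space: (number of active cores, frequency in MHz).
ACTIONS = [(1, 800), (2, 1200), (4, 1800)]
q_table = {}
value = q_update(q_table, "low_load", (1, 800), reward=1.0,
                 next_state="low_load", actions=ACTIONS)
best = choose_action(q_table, "low_load", ACTIONS, epsilon=0.0, rng=random.Random(0))
print(value, best)
```

In a setting like the one described, the reward would need to combine measured power with a performance term so that the learner does not simply pick the lowest frequency.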
Citations: 28
High-efficiency crossbar switches using capacitively coupled signaling
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273497
Cagla Cakir, R. Ho, J. Lexau, K. Mai
As process technologies have scaled, the increasing number of processor cores and memories on a single die has also driven the need for more complex on-chip interconnection networks. Crossbar switches are primary building blocks in such networks-on-chip, as they can be used as fast single-stage networks or as the core of the router switch in multi-stage networks. While crossbars offer non-blocking, single-hop, all-to-all communication, they tend to scale poorly with the number of nodes due to the latency and energy of the long wires and high-radix multiplexer structures needed. To combat these limitations, we propose a low-swing crossbar design that uses capacitively driven wires and capacitively coupled multiplexers. Capacitively driven wires offer low-swing signaling, higher bandwidths, and low energy consumption, while capacitively coupled multiplexers offer reduced parasitic loading from the inactive inputs. We present a 16×16 64b low-swing crossbar switch designed in a TSMC 40nm CMOS bulk process. Post-layout simulation shows it operating at a maximum frequency of 2.2GHz, achieving a bandwidth of 2.56Tb/s at 0.9V (nominal Vdd) with an area of 0.94mm². Total energy consumption for full, half, and minimum bandwidths is 110pJ, 84pJ, and 64pJ, respectively, thus offering an efficiency of 10.49 Tbps/W, a 3X improvement over previously published results.
Citations: 1
Reducing dynamic energy of set-associative L1 instruction cache by early tag lookup
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273489
Wei Zhang, Hang Zhang, J. Lach
To minimize the access latency of set-associative caches, the data in all ways are read out in parallel with the tag lookup. However, this is energy inefficient, as only the data from the matching way is used and the others are discarded. This paper proposes an early tag lookup (ETL) technique for L1 instruction caches that determines the matching way one cycle earlier than the cache access, so that only the matching data way need be accessed. ETL incurs no performance penalty and insignificant hardware overhead. Evaluation on a 4-way set-associative L1 instruction cache in 45nm technology shows that ETL reduces the read energy by 68% on average.
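The saving described above comes from activating fewer data ways per access. The sketch below only counts data-way activations for one access to a single set; it is an illustrative model with hypothetical tag values, does not model the tag array's own energy or the pipeline timing of the early lookup, and is not intended to reproduce the 68% figure.

```python
def data_ways_read(tags_in_set, lookup_tag, early_tag_lookup):
    """Return how many data ways are activated for one access to a set."""
    if not early_tag_lookup:
        # Conventional design: every data way is read in parallel with the tag compare.
        return len(tags_in_set)
    # With the way known one cycle early, only the matching way is read (none on a miss).
    return 1 if lookup_tag in tags_in_set else 0


# Hypothetical 4-way set holding four tags.
set_tags = [0x1A, 0x2B, 0x3C, 0x4D]
conventional = data_ways_read(set_tags, 0x2B, early_tag_lookup=False)
etl_hit = data_ways_read(set_tags, 0x2B, early_tag_lookup=True)
etl_miss = data_ways_read(set_tags, 0x99, early_tag_lookup=True)
print(conventional, etl_hit, etl_miss)
```

For a 4-way cache this reduces data-array activations on a hit from four to one; the overall read-energy saving is smaller than that ratio suggests because tag lookups and other overheads remain.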
Citations: 2
A micropower energy harvesting circuit with piezoelectric transformer-based ultra-low voltage start-up
Pub Date : 2015-07-22 DOI: 10.1109/ISLPED.2015.7273527
A. Romani, Antonio Camarda, A. Baldazzi, M. Tartagni
This paper introduces the use of piezoelectric transformers (PTs) as key elements for ultra-low voltage start-up circuits for battery-less energy harvesting applications. Firstly, a step-up oscillator topology based on a PT and a JFET is presented. The circuit is able to start from voltages as low as 16 mV, and to boost the output voltage up to 1.32 V in a no load condition. In order to validate the proposed approach, a surrounding power management and conversion circuit is developed. This circuit is able to automatically enable a boost DC/DC converter once the start-up circuit has generated a sufficient voltage. The whole circuit self-starts with an input voltage of 30 mV, and the maximum conversion efficiency referred to the maximum power point (MPP) is higher than 40%, with an intrinsic current consumption as low as 1.3 μA.
Citations: 10