
2016 4th International Workshop on Energy Efficient Supercomputing (E2SC): Latest Publications

Energy Aware Scheduling Study on BlueWonder
Pub Date : 2016-11-13 DOI: 10.1109/E2SC.2016.14
V. Elisseev, John Baker, Neil Morgan, L. Brochard, W. T. Hewitt
Power consumption of the world's leading supercomputers is of the order of tens of MegaWatts (MW). Therefore, energy efficiency and power management of High Performance Computing (HPC) systems are among the main goals of the HPC community. This paper presents our study of managing energy consumption of supercomputers with the use of the energy aware workload management software IBM Platform Load Sharing Facility (LSF). We analyze energy consumption and workloads of the IBM NextScale Cluster, BlueWonder, located at the Daresbury Laboratory, STFC, UK. We describe power management algorithms implemented as Energy Aware Scheduling (EAS) policies in the IBM Platform LSF software. We show the effect of the power management policies on supercomputer efficiency and power consumption using experimental as well as simulated data from scientific workloads on the BlueWonder supercomputer. We observed energy saving of up to 12% from EAS policies.
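
The abstract does not spell out the EAS policies themselves. As a purely illustrative sketch (not IBM Platform LSF's actual algorithm), the toy rule below picks, per job, the CPU frequency that minimizes energy while bounding the slowdown; all profile numbers are hypothetical.

```python
# Illustrative only: a toy energy-aware frequency selection rule, not the
# actual EAS policies shipped in IBM Platform LSF. All numbers are hypothetical.

def pick_frequency(profiles, max_slowdown=0.05):
    """profiles: list of (freq_ghz, runtime_s, avg_power_w) measured or
    predicted for one job. Return the frequency that minimizes energy while
    keeping the slowdown relative to the fastest run within max_slowdown."""
    baseline = min(profiles, key=lambda p: p[1])          # fastest configuration
    best = baseline
    for freq, runtime, power in profiles:
        slowdown = runtime / baseline[1] - 1.0
        if slowdown <= max_slowdown and runtime * power < best[1] * best[2]:
            best = (freq, runtime, power)
    return best

# Hypothetical per-job profile: (GHz, seconds, watts)
job = [(2.6, 100.0, 280.0), (2.2, 103.0, 240.0), (1.8, 115.0, 200.0)]
freq, runtime, power = pick_frequency(job)
print(f"run at {freq} GHz: {runtime * power / 1000:.1f} kJ")
```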
Citations: 5
Preliminary Investigation of Mobile System Features Potentially Relevant to HPC
Pub Date : 2016-11-13 DOI: 10.1109/E2SC.2016.13
David D. Pruitt, E. Freudenthal
Energy consumption's increasing importance in scientific computing has driven an interest in developing energy efficient high performance systems. Energy constraints of mobile computing have motivated the design and evolution of low-power computing systems capable of supporting a variety of compute-intensive user interfaces and applications. Others have observed the evolution of mobile devices to also provide high performance [14]. Their work has primarily examined the performance and efficiency of compute-intensive scientific programs executed either on mobile systems or hybrids of mobile CPUs grafted into non-mobile (sometimes HPC) systems [6, 12, 14]. This report describes an investigation of performance and energy consumption of a single scientific code on five high performance and mobile systems with the objective of identifying the performance and energy efficiency implications of a variety of architectural features. The results of this pilot study suggest that ISA is less significant than other specific aspects of system architecture in achieving high performance at high efficiency. The strategy employed in this study may be extended to other scientific applications with a variety of memory access, computation, and communication properties.
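
To make the comparison metric concrete, the snippet below computes energy to solution and energy-delay product from runtime and average power; the systems and figures are hypothetical placeholders, not the paper's measurements.

```python
# Hypothetical measurements for the kind of cross-system comparison described
# above: (system, runtime_s, avg_power_w). Values are illustrative, not the
# paper's data.
runs = [
    ("hpc-node", 120.0, 350.0),
    ("mobile-soc", 900.0, 6.0),
]

for name, runtime, power in runs:
    energy_kj = runtime * power / 1000.0      # energy to solution
    edp = energy_kj * runtime                  # energy-delay product (kJ*s)
    print(f"{name}: {energy_kj:.1f} kJ, EDP {edp:.0f} kJ*s")
```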
Citations: 7
Neural Network-Based Task Scheduling with Preemptive Fan Control
Pub Date : 2016-11-13 DOI: 10.1109/E2SC.2016.6
Bilge Acun, Eun Kyung Lee, Yoonho Park, L. Kalé
As cooling cost is a significant portion of the total operating cost of supercomputers, improving the efficiency of the cooling mechanisms can significantly reduce the cost. Two sources of cooling inefficiency in existing computing systems are discussed in this paper: temperature variations, and reactive fan speed control. To address these problems, we propose a learning-based approach using a neural network model to accurately predict core temperatures, a preemptive fan control mechanism, and a thermal-aware load balancing algorithm that uses the temperature prediction model. We demonstrate that temperature variations among cores can be reduced from 9°C to 2°C, and that peak fan power can be reduced by 61%. These savings are realized with minimal performance degradation.
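
As a rough illustration of the approach (not the authors' model), the sketch below trains a small regressor on synthetic data to predict near-future core temperature and boosts the fan speed preemptively when the prediction crosses a limit; the features, thresholds, and training data are made up for the example.

```python
# A minimal sketch of the general idea (not the authors' model): predict the
# core temperature a few seconds ahead from current sensor readings, and raise
# the fan speed before the limit is actually crossed. Features, thresholds,
# and training data below are hypothetical.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Features: [core utilization, core power (W), current fan speed (kRPM)]
X = rng.uniform([0.0, 20.0, 2.0], [1.0, 120.0, 6.0], size=(500, 3))
# Synthetic "future temperature" used only to make the example runnable.
y = 35 + 30 * X[:, 0] + 0.2 * X[:, 1] - 3.0 * X[:, 2] + rng.normal(0, 0.5, 500)

model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000).fit(X, y)

def fan_setpoint(sample, limit_c=70.0, base_rpm=3000, boost_rpm=5000):
    """Preemptive control: boost the fan if the *predicted* temperature
    would exceed the limit at the current fan speed."""
    predicted = model.predict([sample])[0]
    return boost_rpm if predicted > limit_c else base_rpm

print(fan_setpoint([0.95, 110.0, 3.0]))   # hot core, modest fan speed -> boost
```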
Citations: 4
A Unified Platform for Exploring Power Management Strategies
Pub Date : 2016-11-13 DOI: 10.1109/E2SC.2016.10
D. Ellsworth, Tapasya Patki, M. Schulz, B. Rountree, A. Malony
Power is quickly becoming a first class resource management concern in HPC. Upcoming HPC systems will likely be hardware over-provisioned, which will require enhanced power management subsystems to prevent service interruption. To advance the state of the art in HPC power management research, we are implementing SLURM plugins to explore a range of power-aware scheduling strategies. Our goal is to develop a coherent platform that allows for a direct comparison of various power-aware approaches on research as well as production clusters.
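
The abstract gives no plugin internals; as a stand-in for one of the policies such a platform might compare, here is a minimal power-aware admission rule that starts queued jobs only while an estimated cluster power budget holds. Job power estimates and the budget are hypothetical.

```python
# Not the authors' SLURM plugins: a stand-in illustration of one simple
# power-aware admission policy that such a platform could compare against
# others. Job power estimates and the cluster budget are hypothetical.

CLUSTER_POWER_BUDGET_W = 40_000

def admit_jobs(queued, running_power_w, budget_w=CLUSTER_POWER_BUDGET_W):
    """queued: list of (job_id, estimated_power_w), in priority order.
    Start jobs in order while the estimated total stays under the budget."""
    started = []
    for job_id, est_power in queued:
        if running_power_w + est_power <= budget_w:
            started.append(job_id)
            running_power_w += est_power
    return started, running_power_w

queue = [("job-17", 9_000), ("job-18", 15_000), ("job-19", 4_000)]
started, total = admit_jobs(queue, running_power_w=20_000)
print(started, total)   # ['job-17', 'job-19'] 33000  (job-18 would exceed the cap)
```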
Citations: 10
Economic Viability of Hardware Overprovisioning in Power-Constrained High Performance Computing
Pub Date : 2016-11-13 DOI: 10.1109/E2SC.2016.12
Tapasya Patki, D. Lowenthal, B. Rountree, M. Schulz, B. Supinski
Recent research has established that hardware overprovisioning can significantly improve system power utilization as well as job throughput in power-constrained, high-performance computing environments. These benefits, however, may be associated with an additional infrastructure cost, making hardware overprovisioned systems less viable economically. It is thus important to conduct a detailed cost-benefit analysis before investing in such systems at a large scale. In this paper, we develop a model to conduct this analysis and show that for a given, fixed infrastructure cost budget and a system power budget, it is possible for hardware overprovisioned systems to lead to a net performance benefit when compared to traditional, worst-case provisioned HPC systems.
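
A back-of-envelope sketch of the trade-off, under stated hypothetical assumptions (it is not the paper's model): with fixed cost and power budgets, capping per-node power lets more nodes be deployed, and aggregate throughput can rise even if each capped node runs somewhat slower.

```python
# A back-of-envelope sketch of the trade-off discussed above, not the paper's
# model. All prices, power figures, and the capped-throughput assumption are
# hypothetical.

cost_budget_usd     = 5_000_000
power_budget_w      = 300_000
node_cost_usd       = 3_000
node_peak_power_w   = 400     # worst-case (peak) provisioning
node_capped_power_w = 250     # per-node cap in the overprovisioned system
capped_throughput   = 0.85    # assumed fraction of peak throughput at the cap

# Worst-case provisioning: every purchased node must fit under the power
# budget at peak draw.
nodes_worst = min(cost_budget_usd // node_cost_usd,
                  power_budget_w // node_peak_power_w)

# Overprovisioning: buy more nodes than the power budget supports at peak,
# and run each one under a cap.
nodes_over = min(cost_budget_usd // node_cost_usd,
                 power_budget_w // node_capped_power_w)

throughput_worst = nodes_worst * 1.0
throughput_over  = nodes_over * capped_throughput
print(nodes_worst, nodes_over)              # 750 vs 1200 nodes
print(throughput_over / throughput_worst)   # ~1.36x aggregate throughput
```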
Citations: 8
Software Controlled Clock Modulation for Energy Efficiency Optimization on Intel Processors
Pub Date : 2016-11-13 DOI: 10.1109/E2SC.2016.15
R. Schöne, T. Ilsche, Mario Bielert, Daniel Molka, D. Hackenberg
Current Intel processors implement a variety of power saving features like frequency scaling and idle states. These mechanisms limit the power draw and thereby decrease the thermal dissipation of the processors. However, they also have an impact on the achievable performance. The various mechanisms significantly differ regarding the amount of power savings, the latency of mode changes, and the associated overhead. In this paper, we describe and closely examine the so-called software controlled clock modulation mechanism for different processor generations. We present results that imply that the available documentation is not always correct and describe when this feature can be used to improve energy efficiency. We additionally compare it against the more popular feature of dynamic voltage and frequency scaling and develop a model to decide which feature should be used to optimize inter-process synchronizations on Intel Haswell-EP processors.
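
For readers who want to experiment, the sketch below drives clock modulation through the Linux msr driver; it assumes the IA32_CLOCK_MODULATION layout documented in the Intel SDM (MSR 0x19A, enable bit 4, duty-cycle field in bits 3:1) and is not the measurement harness used in the paper.

```python
# A minimal sketch of driving software controlled clock modulation through the
# Linux msr driver (requires root and `modprobe msr`). It assumes the
# IA32_CLOCK_MODULATION layout from the Intel SDM: MSR 0x19A, enable bit 4,
# duty-cycle field in bits 3:1 (bit 0 is the extended 6.25% step where
# supported). Illustrative only, not the paper's measurement harness.
import os
import struct

IA32_CLOCK_MODULATION = 0x19A

def write_msr(cpu, msr, value):
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), msr)   # offset selects the MSR
    finally:
        os.close(fd)

def set_clock_modulation(cpu, duty_level):
    """duty_level 1..7 selects duty_level * 12.5% of the nominal clock;
    duty_level 0 disables on-demand clock modulation."""
    if duty_level == 0:
        write_msr(cpu, IA32_CLOCK_MODULATION, 0)          # clear enable bit
    else:
        value = (1 << 4) | ((duty_level & 0x7) << 1)      # enable + duty field
        write_msr(cpu, IA32_CLOCK_MODULATION, value)

# Example: throttle CPU 0 to a 50% duty cycle, then restore it.
set_clock_modulation(0, 4)
set_clock_modulation(0, 0)
```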
Citations: 13
Power-Constrained Performance Scheduling of Data Parallel Tasks
Pub Date : 2016-11-13 DOI: 10.1109/E2SC.2016.11
E. Anger, Jeremiah J. Wilke, S. Yalamanchili
This paper explores the potential benefits of asynchronous task-based execution for achieving high performance under a power cap. Task-graph schedulers can flexibly reorder tasks and assign compute resources to data-parallel (elastic) tasks to minimize execution time, compared to executing step-by-step (bulk-synchronously). The efficient utilization of the available cores becomes a challenging task when a power cap is imposed. This work characterizes the trade-offs between power and performance as a Pareto frontier, identifying the set of configurations that achieve the best performance for a given amount of power. We present a set of scheduling heuristics that leverage this information dynamically during execution to ensure that the processing cores are used efficiently when running under a power cap. This work examines the behavior of three HPC applications on a 57-core Intel Xeon Phi device, demonstrating a significant performance increase over the baseline.
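
As an illustration of the selection step (not the authors' scheduler), the sketch below builds a power/performance Pareto frontier from per-configuration measurements and picks the fastest configuration that fits under a power cap; the measurements are hypothetical.

```python
# Illustrative only: build a power/performance Pareto frontier from measured
# configurations (e.g., thread counts for an elastic task) and pick the fastest
# point that fits under a power cap. The measurements are hypothetical.

# (threads, runtime_s, power_w)
configs = [(8, 40.0, 60.0), (16, 24.0, 95.0), (32, 15.0, 160.0),
           (57, 13.0, 230.0), (57, 14.0, 250.0)]

def pareto_frontier(points):
    """Keep points not dominated by any other (lower runtime is better,
    lower power is better)."""
    frontier = []
    for p in points:
        dominated = any(q[1] <= p[1] and q[2] <= p[2] and q != p for q in points)
        if not dominated:
            frontier.append(p)
    return sorted(frontier, key=lambda p: p[2])

def fastest_under_cap(points, cap_w):
    feasible = [p for p in pareto_frontier(points) if p[2] <= cap_w]
    return min(feasible, key=lambda p: p[1]) if feasible else None

print(fastest_under_cap(configs, cap_w=170.0))   # (32, 15.0, 160.0)
```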
Citations: 1
Characterizing Power and Performance of GPU Memory Access
Pub Date : 2016-11-13 DOI: 10.1109/E2SC.2016.8
Tyler N. Allen, Rong Ge
Power is a major limiting factor for the future of HPC and the realization of exascale computing under a power budget. GPUs have now become a mainstream parallel computation device in HPC, and optimizing power usage on GPUs is critical to achieving future goals. GPU memory is seldom studied, especially for power usage. Nevertheless, memory accesses draw significant power and are critical to understanding and optimizing GPU power usage. In this work we investigate the power and performance characteristics of various GPU memory accesses. We take an empirical approach and experimentally examine and evaluate how GPU power and performance vary with data access patterns and software parameters including GPU thread block size. In addition, we take into account the advanced power saving technology dynamic voltage and frequency scaling (DVFS) on GPU processing units and global memory. We analyze power and performance and provide some suggestions for the optimal parameters for applications that heavily use specific memory operations.
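
A minimal example of the kind of measurement loop such a study relies on: sampling board power via NVML while a memory-heavy workload runs in another process. It assumes the pynvml bindings and an NVIDIA GPU, and is not the instrumentation described in the paper.

```python
# Sample GPU board power via NVML at a fixed interval while a workload runs
# elsewhere. Assumes the `pynvml` bindings and an NVIDIA GPU; illustrative,
# not the paper's instrumentation.
import time
import pynvml

def sample_power(duration_s=10.0, interval_s=0.1, device_index=0):
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        samples = []
        end = time.time() + duration_s
        while time.time() < end:
            samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)  # mW -> W
            time.sleep(interval_s)
        return samples
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    watts = sample_power(duration_s=5.0)
    print(f"avg {sum(watts) / len(watts):.1f} W over {len(watts)} samples")
```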
Citations: 13
Quantifying Energy Use in Dense Shared Memory HPC Node
Pub Date : 2016-11-13 DOI: 10.1109/E2SC.2016.7
Milos Puzovic, Srilatha Manne, Shay GalOn, M. Ono
In this paper we introduce a novel, dense, system-on-chip many-core Lenovo NeXtScale System® server based on the Cavium THUNDERX® ARMv8 processor that was designed for performance, energy efficiency and programmability. The THUNDERX processor was designed to scale up to 96 cores in a cache coherent, shared memory architecture. Furthermore, this hardware system has a power interface board (PIB) that measures power draw across the server board in the NeXtScale™ chassis with high accuracy. We use data obtained from the PIB to measure the energy use of the PARSEC and Splash-2 benchmarks and demonstrate how to use the available hardware counters of the THUNDERX processor to quantify the amount of energy that is used by different aspects of shared memory programming, such as cache coherent communication. We show that the energy required to keep caches coherent is negligible and demonstrate that the shared memory programming paradigm is a viable candidate for future energy-aware HPC designs.
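
A sketch of the general counter-based attribution idea (not the paper's calibrated model): measured energy from the PIB is split across hardware events using per-event coefficients fitted offline. The event names and coefficients below are hypothetical placeholders.

```python
# A sketch of the general approach described above: combine measured energy
# with hardware event counts to attribute energy to activities such as
# coherence traffic. The event names and per-event coefficients below are
# hypothetical placeholders, not ThunderX-calibrated values.

# Counts collected over one benchmark run (e.g., via `perf stat`).
event_counts = {
    "instructions":     4.2e11,
    "l2_cache_refills": 1.1e9,
    "remote_snoops":    3.0e8,   # stand-in for coherence traffic
}

# Energy cost per event in nanojoules, fitted offline against PIB readings.
nj_per_event = {
    "instructions":     0.15,
    "l2_cache_refills": 5.0,
    "remote_snoops":    8.0,
}

measured_energy_j = 95.0   # from the power interface board over the same run

attributed = {e: event_counts[e] * nj_per_event[e] * 1e-9 for e in event_counts}
for event, joules in attributed.items():
    print(f"{event:18s} {joules:7.1f} J  ({joules / measured_energy_j:5.1%})")
print(f"unattributed       {measured_energy_j - sum(attributed.values()):7.1f} J")
```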
Citations: 11
Runtime Power Limiting of Parallel Applications on Intel Xeon Phi Processors
Pub Date : 2016-11-13 DOI: 10.1109/E2SC.2016.9
Gary Lawson, Vaibhav Sundriyal, M. Sosonkina, Yuzhong Shen
Energy-efficient computing is crucial to achieving exascale performance. Power capping and dynamic voltage/frequency scaling may be used to achieve energy savings. The Intel Xeon Phi implements a power capping strategy in which power thresholds are employed to dynamically set the voltage/frequency at runtime. By default, these power limits are much higher than the majority of applications would reach. Hence, this work aims to set the power limits according to the workload characteristics and application performance. Certain models, originally developed for CPU performance and power, have been adapted here to determine power-limit thresholds on the Xeon Phi. Next, a procedure to select these thresholds dynamically is proposed, and its limitations are outlined. When this runtime procedure along with static power-threshold assignment was compared with the default execution, energy savings ranging from 5% to 49% were observed, mostly for memory-intensive applications.
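
As a generic stand-in for runtime power limiting (the paper targets the Xeon Phi's own thresholds, which the abstract does not detail), the sketch below sets a package power cap through the Linux powercap/RAPL sysfs interface, assuming the standard intel-rapl layout and root access.

```python
# A generic stand-in for runtime power limiting (not the exact Xeon Phi
# interface used in the paper): set a package power cap through the Linux
# powercap/RAPL sysfs tree. Requires root and assumes the standard
# /sys/class/powercap/intel-rapl:0 layout is present.
RAPL_DOMAIN = "/sys/class/powercap/intel-rapl:0"

def read_int(path):
    with open(path) as f:
        return int(f.read().strip())

def set_package_power_limit(watts, time_window_s=1.0, domain=RAPL_DOMAIN):
    with open(f"{domain}/constraint_0_power_limit_uw", "w") as f:
        f.write(str(int(watts * 1e6)))
    with open(f"{domain}/constraint_0_time_window_us", "w") as f:
        f.write(str(int(time_window_s * 1e6)))

# Example: cap the package at 150 W and report the cap actually in effect.
set_package_power_limit(150.0)
print(read_int(f"{RAPL_DOMAIN}/constraint_0_power_limit_uw") / 1e6, "W")
```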
Citations: 15