Adaptive Time-based Encoding for Energy-Efficient Large Cache Architectures
Payman Behnam, N. Sedaghati, M. N. Bojnordi
Growing memory footprints and heavy reliance on data locality have made the last-level cache (LLC) a major contributor to overall energy consumption in modern computer systems. As a result, numerous techniques have been proposed to reduce power dissipation in LLCs via low-power interconnects, energy-efficient signaling, and power-aware data encoding. One technique that has proven successful at lowering dynamic power in cache interconnects is time-based data encoding, which represents data with the time elapsed between subsequent pulses on a wire. Regrettably, a time-based data representation adds significant transmission delay to every block transfer, thereby degrading the energy efficiency of memory-intensive applications. This paper presents a novel adaptive mechanism that monitors the characteristics of every application at runtime and selectively applies time-based codes to the LLC interconnect, thereby alleviating the adverse impact of the longer transmission delay of time-based codes while still saving significant energy. Two adaptation approaches are realized for the proposed mechanism, monitoring 1) application phases and 2) memory bursts. Experimental results on a set of 12 memory-intensive parallel applications on a quad-core system indicate that the proposed encoding mechanism improves system performance by an average of 9%, which in turn improves system energy efficiency by 7% on average. Moreover, the proposed hardware controller occupies less than 1% of the area of a 4MB LLC.
{"title":"Adaptive Time-based Encoding for Energy-Efficient Large Cache Architectures","authors":"Payman Behnam, N. Sedaghati, M. N. Bojnordi","doi":"10.1145/3149412.3149417","DOIUrl":"https://doi.org/10.1145/3149412.3149417","url":null,"abstract":"Demanding larger memory footprint and relying heavily on data locality has made last-level cache (LLC) a major contributor to overall energy consumption in modern computer systems. As a result, numerous techniques have been proposed to reduce power dissipation in LLCs via low power interconnects, energy-efficient signaling, and power-aware data encoding. One such technique that has proven successful at lowering dynamic power in cache interconnects is time-based data encoding that represents data with the time elapsed between subsequent pulses on a wire. Regrettably, a time-based data representation induces excessive transmission delay per every block transfer, thereby degrading the energy efficiency of memory intensive applications. This paper presents a novel adaptive mechanism that monitors characteristics of every application at runtime and intelligently uses time-based codes for LLC interconnects, thereby alleviating the diverse impact of longer transmission delay in time-based codes while still saving significant energy. Two adaptation approaches are realized for the proposed mechanism to monitor 1) application phases and 2) memory bursts. Experimental results on a set of 12 memory intensive parallel applications on a quad-core system indicate that the proposed encoding mechanism can improve system performance by an average of 9%, which results in improving the system energy-efficiency by 7% on average. Moreover, the proposed hardware controller consumes less than 1% area of a 4MB LLC.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121658821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance and Power Characteristics and Optimizations of Hybrid MPI/OpenMP LULESH Miniapps under Various Workloads
Xingfu Wu, V. Taylor, Jeanine E. Cook, Tanner Juedeman
Energy-efficient execution of scientific applications requires insight into how HPC system features affect the performance and power of the applications. In this paper, we analyze and model the performance and power characteristics of hybrid MPI/OpenMP LULESH (Livermore Unstructured Lagrange Explicit Shock Hydrodynamics) miniapps under various workloads using MuMMI (Multiple Metrics Modeling Infrastructure). Output from these models is then used to guide code optimizations for performance and power. Our optimization methods result in performance improvements and energy savings of up to approximately 10%. Further, based on the insight gained from our models and measurements under various workloads, applying DCT (Dynamic Concurrency Throttling) to the optimized codes results in energy savings of 43.12% to 58.30% for different problem sizes, compared with the baseline results on 27 nodes with 32 threads per node on Shepard, a 36-node Intel Haswell testbed cluster.
{"title":"Performance and Power Characteristics and Optimizations of Hybrid MPI/OpenMP LULESH Miniapps under Various Workloads","authors":"Xingfu Wu, V. Taylor, Jeanine E. Cook, Tanner Juedeman","doi":"10.1145/3149412.3149416","DOIUrl":"https://doi.org/10.1145/3149412.3149416","url":null,"abstract":"Energy efficient execution of scientific applications requires insight into how HPC system features affect the performance and power of the applications. In this paper, we analyze and model performance and power characteristics of hybrid MPI/OpenMP LULESH (Livermore Unstructured Lagrange Explicit Shock Hydrodynamics) miniapps under various workloads using MuMMI (Multiple Metrics Modeling Infrastructure). Output from these models is then used to guide code optimizations of performance and power. Our optimization methods result in performance improvement and energy savings of up to approximately 10%. Further, based on the insight learned from our models and measurements under various workloads, applying DCT (Dynamic Concurrency Throttling) to the optimized codes results in the energy savings by 43.12% to 58.30% for different problem sizes compared with the baseline results on 27 nodes with 32 threads per node on a 36-node Intel Haswell testbed cluster Shepard.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115976823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An empirical survey of performance and energy efficiency variation on Intel processors
Aniruddha Marathe, Yijia Zhang, Grayson Blanks, Nirmal Kumbhare, G. Abdulla, B. Rountree
Traditional HPC performance and energy characterization approaches assume homogeneity and predictability in the performance of the target processor platform. Consequently, processor performance variation has been considered a secondary issue in the broader problem of performance characterization. In this work, we present an empirical survey of the variation in processor performance and energy efficiency on several generations of HPC-grade Intel processors. Our study shows that, compared to the previous generation of Intel processors, the problem of performance variation has become worse on more recent generations of Intel processors. Specifically, the performance variation across processors on a large-scale production HPC cluster at LLNL has increased to 20%, and the run-to-run variation in the performance of individual processors has increased to 15%. We show that this variation is further magnified under a hardware-enforced power constraint, potentially due to the increased core count, inconsistencies in the chip manufacturing process, and their combined impact on the processors' energy management functionality. Our experimentation with a hardware-enforced processor power constraint shows that the variation in processor performance and energy efficiency has increased by up to 4x on the latest Intel processors.
{"title":"An empirical survey of performance and energy efficiency variation on Intel processors","authors":"Aniruddha Marathe, Yijia Zhang, Grayson Blanks, Nirmal Kumbhare, G. Abdulla, B. Rountree","doi":"10.1145/3149412.3149421","DOIUrl":"https://doi.org/10.1145/3149412.3149421","url":null,"abstract":"Traditional HPC performance and energy characterization approaches assume homogeneity and predictability in the performance of the target processor platform. Consequently, processor performance variation has been considered to be a secondary issue in the broader problem of performance characterization. In this work, we present an empirical survey of the variation in processor performance and energy efficiency on several generations of HPC-grade Intel processors. Our study shows that, compared to the previous generation of Intel processors, the problem of performance variation has become worse on more recent generation of Intel processors. Specifically, the performance variation across processors on a large-scale production HPC cluster at LLNL has increased to 20% and the run-to-run variation in the performance of individual processors has increased to 15%. We show that this variation is further magnified under a hardware-enforced power constraint, potentially due to the increase in number of cores, inconsistencies in the chip manufacturing process and their combined impact on processor's energy management functionality. Our experimentation with a hardware-enforced processor power constraint shows that the variation in processor performance and energy efficiency has increased by up to 4x on the latest Intel processors.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122299515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Energy Efficiency in Memory-constrained Applications Using Core-specific Power Control
Sridutt Bhalachandra, Allan Porterfield, Stephen L. Olivier, J. Prins, R. Fowler
Power is increasingly the limiting factor in High Performance Computing (HPC) at exascale and will continue to influence future advancements in supercomputing. Recent processors equipped with on-board hardware counters allow real-time monitoring of operating conditions such as energy and temperature, in addition to performance measures such as instructions retired and memory accesses. An experimental memory study on modern CPU architectures, Intel Sandy Bridge and Haswell, identifies a metric, TORo_core, that detects bandwidth saturation and increased latency. TORo_core is used to construct a dynamic policy, applied at coarse- and fine-grained levels, that modulates per-core power controls on Haswell machines. The coarse- and fine-grained applications of the dynamic policy achieve best-case energy savings of 32.1% and 19.5%, respectively, with a 2% slowdown in both cases. On average for six MPI applications, the fine-grained dynamic policy speeds execution by 1%, while the coarse-grained application results in a 3% slowdown. Energy savings through frequency reduction not only provide cost advantages, they also reduce resource contention and create additional thermal headroom for non-throttled cores, improving performance.
{"title":"Improving Energy Efficiency in Memory-constrained Applications Using Core-specific Power Control","authors":"Sridutt Bhalachandra, Allan Porterfield, Stephen L. Olivier, J. Prins, R. Fowler","doi":"10.1145/3149412.3149418","DOIUrl":"https://doi.org/10.1145/3149412.3149418","url":null,"abstract":"Power is increasingly the limiting factor in High Performance Computing (HPC) at Exascale and will continue to influence future advancements in supercomputing. Recent processors equipped with on-board hardware counters allow real time monitoring of operating conditions such as energy and temperature, in addition to performance measures such as instructions retired and memory accesses. An experimental memory study presented on modern CPU architectures, Intel Sandybridge and Haswell, identifies a metric, TORo_core, that detects bandwidth saturation and increased latency. TORo-Core is used to construct a dynamic policy applied at coarse and fine-grained levels to modulate per-core power controls on Haswell machines. The coarse and fine-grained application of dynamic policy shows best energy savings of 32.1% and 19.5% with a 2% slowdown in both cases. On average for six MPI applications, the fine-grained dynamic policy speeds execution by 1% while the coarse-grained application results in a 3% slowdown. Energy savings through frequency reduction not only provide cost advantages, they also reduce resource contention and create additional thermal headroom for non-throttled cores improving performance.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"680 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116108565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simulating Power Scheduling at Scale
D. Ellsworth, Tapasya Patki, M. Schulz, B. Rountree, A. Malony
Comparison of power scheduling strategies at scale is challenging due to the limited availability of high performance computing (HPC) systems that expose power control to researchers. In this paper we describe PowSim, a simulator for comparing different power management strategies at large scale for HPC systems. PowSim enables lightweight simulation of dynamically changing, hardware-enforced processor power caps at the scale of an HPC cluster, supporting power scheduling research. PowSim's architecture makes it easy to swap power scheduler, job scheduler, and application models to enable comparison studies. Preliminary results comparing generalized power scheduling strategies are also presented.
{"title":"Simulating Power Scheduling at Scale","authors":"D. Ellsworth, Tapasya Patki, M. Schulz, B. Rountree, A. Malony","doi":"10.1145/3149412.3149414","DOIUrl":"https://doi.org/10.1145/3149412.3149414","url":null,"abstract":"Comparison of power scheduling strategies at scale is challenging due to the limited availability of high performance computing (HPC) systems exposing power control to researchers. In this paper we describe PowSim, a simulator for comparing different power management strategies at large-scale for HPC systems. PowSim enables light-weight simulation of dynamically-changing hardware-enforced processor power caps at the scale of an HPC cluster, supporting power scheduling research. PowSim's architecture supports easily changing power scheduler, job scheduler, and application models to enable comparison studies. Preliminary results comparing generalized power scheduling strategies are also presented.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127006700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PoLiMEr: An Energy Monitoring and Power Limiting Interface for HPC Applications
I. Marincic, V. Vishwanath, H. Hoffmann
Power and energy consumption are now key design concerns in HPC. To develop software that meets power and energy constraints, scientific application developers must have a reliable way to measure these values and relate them to application-specific events. Scientists face two challenges when measuring and controlling power: (1) diversity: power and energy measurement interfaces differ between vendors; and (2) distribution: power measurements of MPI simulations should be unaffected by the mapping of MPI processes to physical hardware nodes. While some prior work defines standardized software interfaces for power management, these efforts do not support distributed environments. The result is that the current state of the art requires scientists interested in power optimization to write tedious, error-prone application- and system-specific code. To make power measurement and management easier for scientists, we propose PoLiMEr, a user-space library that supports fine-grained application-level power monitoring and capping. We evaluate PoLiMEr by deploying it on Argonne National Laboratory's Theta system and using it to measure and cap power, scaling the performance and power of several applications on up to 1024 nodes. We find that PoLiMEr requires only a few additional lines of code, but easily allows users to detect energy anomalies, apply power caps, and evaluate Theta's unique architectural features.
{"title":"PoLiMEr: An Energy Monitoring and Power Limiting Interface for HPC Applications","authors":"I. Marincic, V. Vishwanath, H. Hoffmann","doi":"10.1145/3149412.3149419","DOIUrl":"https://doi.org/10.1145/3149412.3149419","url":null,"abstract":"Power and energy consumption are now key design concerns in HPC. To develop software that meets power and energy constraints, scientific application developers must have a reliable way to measure these values and relate them to application-specific events. Scientists face two challenges when measuring and controlling power: (1) diversity---power and energy measurement interfaces differ between vendors---and (2) distribution---power measurements of MPI simulations should be unaffected by the mapping of MPI processes to physical hardware nodes. While some prior work defines standardized software interfaces for power management, these efforts do not support distributed environments. The result is that the current state-of-the-art requires scientists interested in power optimization to write tedious, error-prone application-and system-specific code. To make power measurement and management easier for scientists, we propose PoLiMEr, a user-space library that supports fine-grained application-level power monitoring and capping. We evaluate PoLiMEr by deploying it on Argonne National Laboratory's Theta system and using it to measure and cap power, scaling the performance and power of several applications on up to 1024 nodes. We find that PoLiMEr requires only a few additional lines of code, but easily allows users to detect energy anomalies, apply power caps, and evaluate Theta's unique architectural features.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131187615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Execution Phase Prediction Based on Phase Precursors and Locality
Saman Khoshbakht, N. Dimopoulos
This paper focuses on different methods developed to detect the upcoming execution phase of a workload with regard to its power demands. By controlling the state of the processor in power-demanding phases, the operating system can maintain a relatively steady power pattern in the workload, leading to higher power headroom in the system. We compare two main approaches to phase prediction. First, we show that by detecting the precursors leading to an upcoming phase, the system can predict the next phase with high accuracy. We then compare this method with an approach that relies on the assumption of phase locality, expecting the current dominant phase to continue in the near future. Our results show that by detecting precursors we can identify 81% of the upcoming phases with lower processor frequency-switching overhead than most of the proposed locality-based methods.
{"title":"Execution Phase Prediction Based on Phase Precursors and Locality","authors":"Saman Khoshbakht, N. Dimopoulos","doi":"10.1145/3149412.3149415","DOIUrl":"https://doi.org/10.1145/3149412.3149415","url":null,"abstract":"This paper focuses on different methods developed to detect the upcoming execution phase of a workload with regards to power demands. By controlling the state of the processor in power demanding phases, the operating system can maintain a relatively steady power pattern in the workload, leading to higher power headroom in the system. We compared two main approaches in phase prediction. Firstly, we show that by detecting the precursors leading to an upcoming phase, the system can speculate the next phase with high accuracy. Additionally, we compared this method with another approach which relies on the assumption of phase locality, expecting the current dominant phase to continue in the near future. Our results show that by detecting the precursors we can detect 81% of the upcoming phases with lower processor frequency switching overhead compared to most of the proposed locality-based methods.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116740145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable performance bounding under multiple constrained renewable resources
R. Medhat, S. Funk, B. Rountree
In the age of exascale computing, it is crucial to provide the best possible performance under power constraints. A major part of this optimization is managing power and bandwidth intelligently in a cluster to maximize performance. There have been significant improvements in the power efficiency of HPC runtimes, yet little work has explored our ability to determine the theoretically optimal performance under a given power and bandwidth bound. In this paper, we present a scalable model to identify the optimal power and bandwidth distribution such that the makespan of a program is minimized. We utilize a network-flow formulation in constructing a linear program that is efficient to solve. We demonstrate the applicability of the model to MPI programs and evaluate the model's performance on synthetic benchmarks.
{"title":"Scalable performance bounding under multiple constrained renewable resources","authors":"R. Medhat, S. Funk, B. Rountree","doi":"10.1145/3149412.3149422","DOIUrl":"https://doi.org/10.1145/3149412.3149422","url":null,"abstract":"In the age of exascale computing, it is crucial to provide the best possible performance under power constraints. A major part of this optimization is managing power and bandwidth intelligently in a cluster to maximize performance. There are significant improvements in the power efficiency of HPC runtimes, yet little work has explored our ability to determine the theoretical optimal performance under a give power and bandwidth bound. In this paper, we present a scalable model to identify the optimal power and bandwidth distribution such that the makespan of a program is minimized. We utilize the network flow formulation in constructing a linear program that is efficient to solve. We demonstrate the applicability of the model to MPI programs and provide synthetic benchmarks on the performance of the model.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130420884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic Application-aware Power Capping
Bo Wang, Dirk Schmidl, C. Terboven, Matthias S. Müller
Future large-scale high-performance computing (HPC) clusters will likely be power capped, since the surrounding infrastructure such as power supply and cooling is constrained. For such a cluster, it may be impossible to supply thermal design power (TDP) to all components, so the current default of guaranteeing TDP to every computing node will become infeasible. Power capping limits power consumption to a value below TDP, with the drawback of limiting performance. We developed an alternative dynamic application-aware power scheduling (DAPS) strategy to enforce a predetermined power limit while improving cluster-wide performance. The power scheduling decision is guided by the cap value, the hardware usage, and the application-specific performance sensitivity to power. Applying DAPS on a test platform comprising 12 computing nodes running three representative applications, we obtained a performance improvement of up to 17% compared to a strategy that distributes power equally and statically across nodes.
{"title":"Dynamic Application-aware Power Capping","authors":"Bo Wang, Dirk Schmidl, C. Terboven, Matthias S. Müller","doi":"10.1145/3149412.3149413","DOIUrl":"https://doi.org/10.1145/3149412.3149413","url":null,"abstract":"A future large-scale high-performance computing (HPC) cluster will likely be power capped since the surrounding infrastructure like power supply and cooling is constrained. For such a cluster, it may be impossible to supply thermal design power (TDP) to all components. The default power supply of current system guarantees TDP to each computing node will become unfeasible. Power capping was introduced to limit power consumption to a value below TDP, with the drawback of resulting performance limitations. We developed an alternative dynamic application-aware power scheduling (DAPS) strategy to enforce a predetermined power limit and at the same time improve the cluster-wide performance. The power scheduling decision is guided by the cap value, the hardware usage, and the application-specific performance sensitivity to power. Applying DAPS on a test platform comprising 12 computing nodes with three representative applications, we obtained a performance improvement up to 17% compared to a strategy that distributes power equally and statically across nodes.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128555795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PANN: Power Allocation via Neural Networks Dynamic Bounded-Power Allocation in High Performance Computing
William E. Whiteside, S. Funk, Aniruddha Marathe, B. Rountree
Exascale-architecture computers will be limited not only by hardware but also by power consumption. In these bounded-power situations, a system can deliver better results by overprovisioning, that is, having more hardware than can be fully powered at once. Overprovisioned systems require power to be an integral part of any scheduling algorithm. This paper introduces a system called PANN that uses neural networks to dynamically allocate power in overprovisioned systems. Traces of applications are used to train a neural-network power controller, which is then used as an online power allocation system. Simulation results were obtained on traces of ParaDiS, and work is continuing on more applications. In simulation, PANN completes jobs up to 24% faster than static allocation. For tightly constrained systems, PANN performs 6% to 11% better than Conductor. A runtime system has been constructed but is not yet performing as expected; the reasons for this are explored.
{"title":"PANN: Power Allocation via Neural Networks Dynamic Bounded-Power Allocation in High Performance Computing","authors":"William E. Whiteside, S. Funk, Aniruddha Marathe, B. Rountree","doi":"10.1145/3149412.3149420","DOIUrl":"https://doi.org/10.1145/3149412.3149420","url":null,"abstract":"Exascale architecture computers will be limited not only by hardware but also by power consumption. In these bounded power situations, a system can deliver better results by overprovisioning -having more hardware than can be fully powered. Overprovisioned systems require power to be an integral part of any scheduling algorithm. This paper introduces a system called PANN that uses neural networks to dynamically allocate power in overprovisioned systems. Traces of applications are used to train a neural network power controller, which is then used as an online power allocation system. Simulation results were obtained on traces of ParaDiS and work is continuing on more applications. We found in simulations PANN completes jobs up to 24% faster than static allocation. For tightly constrained systems PANN performs 6% to 11% better than Conductor. A runtime system has been constructed, but it is not yet performing as expected, reasons for this are explored.","PeriodicalId":102033,"journal":{"name":"Proceedings of the 5th International Workshop on Energy Efficient Supercomputing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121676992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}