Power-Constrained Performance Scheduling of Data Parallel Tasks
E. Anger, Jeremiah J. Wilke, S. Yalamanchili
2016 4th International Workshop on Energy Efficient Supercomputing (E2SC), 2016-11-13
DOI: 10.1109/E2SC.2016.11 (https://doi.org/10.1109/E2SC.2016.11)
Citations: 1
Abstract
This paper explores the potential benefits of asynchronous task-based execution for achieving high performance under a power cap. Compared with step-by-step (bulk-synchronous) execution, task-graph schedulers can flexibly reorder tasks and assign compute resources to data-parallel (elastic) tasks to minimize execution time. Using the available cores efficiently becomes challenging once a power cap is imposed. This work characterizes the trade-offs between power and performance as a Pareto frontier, identifying the set of configurations that achieve the best performance for a given amount of power. We present a set of scheduling heuristics that leverage this information dynamically during execution so that the processing cores are used efficiently when running under a power cap. This work examines the behavior of three HPC applications on a 57-core Intel Xeon Phi device, demonstrating a significant performance increase over the baseline.
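The Pareto-frontier idea in the abstract can be illustrated with a minimal sketch. This is not the authors' scheduler or heuristics: the `Config` type, the helper names `pareto_frontier` and `fastest_under_cap`, and all power and time values below are hypothetical and chosen only for illustration. The sketch assumes each candidate configuration of an elastic task has a known power draw and execution time, keeps only the configurations that no other configuration beats on both axes, and then picks the fastest one that fits under a given cap.

```python
from typing import NamedTuple, Optional


class Config(NamedTuple):
    """One hypothetical way to run an elastic task (not taken from the paper)."""
    cores: int       # cores assigned to the task
    power_w: float   # illustrative power draw in watts
    time_s: float    # illustrative execution time in seconds


def pareto_frontier(configs: list[Config]) -> list[Config]:
    """Keep configurations that no other configuration beats on both power and time."""
    frontier: list[Config] = []
    best_time = float("inf")
    # Sweep in order of increasing power; ties are ordered by shorter time first.
    for cfg in sorted(configs, key=lambda c: (c.power_w, c.time_s)):
        # A configuration survives only if it is strictly faster than every cheaper one.
        if cfg.time_s < best_time:
            frontier.append(cfg)
            best_time = cfg.time_s
    return frontier


def fastest_under_cap(frontier: list[Config], cap_w: float) -> Optional[Config]:
    """Return the fastest frontier configuration whose power fits under the cap."""
    feasible = [c for c in frontier if c.power_w <= cap_w]
    return min(feasible, key=lambda c: c.time_s) if feasible else None


# Made-up measurements, for illustration only.
measurements = [
    Config(cores=1, power_w=10.0, time_s=100.0),
    Config(cores=2, power_w=18.0, time_s=55.0),
    Config(cores=4, power_w=30.0, time_s=60.0),  # dominated: more power, slower than 2 cores
    Config(cores=8, power_w=50.0, time_s=20.0),
    Config(cores=16, power_w=90.0, time_s=15.0),
]

frontier = pareto_frontier(measurements)
choice = fastest_under_cap(frontier, cap_w=60.0)
```

With these made-up numbers, the 4-core configuration drops off the frontier because the 2-core one is both cheaper and faster, and a 60 W cap selects the 8-core configuration. A dynamic scheduler of the kind the abstract describes would presumably repeat such a lookup as tasks start and finish and the remaining power budget changes; that part is outside this sketch.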