等待的人会有好事:优化云中的工作等待

Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference) Pub Date : 2021-11-01 DOI:10.1145/3472883.3487007

Pradeep Ambati, Noman Bashir, David E. Irwin, P. Shenoy

{"title":"等待的人会有好事:优化云中的工作等待","authors":"Pradeep Ambati, Noman Bashir, David E. Irwin, P. Shenoy","doi":"10.1145/3472883.3487007","DOIUrl":null,"url":null,"abstract":"Cloud-enabled schedulers execute jobs on either fixed resources or those acquired on demand from cloud platforms. Thus, these schedulers must define not only a scheduling policy, which selects which jobs run when fixed resources become available, but also a waiting policy, which selects which jobs wait for fixed resources when they are not available, rather than run on on-demand resources. As with scheduling policies, optimizing waiting policies requires a priori knowledge of job runtime. Unfortunately, prior work has shown that accurately predicting job runtime is challenging. In this paper, we show that optimizing job waiting in the cloud is possible without accurate job runtime predictions. To do so, we i) speculatively execute jobs on on-demand resources for a small time and cost to learn more about job runtime, and ii) develop a ML model to predict wait time from cluster state, which is more accurate and has less overhead than prior approaches that use job runtime predictions. We evaluate our approach on a year-long batch workload consisting of 14 million jobs, and show that it yields a cost and average wait time within 4% and 13%, respectively, of the optimal.","PeriodicalId":91949,"journal":{"name":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","volume":"21 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Good Things Come to Those Who Wait: Optimizing Job Waiting in the Cloud\",\"authors\":\"Pradeep Ambati, Noman Bashir, David E. Irwin, P. Shenoy\",\"doi\":\"10.1145/3472883.3487007\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cloud-enabled schedulers execute jobs on either fixed resources or those acquired on demand from cloud platforms. Thus, these schedulers must define not only a scheduling policy, which selects which jobs run when fixed resources become available, but also a waiting policy, which selects which jobs wait for fixed resources when they are not available, rather than run on on-demand resources. As with scheduling policies, optimizing waiting policies requires a priori knowledge of job runtime. Unfortunately, prior work has shown that accurately predicting job runtime is challenging. In this paper, we show that optimizing job waiting in the cloud is possible without accurate job runtime predictions. To do so, we i) speculatively execute jobs on on-demand resources for a small time and cost to learn more about job runtime, and ii) develop a ML model to predict wait time from cluster state, which is more accurate and has less overhead than prior approaches that use job runtime predictions. We evaluate our approach on a year-long batch workload consisting of 14 million jobs, and show that it yields a cost and average wait time within 4% and 13%, respectively, of the optimal.\",\"PeriodicalId\":91949,\"journal\":{\"name\":\"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)\",\"volume\":\"21 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3472883.3487007\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3472883.3487007","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

支持云的调度器在固定资源或从云平台按需获取的资源上执行作业。因此，这些调度器不仅必须定义调度策略(在固定资源可用时选择哪些作业运行)，还必须定义等待策略(在固定资源不可用时选择哪些作业等待固定资源，而不是在按需资源上运行)。与调度策略一样，优化等待策略需要对作业运行时有先验的了解。不幸的是，先前的工作表明，准确预测作业运行时是具有挑战性的。在本文中，我们证明了在没有精确的作业运行时预测的情况下，优化云中的作业等待是可能的。为此，我们i)推测性地在按需资源上执行作业，花费很少的时间和成本，以了解更多关于作业运行时的信息;ii)开发一个ML模型，从集群状态预测等待时间，这比以前使用作业运行时预测的方法更准确，开销更少。我们在由1400万个作业组成的为期一年的批处理工作负载上评估了我们的方法，并表明它产生的成本和平均等待时间分别在最佳方法的4%和13%之内。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Good Things Come to Those Who Wait: Optimizing Job Waiting in the Cloud

Cloud-enabled schedulers execute jobs on either fixed resources or those acquired on demand from cloud platforms. Thus, these schedulers must define not only a scheduling policy, which selects which jobs run when fixed resources become available, but also a waiting policy, which selects which jobs wait for fixed resources when they are not available, rather than run on on-demand resources. As with scheduling policies, optimizing waiting policies requires a priori knowledge of job runtime. Unfortunately, prior work has shown that accurately predicting job runtime is challenging. In this paper, we show that optimizing job waiting in the cloud is possible without accurate job runtime predictions. To do so, we i) speculatively execute jobs on on-demand resources for a small time and cost to learn more about job runtime, and ii) develop a ML model to predict wait time from cluster state, which is more accurate and has less overhead than prior approaches that use job runtime predictions. We evaluate our approach on a year-long batch workload consisting of 14 million jobs, and show that it yields a cost and average wait time within 4% and 13%, respectively, of the optimal.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助