Omega: flexible, scalable schedulers for large compute clusters

Proceedings of the Eleventh European Conference on Computer Systems, pp. 351-364. Pub Date: 2013-04-15. DOI: 10.1145/2465351.2465386

Malte Schwarzkopf, A. Konwinski, M. Abd-El-Malek, J. Wilkes


Citations: 703



Abstract

Increasing scale and the need for rapid response to changing requirements are hard to meet with current monolithic cluster scheduler architectures. This restricts the rate at which new features can be deployed, decreases efficiency and utilization, and will eventually limit cluster growth. We present a novel approach to address these needs using parallelism, shared state, and lock-free optimistic concurrency control. We compare this approach to existing cluster scheduler designs, evaluate how much interference between schedulers occurs and how much it matters in practice, present some techniques to alleviate it, and finally discuss a use case highlighting the advantages of our approach -- all driven by real-life Google production workloads.

