共同优化作业分配和资源划分，提高云数据中心系统吞吐量

IF 1.5 3区计算机科学 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE ACM Transactions on Architecture and Code Optimization Pub Date : 2023-07-19 DOI:https://dl.acm.org/doi/10.1145/3593055

Ruobing Chen, Haosen Shi, Jinping Wu, Yusen Li, Xiaoguang Liu, Gang Wang

{"title":"共同优化作业分配和资源划分，提高云数据中心系统吞吐量","authors":"Ruobing Chen, Haosen Shi, Jinping Wu, Yusen Li, Xiaoguang Liu, Gang Wang","doi":"https://dl.acm.org/doi/10.1145/3593055","DOIUrl":null,"url":null,"abstract":"Colocating multiple jobs on the same server has been widely applied for improving resource utilization in cloud datacenters. However, the colocated jobs would contend for the shared resources, which could lead to significant performance degradation. An efficient approach for eliminating performance interference is to partition the shared resources among the colocated jobs. However, this makes the resource management in datacenters very challenging. In this paper, we propose JointOPT, the first resource management framework that optimizes job assignment and resource partitioning jointly for improving the throughput of cloud datacenters. JointOPT uses a local search based algorithm to find the near optimal job assignment configuration, and uses a deep reinforcement learning (DRL) based approach to dynamically partition the shared resources among the colocated jobs. In order to reduce the interaction overhead with real systems, it leverages deep learning to estimate job performance without running them on real servers. We conduct extensive experiments to evaluate JointOPT and the results show that JointOPT significantly outperforms the state-of-the-art baselines, with an advantage from 13.3% to 47.7%.","PeriodicalId":50920,"journal":{"name":"ACM Transactions on Architecture and Code Optimization","volume":"254-255 1","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2023-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Jointly Optimizing Job Assignment and Resource Partitioning for Improving System Throughput in Cloud Datacenters\",\"authors\":\"Ruobing Chen, Haosen Shi, Jinping Wu, Yusen Li, Xiaoguang Liu, Gang Wang\",\"doi\":\"https://dl.acm.org/doi/10.1145/3593055\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Colocating multiple jobs on the same server has been widely applied for improving resource utilization in cloud datacenters. However, the colocated jobs would contend for the shared resources, which could lead to significant performance degradation. An efficient approach for eliminating performance interference is to partition the shared resources among the colocated jobs. However, this makes the resource management in datacenters very challenging. In this paper, we propose JointOPT, the first resource management framework that optimizes job assignment and resource partitioning jointly for improving the throughput of cloud datacenters. JointOPT uses a local search based algorithm to find the near optimal job assignment configuration, and uses a deep reinforcement learning (DRL) based approach to dynamically partition the shared resources among the colocated jobs. In order to reduce the interaction overhead with real systems, it leverages deep learning to estimate job performance without running them on real servers. We conduct extensive experiments to evaluate JointOPT and the results show that JointOPT significantly outperforms the state-of-the-art baselines, with an advantage from 13.3% to 47.7%.\",\"PeriodicalId\":50920,\"journal\":{\"name\":\"ACM Transactions on Architecture and Code Optimization\",\"volume\":\"254-255 1\",\"pages\":\"\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2023-07-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Architecture and Code Optimization\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/https://dl.acm.org/doi/10.1145/3593055\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Architecture and Code Optimization","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3593055","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

为了提高云数据中心的资源利用率，在同一台服务器上配置多个作业已经得到了广泛的应用。然而，共存的作业将争夺共享资源，这可能导致显著的性能下降。消除性能干扰的一种有效方法是将共享资源在多个并发作业之间进行分区。然而，这使得数据中心的资源管理非常具有挑战性。在本文中，我们提出了JointOPT，这是第一个共同优化作业分配和资源划分的资源管理框架，以提高云数据中心的吞吐量。JointOPT使用基于局部搜索的算法来找到接近最优的作业分配配置，并使用基于深度强化学习(DRL)的方法在并发作业之间动态划分共享资源。为了减少与真实系统的交互开销，它利用深度学习来评估作业性能，而无需在真实服务器上运行它们。我们进行了大量的实验来评估JointOPT，结果表明JointOPT显著优于最先进的基线，优势从13.3%到47.7%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Jointly Optimizing Job Assignment and Resource Partitioning for Improving System Throughput in Cloud Datacenters

Colocating multiple jobs on the same server has been widely applied for improving resource utilization in cloud datacenters. However, the colocated jobs would contend for the shared resources, which could lead to significant performance degradation. An efficient approach for eliminating performance interference is to partition the shared resources among the colocated jobs. However, this makes the resource management in datacenters very challenging. In this paper, we propose JointOPT, the first resource management framework that optimizes job assignment and resource partitioning jointly for improving the throughput of cloud datacenters. JointOPT uses a local search based algorithm to find the near optimal job assignment configuration, and uses a deep reinforcement learning (DRL) based approach to dynamically partition the shared resources among the colocated jobs. In order to reduce the interaction overhead with real systems, it leverages deep learning to estimate job performance without running them on real servers. We conduct extensive experiments to evaluate JointOPT and the results show that JointOPT significantly outperforms the state-of-the-art baselines, with an advantage from 13.3% to 47.7%.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Architecture and Code Optimization 工程技术-计算机：理论方法

CiteScore

3.60

自引率

6.20%

发文量

审稿时长

6-12 weeks

期刊介绍： ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.