DRS: A deep reinforcement learning enhanced Kubernetes scheduler for microservice‐based system

Software: Practice and Experience. Pub Date: 2023-10-25. DOI: 10.1002/spe.3284

Zhaolong Jian, Xueshuo Xie, Yaozheng Fang, Yibing Jiang, Ye Lu, Ankan Dash, Tao Li, Guiling Wang



Abstract

Kubernetes, the best-known container orchestration framework, is widely used to manage and schedule resources for microservices in cloud‐native distributed applications. However, its native scheduler, Kube‐scheduler, preferentially places microservices on individual nodes whose CPU and memory resources are plentiful and balanced. This per-node view may cause resource fragmentation and decrease resource utilization. In this paper, we propose DRS, a deep reinforcement learning enhanced Kubernetes scheduler. We first formulate the Kubernetes scheduling problem as a Markov decision process with carefully designed state, action, and reward structures, aiming to increase resource utilization and decrease load imbalance. We then design and implement a DRS monitor that collects six parameters related to resource utilization and builds a global picture of all available resources. Finally, DRS automatically learns the scheduling policy by interacting with the Kubernetes cluster, without relying on expert knowledge about the workload or the cluster status. We implement a DRS prototype in a five-node Kubernetes cluster and evaluate its performance. Experimental results show that DRS overcomes the shortcomings of Kube‐scheduler and achieves the expected scheduling target under three workloads. With only 3.27% CPU overhead and 0.648% communication delay, DRS outperforms Kube‐scheduler by 27.29% in resource utilization and reduces load imbalance by 2.90 times on average.
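
The abstract does not give the state, action, or reward definitions, so the following is only an illustrative sketch of how a scheduling step might be cast as a Markov decision process. Everything in it is an assumption made for illustration: the `Node` class, the two-value-per-node state (the paper's monitor collects six parameters, which are not listed here), the reward of mean utilization minus a weighted standard deviation, and the penalty of -1.0 for an infeasible placement. It should not be read as the DRS implementation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Node:
    """Hypothetical node model: capacities and current usage."""
    cpu_capacity: float  # cores
    mem_capacity: float  # GiB
    cpu_used: float = 0.0
    mem_used: float = 0.0

    def utilization(self) -> float:
        # Assumed metric: average of CPU and memory utilization.
        cpu = self.cpu_used / self.cpu_capacity
        mem = self.mem_used / self.mem_capacity
        return 0.5 * (cpu + mem)

    def fits(self, cpu: float, mem: float) -> bool:
        # True if the request can be placed without exceeding capacity.
        return (self.cpu_used + cpu <= self.cpu_capacity
                and self.mem_used + mem <= self.mem_capacity)


def observe(nodes: list[Node]) -> list[float]:
    """State: flat list of per-node CPU and memory utilization (assumed)."""
    state: list[float] = []
    for n in nodes:
        state += [n.cpu_used / n.cpu_capacity, n.mem_used / n.mem_capacity]
    return state


def reward(nodes: list[Node], imbalance_weight: float = 1.0) -> float:
    """Assumed reward: high mean utilization, low spread across nodes."""
    utils = [n.utilization() for n in nodes]
    return mean(utils) - imbalance_weight * pstdev(utils)


def step(nodes: list[Node], action: int, cpu: float, mem: float):
    """Action: index of the node that receives the pod request."""
    node = nodes[action]
    if not node.fits(cpu, mem):
        # Assumed penalty for an infeasible placement; state is unchanged.
        return observe(nodes), -1.0
    node.cpu_used += cpu
    node.mem_used += mem
    return observe(nodes), reward(nodes)
```

In this toy setting, an agent would call `step` once per incoming pod and learn from the returned reward which node choices keep utilization high and evenly spread. A learned policy would replace the fixed scoring rules of a default scheduler, which is the general idea the abstract describes.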

