Tetris: Proactive Container Scheduling for Long-Term Load Balancing in Shared Clusters

IEEE Transactions on Services Computing (IF 5.8; Zone 2, Computer Science; JCR Q1, Computer Science, Information Systems). Pub Date: 2024-08-13. DOI: 10.1109/TSC.2024.3442544
Fei Xu;Xiyue Shen;Shuohao Lin;Li Chen;Zhi Zhou;Fen Xiao;Fangming Liu
IEEE Transactions on Services Computing, vol. 17, no. 5, pp. 2918-2930. https://ieeexplore.ieee.org/document/10634837/

Abstract

Long-running containerized workloads (e.g., machine learning), which typically show time-varying patterns, are increasingly prevalent in shared production clusters. To improve workload performance, current schedulers mainly optimize the short-term benefits of cluster load balancing or the initial placement of containers on servers. However, this inevitably causes many invalid migrations (i.e., containers are migrated back and forth among servers within a short time window), leading to significant service level objective (SLO) violations. This paper introduces Tetris, a model predictive control (MPC)-based container scheduling strategy that proactively migrates long-running workloads for cluster load balancing. Specifically, we first build a discrete-time dynamic model for long-term optimization of container scheduling. To solve this optimization problem, Tetris employs two main components: (1) a container resource predictor, which leverages time-series analysis to accurately predict container resource consumption; and (2) an MPC-based container scheduler, which jointly optimizes cluster load balancing and container migration cost over a sliding time window. We implement and open-source a prototype of Tetris based on K8s. Extensive prototype experiments and trace-driven simulations demonstrate that Tetris improves the cluster load balancing degree by up to 77.8% without incurring any SLO violations, compared to state-of-the-art container scheduling strategies.
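To make the sliding-window idea concrete, the following is a minimal toy sketch, not the authors' algorithm or the Tetris prototype. It assumes a per-container load forecast is already available for the next few time steps, measures imbalance as the variance of per-server load, charges a fixed cost per migration, and considers at most one migration per step. The names `imbalance`, `horizon_cost`, `mpc_step`, and `migration_cost` are hypothetical.

```python
from statistics import pvariance


def imbalance(placement, loads, n_servers):
    """Population variance of per-server load for one time step."""
    totals = [0.0] * n_servers
    for container, server in enumerate(placement):
        totals[server] += loads[container]
    return pvariance(totals)


def horizon_cost(placement, forecast, n_servers):
    """Sum of imbalance over every step in the forecast window."""
    return sum(imbalance(placement, step, n_servers) for step in forecast)


def mpc_step(placement, forecast, n_servers, migration_cost):
    """Return the placement after at most one migration.

    A migration is chosen only if it lowers the window cost
    (imbalance over the forecast plus the migration charge).
    """
    best = list(placement)
    best_cost = horizon_cost(placement, forecast, n_servers)
    for c in range(len(placement)):
        for s in range(n_servers):
            if s == placement[c]:
                continue
            candidate = list(placement)
            candidate[c] = s
            cost = horizon_cost(candidate, forecast, n_servers) + migration_cost
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best


# Three containers on two servers; placement[i] is the server of container i.
forecast = [[4, 4, 2], [4, 4, 2]]  # assumed predicted load per container, two steps
print(mpc_step([0, 0, 1], forecast, n_servers=2, migration_cost=1.0))
print(mpc_step([0, 0, 1], forecast, n_servers=2, migration_cost=100.0))
```

In a receding-horizon loop, `mpc_step` would be called once per time step with a fresh forecast, and only the first decision applied. With a high migration charge the sketch keeps the placement unchanged, which is the kind of trade-off that discourages back-and-forth migrations.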
Source journal
IEEE Transactions on Services Computing
Categories: Computer Science, Information Systems; Computer Science, Software Engineering
CiteScore: 11.50
Self-citation rate: 6.20%
Articles published: 278
Review time: >12 weeks
About the journal: IEEE Transactions on Services Computing encompasses the computing and software aspects of the science and technology of services innovation research and development. It emphasizes algorithmic, mathematical, statistical, and computational methods central to services computing. Topics covered include Service Oriented Architecture, Web Services, Business Process Integration, Solution Performance Management, and Services Operations and Management. The transactions address mathematical foundations, security, privacy, agreement, contract, discovery, negotiation, collaboration, and quality of service for web services. It also covers composite web service creation, business and scientific applications, standards, utility models, business process modeling, integration, and collaboration in services computing.