Tetris: Proactive Container Scheduling for Long-Term Load Balancing in Shared Clusters
Fei Xu, Xiyue Shen, Shuohao Lin, Li Chen, Zhi Zhou, Fen Xiao, Fangming Liu
IEEE Transactions on Services Computing, vol. 17, no. 5, pp. 2918-2930, published 2024-08-13. DOI: 10.1109/TSC.2024.3442544
IEEE Xplore: https://ieeexplore.ieee.org/document/10634837/
Abstract
Long-running containerized workloads (e.g., machine learning), which typically exhibit time-varying patterns, are increasingly prevalent in shared production clusters. To improve workload performance, current schedulers mainly focus on optimizing either the short-term benefits of cluster load balancing or the initial placement of containers on servers. However, such approaches inevitably cause many invalid migrations (i.e., containers migrated back and forth among servers within a short time window), leading to significant service level objective (SLO) violations. This paper introduces Tetris, a model predictive control (MPC)-based container scheduling strategy that proactively migrates long-running workloads to balance cluster load. Specifically, we first build a discrete-time dynamic model for the long-term optimization of container scheduling. To solve this optimization problem, Tetris employs two main components: (1) a container resource predictor, which leverages time-series analysis to accurately predict container resource consumption; and (2) an MPC-based container scheduler that jointly optimizes cluster load balancing and container migration cost over a sliding time window. We implement and open-source a prototype of Tetris based on Kubernetes (K8s). Extensive prototype experiments and trace-driven simulations demonstrate that, compared with state-of-the-art container scheduling strategies, Tetris improves the cluster load balancing degree by up to 77.8% without incurring any SLO violations.
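The abstract does not specify Tetris's predictor, cost function, or solver, so the following is only a minimal Python sketch of the general "predict, then optimize over a window, then apply one step" idea, under loudly stated assumptions. Holt's linear exponential smoothing stands in for the unspecified time-series predictor. The "load balancing degree" is assumed to be the standard deviation of per-server load, summed over the prediction window. Each migration costs a fixed hypothetical weight `migration_cost`, and a greedy local search replaces whatever optimizer the paper uses. As a further simplification, one placement is evaluated against the whole window rather than a per-step decision sequence. All function and parameter names (`holt_forecast`, `mpc_step`, `horizon`, `max_moves`, etc.) are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

Placement = Dict[str, str]  # hypothetical: container name -> server name


def holt_forecast(history: List[float], horizon: int,
                  alpha: float = 0.5, beta: float = 0.3) -> List[float]:
    """Holt's linear (double exponential) smoothing; `history` must be non-empty.
    Assumed stand-in for the paper's unspecified time-series predictor."""
    level = history[0]
    trend = history[1] - history[0] if len(history) > 1 else 0.0
    for x in history[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Clamp at zero: predicted resource usage cannot be negative.
    return [max(0.0, level + (k + 1) * trend) for k in range(horizon)]


def imbalance(plan: Placement, forecasts: Dict[str, List[float]],
              servers: List[str], horizon: int) -> float:
    """Hypothetical load-balancing metric: std-dev of server load per step, summed over the window."""
    total = 0.0
    for t in range(horizon):
        load = {s: 0.0 for s in servers}
        for container, server in plan.items():
            load[server] += forecasts[container][t]
        total += statistics.pstdev(list(load.values()))
    return total


def objective(plan: Placement, current: Placement, forecasts: Dict[str, List[float]],
              servers: List[str], horizon: int, migration_cost: float) -> float:
    """Windowed imbalance plus a fixed (assumed) penalty per migrated container."""
    moves = sum(1 for c in plan if plan[c] != current[c])
    return imbalance(plan, forecasts, servers, horizon) + migration_cost * moves


def mpc_step(current: Placement, histories: Dict[str, List[float]], servers: List[str],
             horizon: int = 3, migration_cost: float = 0.5, max_moves: int = 2) -> Placement:
    """One receding-horizon step: forecast each container, greedily improve the
    placement against the whole window, and return it. The caller executes only
    these migrations, then re-forecasts and re-plans in the next period."""
    forecasts = {c: holt_forecast(h, horizon) for c, h in histories.items()}
    plan = dict(current)
    best = objective(plan, current, forecasts, servers, horizon, migration_cost)
    for _ in range(max_moves):
        best_move = None
        for c in plan:
            for s in servers:
                if s == plan[c]:
                    continue
                trial = dict(plan)
                trial[c] = s
                cost = objective(trial, current, forecasts, servers, horizon, migration_cost)
                if cost < best - 1e-9:
                    best, best_move = cost, (c, s)
        if best_move is None:
            break  # no single move lowers the windowed objective
        plan[best_move[0]] = best_move[1]
    return plan


def migrations(current: Placement, plan: Placement) -> List[Tuple[str, str, str]]:
    """List (container, from_server, to_server) for every container the plan moves."""
    return [(c, current[c], plan[c]) for c in plan if plan[c] != current[c]]


if __name__ == "__main__":
    current = {"c1": "A", "c2": "A", "c3": "B"}
    histories = {"c1": [1, 2, 3, 4], "c2": [3, 3, 3, 3], "c3": [1, 1, 1, 1]}
    plan = mpc_step(current, histories, ["A", "B"])
    print(migrations(current, plan))  # → [('c2', 'A', 'B')]
```

Calling `mpc_step` once per scheduling period with refreshed histories gives the receding-horizon loop. In the toy example, the steadily growing container `c1` is forecast to keep loading server A, so the sketch moves the steady container `c2` instead; raising `migration_cost` to 10.0 suppresses the move entirely. This illustrates, under the assumptions above, the trade-off between load balance and migration churn that the abstract describes.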
About the journal:
IEEE Transactions on Services Computing encompasses the computing and software aspects of the science and technology of services innovation research and development. It places emphasis on algorithmic, mathematical, statistical, and computational methods central to services computing. Topics covered include Service Oriented Architecture, Web Services, Business Process Integration, Solution Performance Management, and Services Operations and Management. The transactions address mathematical foundations, security, privacy, agreement, contract, discovery, negotiation, collaboration, and quality of service for web services. It also covers areas like composite web service creation, business and scientific applications, standards, utility models, business process modeling, integration, collaboration, and more in the realm of Services Computing.