DeepScaling: Autoscaling Microservices With Stable CPU Utilization for Large Scale Production Cloud Systems

IF 3.6 3区 计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE IEEE/ACM Transactions on Networking Pub Date : 2024-03-31 DOI:10.1109/TNET.2024.3400953
Ziliang Wang;Shiyi Zhu;Jianguo Li;Wei Jiang;K. K. Ramakrishnan;Meng Yan;Xiaohong Zhang;Alex X. Liu
{"title":"DeepScaling: Autoscaling Microservices With Stable CPU Utilization for Large Scale Production Cloud Systems","authors":"Ziliang Wang;Shiyi Zhu;Jianguo Li;Wei Jiang;K. K. Ramakrishnan;Meng Yan;Xiaohong Zhang;Alex X. Liu","doi":"10.1109/TNET.2024.3400953","DOIUrl":null,"url":null,"abstract":"Cloud service providers often provision excessive resources to meet the desired Service Level Objectives (SLOs), by setting lower CPU utilization targets. This can result in a waste of resources and a noticeable increase in power consumption in large-scale cloud deployments. To address this issue, this paper presents DeepScaling, an innovative solution for minimizing resource cost while ensuring SLO requirements are met in a dynamic, large-scale production microservice-based system. We propose DeepScaling, which introduces three innovative components to adaptively refine the target CPU utilization of servers in the data center, and we maintain it at a stable value to meet SLO constraints while using minimum amount of system resources. First, DeepScaling forecasts workloads for each service using a Spatio-temporal Graph Neural Network. Secondly, it estimates CPU utilization with a Deep Neural Network, considering factors such as periodic tasks and traffic. Finally, it uses a modified Deep Q-Network (DQN) to generate an autoscaling policy that controls service resources to maximize service stability while meeting SLOs. Evaluation of DeepScaling in Ant Group’s large-scale cloud environment shows that it outperforms state-of-the-art autoscaling approaches in terms of maintaining stable performance and resource savings. The deployment of DeepScaling in the real-world environment of 1900+ microservices saves the provisioning of over 100,000 CPU cores per day, on average.","PeriodicalId":13443,"journal":{"name":"IEEE/ACM Transactions on Networking","volume":"32 5","pages":"3961-3976"},"PeriodicalIF":3.6000,"publicationDate":"2024-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Networking","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10542703/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

Cloud service providers often provision excessive resources to meet the desired Service Level Objectives (SLOs), by setting lower CPU utilization targets. This can result in a waste of resources and a noticeable increase in power consumption in large-scale cloud deployments. To address this issue, this paper presents DeepScaling, an innovative solution for minimizing resource cost while ensuring SLO requirements are met in a dynamic, large-scale production microservice-based system. We propose DeepScaling, which introduces three innovative components to adaptively refine the target CPU utilization of servers in the data center, and we maintain it at a stable value to meet SLO constraints while using minimum amount of system resources. First, DeepScaling forecasts workloads for each service using a Spatio-temporal Graph Neural Network. Secondly, it estimates CPU utilization with a Deep Neural Network, considering factors such as periodic tasks and traffic. Finally, it uses a modified Deep Q-Network (DQN) to generate an autoscaling policy that controls service resources to maximize service stability while meeting SLOs. Evaluation of DeepScaling in Ant Group’s large-scale cloud environment shows that it outperforms state-of-the-art autoscaling approaches in terms of maintaining stable performance and resource savings. The deployment of DeepScaling in the real-world environment of 1900+ microservices saves the provisioning of over 100,000 CPU cores per day, on average.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
DeepScaling:为大规模生产云系统自动扩展具有稳定 CPU 利用率的微服务
云服务提供商通常会通过设定较低的 CPU 利用率目标来提供过多的资源,以满足所需的服务级别目标 (SLO)。这会造成资源浪费,并明显增加大规模云部署的功耗。为解决这一问题,本文提出了 DeepScaling,这是一种创新的解决方案,可最大限度地降低资源成本,同时确保在基于微服务的动态、大规模生产系统中满足 SLO 要求。我们提出的 DeepScaling 引入了三个创新组件,用于自适应地改进数据中心服务器的目标 CPU 利用率,并将其保持在一个稳定值,以满足 SLO 约束,同时使用最少的系统资源。首先,DeepScaling 利用时空图神经网络预测每个服务的工作负载。其次,它使用深度神经网络估算 CPU 利用率,同时考虑周期性任务和流量等因素。最后,它使用修改后的深度 Q 网络(DQN)生成自动缩放策略,控制服务资源,在满足 SLO 的同时最大限度地提高服务稳定性。在蚂蚁金服集团的大规模云环境中对DeepScaling的评估表明,它在保持性能稳定和节省资源方面优于最先进的自动缩放方法。在包含 1900 多个微服务的实际环境中部署 DeepScaling,平均每天可节省超过 10 万个 CPU 内核的调配。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
IEEE/ACM Transactions on Networking
IEEE/ACM Transactions on Networking 工程技术-电信学
CiteScore
8.20
自引率
5.40%
发文量
246
审稿时长
4-8 weeks
期刊介绍: The IEEE/ACM Transactions on Networking’s high-level objective is to publish high-quality, original research results derived from theoretical or experimental exploration of the area of communication/computer networking, covering all sorts of information transport networks over all sorts of physical layer technologies, both wireline (all kinds of guided media: e.g., copper, optical) and wireless (e.g., radio-frequency, acoustic (e.g., underwater), infra-red), or hybrids of these. The journal welcomes applied contributions reporting on novel experiences and experiments with actual systems.
期刊最新文献
Table of Contents IEEE/ACM Transactions on Networking Information for Authors IEEE/ACM Transactions on Networking Society Information IEEE/ACM Transactions on Networking Publication Information FPCA: Parasitic Coding Authentication for UAVs by FM Signals
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1