Radio: Reconciling Disk I/O Interference in a Para-virtualized Cloud

Journal: IEEE Cloud Computing (JCR Q1, Computer Science) | Pub Date: 2022-07-01 | DOI: 10.1109/CLOUD55607.2022.00034
Guangwen Yang, Liana Wane, W. Xue
Pages: 144-156
Citation count: 0

Abstract

As more virtual machines (VMs) are consolidated in the cloud system, interference among VMs sharing underlying resources may occur more frequently than ever. In particular, the disk I/O performance of certain VMs degrades, seriously compromising the related cloud services. Existing interference analysis approaches cannot guarantee desired results due to 1) a lack of effective techniques for characterizing disk I/O interference and 2) the considerable runtime overhead of determining interference and its culprits. To overcome these barriers, we present Radio, an end-to-end analysis tool for disk I/O interference diagnostics in a para-virtualized cloud. Radio quantifies the dynamic changes in I/O strength across virtual CPUs (vCPUs), constructs a performance repository to efficiently identify VMs' abnormal behaviors, and then exploits interference heat maps and non-constant correlation approaches to infer the culprits of interference. Radio has been deployed at the National Supercomputing Center in Wuxi for more than 10 months, and we demonstrate its effectiveness in real-world use cases on a cloud system with more than 300 VMs. Radio can analyze interference issues within 20 seconds while incurring only 0.2% extra CPU overhead on the host machine. As a result, Radio has helped system administrators reduce the daily incidence of interference from more than 65% to less than 10% and improve the overall disk throughput of the cloud system by more than 27.5%.
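
The abstract does not specify how Radio computes its heat maps or its non-constant correlation, so the following is only an illustrative sketch of the general idea of culprit inference, not Radio's method. It assumes each VM exposes a per-interval I/O time series and uses a sliding-window Pearson correlation as a stand-in: a co-located VM whose I/O activity rises exactly while the victim's throughput falls gets a strongly negative score in some window. The names `pearson`, `windowed_correlation`, `rank_suspects`, and the sample data are all hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def windowed_correlation(victim, suspect, window):
    """Correlation per sliding window, so a relationship that only
    holds during part of the trace is not averaged away."""
    return [
        pearson(victim[i:i + window], suspect[i:i + window])
        for i in range(len(victim) - window + 1)
    ]


def rank_suspects(victim, suspects, window=4):
    """Rank suspect VMs by their most negative windowed correlation
    with the victim's throughput (most suspicious first)."""
    scores = {
        name: min(windowed_correlation(victim, series, window))
        for name, series in suspects.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical traces: victim throughput (MB/s) and suspects' I/O load.
victim = [100, 100, 98, 60, 40, 35, 90, 100]
suspects = {
    "vm_a": [10, 12, 11, 10, 12, 11, 10, 12],  # steady background load
    "vm_b": [5, 5, 6, 50, 80, 85, 10, 5],      # burst overlaps the drop
}
ranking = rank_suspects(victim, suspects)
print(ranking[0][0])  # name of the most suspicious VM
```

Taking the minimum over windows is a deliberately simple scoring choice; with short windows it can also flag coincidental noise, so a production tool would need a more robust statistic and a baseline of normal behavior, which is the role the abstract assigns to the performance repository.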
Source journal: IEEE Cloud Computing (Computer Science: Computer Networks and Communications)
CiteScore: 11.20
Self-citation rate: 0.00%
Articles published: 0
About the journal: Cessation. IEEE Cloud Computing is committed to the timely publication of peer-reviewed articles that provide innovative research ideas, application results, and case studies in all areas of cloud computing. Topics relating to novel theory, algorithms, performance analyses, and applications of techniques are covered. More specifically: Cloud software, Cloud security, Trade-offs between privacy and utility of cloud, Cloud in the business environment, Cloud economics, Cloud governance, Migrating to the cloud, Cloud standards, Development tools, Backup and recovery, Interoperability, Applications management, Data analytics, Communications protocols, Mobile cloud, Private clouds, Liability issues for data loss on clouds, Data integration, Big data, Cloud education, Cloud skill sets, Cloud energy consumption, The architecture of cloud computing, Applications in commerce, education, and industry, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), Business Process as a Service (BPaaS)