OFP-TM: an online VM failure prediction and tolerance model towards high availability of cloud computing environments.

IF 2.7 | Zone 3 (Computer Science) | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Journal of Supercomputing | Pub Date: 2022-01-01 | Epub Date: 2022-01-06 | DOI: 10.1007/s11227-021-04235-z

Deepika Saxena, Ashutosh Kumar Singh

Journal of Supercomputing, vol. 78, no. 6, pp. 8003-8024. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8731188/pdf/

Citations: 13

Abstract

Cloud computing now underpins nearly every digital service, and its resource usage has grown exponentially as a result. This ever-growing demand for cloud resources undermines service availability, leading to critical challenges such as cloud outages, SLA violations, and excessive power consumption. Previous approaches have addressed this problem by using multiple cloud platforms or by running multiple replicas of a Virtual Machine (VM), both of which incur high operational cost. This paper addresses the problem from a different perspective by proposing a novel Online virtual machine Failure Prediction and Tolerance Model (OFP-TM) with high-availability awareness embedded in physical machines as well as virtual machines. Failure-prone VMs are identified in real time from their predicted future resource usage, using an ensemble-based resource predictor. These VMs are assigned to a failure tolerance unit comprising a resource provision matrix and a Selection Box (S-Box) mechanism, which triggers the migration of failure-prone VMs and handles any outage beforehand while maintaining the desired level of availability for cloud users. The proposed model is evaluated and compared against existing related approaches by simulating a cloud environment and running several experiments on a real-world workload, the Google Cluster dataset. The results show that, compared with operation without OFP-TM, OFP-TM improves availability by up to 33.5% and reduces the number of live VM migrations by up to 83.3%.
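
The abstract describes flagging failure-prone VMs from their predicted future resource usage using an ensemble predictor, but it does not give the predictor's internals. The following is only a minimal sketch of that general idea, not the authors' method: the three naive forecasters (last value, moving average, linear trend), the equal-weight averaging, the function names, and the 0.85 utilization threshold are all assumptions made for illustration.

```python
from statistics import mean


def last_value(history):
    """Naive forecast: next usage equals the most recent observation."""
    return history[-1]


def moving_average(history, window=3):
    """Forecast as the mean of the last `window` observations."""
    return mean(history[-window:])


def linear_trend(history):
    """Least-squares line through the history, extrapolated one step ahead."""
    n = len(history)
    if n < 2:
        return history[-1]
    x_mean = (n - 1) / 2
    y_mean = mean(history)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(history))
    den = sum((i - x_mean) ** 2 for i in range(n))
    slope = num / den
    return y_mean + slope * (n - x_mean)


def ensemble_predict(history):
    """Equal-weight average of the three forecasters (hypothetical choice)."""
    forecasts = [last_value(history), moving_average(history), linear_trend(history)]
    return mean(forecasts)


def failure_prone_vms(usage_by_vm, threshold=0.85):
    """Return VM ids whose predicted utilization (fraction of capacity)
    meets or exceeds the hypothetical threshold."""
    return sorted(
        vm for vm, history in usage_by_vm.items()
        if ensemble_predict(history) >= threshold
    )


if __name__ == "__main__":
    # Utilization histories as fractions of allocated capacity (made-up values).
    usage = {
        "vm-a": [0.5, 0.7, 0.9],   # rising load
        "vm-b": [0.3, 0.3, 0.3],   # steady load
    }
    print(failure_prone_vms(usage))
```

In a setup like the one the abstract outlines, the VMs returned by such a check would be the candidates handed to the failure tolerance unit for possible migration; how the paper's resource provision matrix and S-Box make that decision is not covered by this sketch.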

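Source journal

Journal of Supercomputing (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 6.30

Self-citation rate: 12.10%

Articles published: 734

Review time: 13 months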

About the journal: The Journal of Supercomputing publishes papers on the technology, architecture and systems, algorithms, languages and programs, performance measures and methods, and applications of all aspects of Supercomputing. Tutorial and survey papers are intended for workers and students in the fields associated with and employing advanced computer systems. The journal also publishes letters to the editor, especially in areas relating to policy, succinct statements of paradoxes, intuitively puzzling results, partial results and real needs. Published theoretical and practical papers are advanced, in-depth treatments describing new developments and new ideas. Each includes an introduction summarizing prior, directly pertinent work that is useful for the reader to understand, in order to appreciate the advances being described.