Optimizing pre-copy live virtual machine migration in cloud computing using machine learning-based prediction model

IF 3.3; CAS Zone 3 (Computer Science); JCR Q2 (Computer Science, Theory & Methods); Journal: Computing; Pub Date: 2024-07-08; DOI: 10.1007/s00607-024-01318-6

Raseena M. Haris, Mahmoud Barhamgi, Armstrong Nhlabatsi, Khaled M. Khan

Citations: 0

Abstract

One of the preconditions for efficient cloud computing services is the continuous availability of services to clients. However, there are various reasons for temporary service unavailability due to routine maintenance, load balancing, cyber-attacks, power management, fault tolerance, emergency incident response, and resource usage. Live Virtual Machine Migration (LVM) is an option to address service unavailability by moving virtual machines between hosts without disrupting running services. Pre-copy memory migration is a common LVM approach used in cloud systems, but it faces challenges due to the high rate of frequently updated memory pages known as dirty pages. Transferring these dirty pages during pre-copy migration prolongs the overall migration time. If there are large numbers of remaining memory pages after a predefined iteration of page transfer, the stop-and-copy phase is initiated, which significantly increases downtime and negatively impacts service availability. To mitigate this issue, we introduce a prediction-based approach that optimizes the migration process by dynamically halting the iteration phase when the predicted downtime falls below a predefined threshold. Our proposed machine learning method was rigorously evaluated through experiments conducted on a dedicated testbed using KVM/QEMU technology, involving different VM sizes and memory-intensive workloads. A comparative analysis against proposed pre-copy methods and default migration approach reveals a remarkable improvement, with an average 64.91% reduction in downtime for different RAM configurations in high-write-intensive workloads, along with an average reduction in total migration time of approximately 85.81%. These findings underscore the practical advantages of our method in reducing service disruptions during live virtual machine migration in cloud systems.
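The stopping rule described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' method: the abstract does not describe the model's features or architecture, so `predict_downtime` below is a hypothetical stand-in that simply divides the remaining dirty data by the link bandwidth. The page size, the constant dirty rate, and all function and parameter names are likewise assumptions made for illustration.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # bytes per memory page (assumed; not stated in the abstract)


@dataclass
class MigrationResult:
    iterations: int      # pre-copy rounds performed
    total_time_s: float  # pre-copy time plus stop-and-copy time
    downtime_s: float    # stop-and-copy time only


def predict_downtime(dirty_pages: float, bandwidth_bytes_per_s: float) -> float:
    """Hypothetical stand-in for the paper's learned predictor.

    Estimates downtime as the time needed to send the remaining dirty pages.
    """
    return dirty_pages * PAGE_SIZE / bandwidth_bytes_per_s


def simulate_precopy(
    total_pages: int,
    dirty_rate_pps: float,
    bandwidth_bytes_per_s: float,
    downtime_threshold_s: float,
    max_iterations: int = 30,
) -> MigrationResult:
    """Simulate pre-copy rounds with a constant dirty rate (pages per second)."""
    to_send = float(total_pages)  # the first round sends all of memory
    elapsed = 0.0
    iterations = 0
    while iterations < max_iterations:
        send_time = to_send * PAGE_SIZE / bandwidth_bytes_per_s
        elapsed += send_time
        iterations += 1
        # Pages dirtied while this round was in flight, capped at the VM size.
        to_send = min(dirty_rate_pps * send_time, float(total_pages))
        # Prediction-based rule: leave the iteration phase as soon as the
        # predicted downtime falls below the threshold.
        if predict_downtime(to_send, bandwidth_bytes_per_s) < downtime_threshold_s:
            break
    # Stop-and-copy: the VM is paused while the remaining pages are sent.
    downtime = to_send * PAGE_SIZE / bandwidth_bytes_per_s
    return MigrationResult(iterations, elapsed + downtime, downtime)


if __name__ == "__main__":
    # 1 GiB VM, 5,000 dirtied pages/s, 125 MB/s link, 0.3 s downtime threshold.
    print(simulate_precopy(262_144, 5_000, 125_000_000, 0.3))
```

In this toy model the iteration phase converges only when pages are dirtied more slowly than the link can send them. Otherwise the loop runs until `max_iterations`, and the stop-and-copy phase has to move a large remainder, which is the long-downtime case the abstract describes.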

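Source journal

Computing (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 8.20

Self-citation rate: 2.70%

Articles published: 107

Review time: 3 months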

About the journal: Computing publishes original papers, short communications and surveys on all fields of computing. Contributions should be written in English and may be theoretical or applied in nature; the essential criteria are computational relevance and systematic foundation of results.