Developing Checkpointing and Recovery Procedures with the Storage Services of Amazon Web Services
Luan Teylo, R. Brum, L. Arantes, Pierre Sens, Lúcia M. A. Drummond
Workshop Proceedings of the 49th International Conference on Parallel Processing, published 2020-08-17
DOI: 10.1145/3409390.3409407 (https://doi.org/10.1145/3409390.3409407)
Citations: 4
Abstract
In recent years, cloud computing has grown in popularity because it gives users easy and almost instantaneous access to a wide range of computational resources. Some cloud providers, such as Amazon, have taken advantage of this popularity by offering their VMs under several hiring types: on-demand, reserved, and spot. Spot VMs are usually offered at lower prices, but the provider can terminate them at any time. Checkpoint and recovery procedures are typically used to deal with such failures. In this context, we propose and analyze checkpoint and recovery procedures for spot VMs that use three different Amazon storage services: Amazon Simple Storage Service (S3), Amazon Elastic Block Store (EBS), and Amazon Elastic File System (EFS). These procedures were built upon the HADS framework, which is designed to schedule bag-of-tasks applications on spot and on-demand VMs. Our results showed that EBS outperformed the other approaches in the time spent recording a checkpoint, but it required more time in the recovery procedure. EFS achieved checkpointing and recovery times close to those of EBS, but at a higher monetary cost than the other services. S3 proved to be the best option in terms of monetary cost, but it took longer to record an individual checkpoint. However, we also analysed concurrent checkpoints, which can occur in a real application with many tasks. In that setting, S3 also outperformed EFS in execution time in our tests.
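The checkpoint-and-recovery cycle described above can be illustrated with a minimal Python sketch. It is not the HADS implementation. The function names, the pickle format, the fixed checkpoint interval, and the simulated revocation are all hypothetical. The `root` directory stands in for a file system that outlives the spot VM, such as an EBS volume re-attached to a new VM or an EFS mount. With S3, `save_checkpoint` and `load_checkpoint` would instead upload and download an object, for example with boto3's `put_object` and `get_object`.

```python
import os
import pickle
import tempfile
from typing import Optional


def _ckpt_path(root: str, task_id: str) -> str:
    # Hypothetical layout: one checkpoint file per task.
    return os.path.join(root, f"{task_id}.ckpt")


def save_checkpoint(root: str, task_id: str, state: dict) -> None:
    """Persist task state; the final rename makes the new checkpoint appear atomically."""
    os.makedirs(root, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=root, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to the volume before the rename
    os.replace(tmp_path, _ckpt_path(root, task_id))


def load_checkpoint(root: str, task_id: str) -> Optional[dict]:
    """Return the last recorded state, or None if the task never checkpointed."""
    try:
        with open(_ckpt_path(root, task_id), "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return None


def run_task(root: str, task_id: str, total_steps: int, interval: int,
             revoke_at: Optional[int] = None) -> dict:
    """Run a toy task (summing 0..total_steps-1), checkpointing every `interval` steps.

    `revoke_at` simulates the provider terminating the spot VM before that step.
    """
    # Recovery: resume from the last checkpoint instead of starting over.
    state = load_checkpoint(root, task_id) or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if revoke_at is not None and state["step"] == revoke_at:
            raise RuntimeError("spot VM revoked")  # simulated termination
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(root, task_id, state)
    return state
```

Consider a run with `total_steps=10` and `interval=5` that is revoked at step 7. Only the work for steps 5 and 6 is lost. Rerunning `run_task` resumes from the step-5 checkpoint and still returns `{"step": 10, "acc": 45}`. Writing to a temporary file and renaming it means that a revocation in the middle of a write leaves the previous checkpoint intact. A shorter `interval` loses less work per revocation but spends more time writing checkpoints. That write cost is where the choice of storage service matters.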