Employment of Optimal Approximations on Apache Hadoop Checkpoint Technique for Performance Improvements

2017 IEEE International Conference on Software Architecture (ICSA). Pub Date: 2020-03-01. DOI: 10.1109/ICSA47634.2020.00009

Paulo Vinicius Cardoso, R. Fazul, P. Barcelos

Citations: 1

Abstract

The Checkpoint and Recovery (CR) technique is widely used for fault tolerance because of its efficiency. The Apache Hadoop framework uses it to guard against failures in its distributed file system. Determining the optimal interval between successive checkpoints is difficult, however, particularly in Hadoop, which does not allow the checkpoint period to be modified at run time. The Dynamic Configuration Architecture (DCA) was created to address this limitation by enabling changes to the checkpoint period without interrupting Hadoop services. This paper improves the DCA by configuring the Hadoop checkpoint period in real time, based on optimal-period approximations already established in the literature. The proposed improvement relies on tracking system resources. Data collected from these resources are stored in a history of attributes: a tree of monitored elements whose data are updated as new observations are made in the system. This history lets the user estimate system factors, from which our solution computes the checkpoint cost and the mean time between failures (MTBF). For validation, we ran experiments with transient failures in the NameNode and tested the history of attributes in different scenarios. The evaluation results show that adaptively configuring the checkpoint period reduces the time wasted because of NameNode failures and improves Hadoop performance. The history of attributes also proved its value by providing an efficient way to estimate the system factors.
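
The abstract does not name the optimal-period approximations it relies on, so the following is only a minimal sketch under stated assumptions. It uses Young's first-order approximation, T = sqrt(2 * C * MTBF), as one well-known example from the literature; the paper may use a different formula. The `AttributeNode` class, its field names, and the sample values are hypothetical stand-ins for the history of attributes, not the authors' implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AttributeNode:
    """One monitored element in a hypothetical history-of-attributes tree."""
    name: str
    count: int = 0
    mean: float = 0.0
    children: dict = field(default_factory=dict)

    def child(self, name: str) -> "AttributeNode":
        # Create the child on first access so the tree grows with the observations.
        return self.children.setdefault(name, AttributeNode(name))

    def observe(self, value: float) -> None:
        # Incremental running mean: the estimate is updated without storing every sample.
        self.count += 1
        self.mean += (value - self.mean) / self.count


def young_period(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's approximation (an assumed choice): T = sqrt(2 * C * MTBF), in seconds."""
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Made-up observations, in seconds, purely for illustration.
history = AttributeNode("namenode")
for duration in (10.0, 12.0, 14.0):
    history.child("checkpoint_cost").observe(duration)
for gap in (5000.0, 7000.0):
    history.child("time_between_failures").observe(gap)

cost = history.child("checkpoint_cost").mean        # estimated checkpoint cost C
mtbf = history.child("time_between_failures").mean  # estimated MTBF
period = young_period(cost, mtbf)
print(f"suggested checkpoint period: {period:.1f} s")
```

In stock HDFS the checkpoint interval is set by the `dfs.namenode.checkpoint.period` property (in seconds) in `hdfs-site.xml`; a value such as `period` above is the kind of quantity a DCA-like component would apply at run time instead of requiring a restart.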
