From Backup to Hot Standby: High Availability for HDFS

Andrew Oriani, Islene C. Garcia
2012 IEEE 31st Symposium on Reliable Distributed Systems (SRDS), published 2012-10-08
DOI: 10.1109/SRDS.2012.33
Citations: 31

Abstract

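Cluster-based distributed file systems generally have a single master that serves clients and manages the namespace. Although simple and efficient, this design compromises availability, because a failure of the master takes the entire system down. Before version 2.0.0-alpha, the Hadoop Distributed File System (HDFS) was an example of such a system. HDFS is open-source storage widely used by applications that operate over large datasets, such as MapReduce, and for these applications 24x7 uptime is becoming essential.

Given that scenario, this paper proposes a hot standby for the HDFS master, achieved by (i) extending the replication of the master's state already performed by its checkpointing helper, the Backup Node, and (ii) introducing an automatic failover mechanism. Step (i) takes advantage of the message duplication technique developed by another high-availability solution for HDFS, named Avatar Nodes. Step (ii) employs another Hadoop project, ZooKeeper, a distributed coordination service. The approach required only small code changes (1373 lines) and no components external to the Hadoop project, which eases maintenance and deployment of the file system.

Compared with HDFS 0.21, tests showed that under both metadata-dominated and I/O-dominated workloads the average reduction in data throughput is no more than 15%, and switching the hot standby to active takes less than 100 ms. These results demonstrate the applicability of our solution to real systems. We also present related work on high availability for other file systems and for HDFS, including the official solution recently included in HDFS 2.0.0-alpha.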
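The two mechanisms named in the abstract, state replication to a standby and automatic failover through a coordination service, can be illustrated with a toy in-memory Python model. This is a sketch under stated assumptions, not the authors' code: it uses neither HDFS nor the ZooKeeper client API. `EphemeralLock`, `Master`, `datanode_report`, and the node names `nn1`, `nn2`, `dn1` are hypothetical. The lock only imitates a ZooKeeper ephemeral znode (owned by one session, deleted when that session expires). The assumption that DataNodes send block reports to both masters follows the general description of Avatar Nodes' message duplication; the abstract itself does not specify which messages are duplicated.

```python
"""Toy in-memory model of a hot-standby master (hypothetical; not HDFS code)."""


class EphemeralLock:
    """Imitates a ZooKeeper ephemeral znode used for leader election."""

    def __init__(self):
        self.owner = None
        self.watchers = []

    def try_acquire(self, session):
        if self.owner is None:
            self.owner = session
            return True
        return False

    def watch(self, callback):
        # Real ZooKeeper watches are one-shot; kept persistent here for brevity.
        self.watchers.append(callback)

    def expire(self, session):
        # Session loss deletes the ephemeral node and notifies all watchers.
        if self.owner == session:
            self.owner = None
            for callback in list(self.watchers):
                callback()


class Master:
    def __init__(self, name, lock):
        self.name = name
        self.lock = lock
        self.active = False
        self.namespace = {}        # path -> "dir" (namespace metadata)
        self.block_locations = {}  # block id -> set of DataNode ids
        self.standby = None        # peer that receives streamed edits

    def start(self):
        # Whoever wins the lock is active; the loser watches for its release.
        self.active = self.lock.try_acquire(self.name)
        if not self.active:
            self.lock.watch(self._on_lock_released)

    def apply_edit(self, op, path):
        if op == "mkdir":
            self.namespace[path] = "dir"
        elif op == "delete":
            self.namespace.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")

    def client_request(self, op, path):
        if not self.active:
            raise RuntimeError(f"{self.name} is not active")
        self.apply_edit(op, path)
        # Stream the edit to the standby (as a master streams its journal to a
        # Backup Node) so the standby's namespace stays current.
        if self.standby is not None:
            self.standby.apply_edit(op, path)

    def block_report(self, datanode, block_id):
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def crash(self):
        self.active = False
        self.lock.expire(self.name)

    def _on_lock_released(self):
        # State is already replicated, so taking over only requires the lock.
        if self.lock.try_acquire(self.name):
            self.active = True


def datanode_report(masters, datanode, block_id):
    # Assumed message duplication: each DataNode reports to every master.
    for master in masters:
        master.block_report(datanode, block_id)


if __name__ == "__main__":
    lock = EphemeralLock()
    primary, hot = Master("nn1", lock), Master("nn2", lock)
    primary.standby = hot
    primary.start()
    hot.start()
    primary.client_request("mkdir", "/data")
    datanode_report([primary, hot], "dn1", "blk_1")
    primary.crash()
    print(hot.name, "active:", hot.active)  # prints: nn2 active: True
```

Because the standby already holds both the namespace and the block locations when the lock is released, failover in this model reduces to winning the lock. That is the intuition behind a hot (rather than cold) standby, although real switchover involves more steps than this sketch shows.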