EDRFS:适用于小文件和数据密集型应用的高效分布式复制文件系统

Bin Cai, C. Xie, Guangxi Zhu
{"title":"EDRFS:适用于小文件和数据密集型应用的高效分布式复制文件系统","authors":"Bin Cai, C. Xie, Guangxi Zhu","doi":"10.1109/COMSWA.2007.382422","DOIUrl":null,"url":null,"abstract":"With the system scale keeping grown, the key challenge is to mask the failures that arise among the system components and to improve the performance of data-intensive applications. This paper designs and implements a cluster-based distributed replication file system EDRFS to meet these critical demands. EDRFS works with a single metadata server and multiple storage nodes, deploys whole-file replication scheme at the file level, and tracks what storage node a file is replicated on. We use a linear hash algorithm to evenly distribute data and load across multiple storage nodes so as to achieve balancing workload and incremental scalability of throughput and storage capacity as the system scale grows. In addition, we employ metadata caches and file data caches in clients to enhance system performance. Furthermore, we deploy a concurrency lock scheme to avoid namespace operation bottleneck and a replicas consistency method to keep a consistent mutation order among replicas of a file. We provide the initial experimental evaluations of our prototypical system on a small-file and data-intensive workload.","PeriodicalId":191295,"journal":{"name":"2007 2nd International Conference on Communication Systems Software and Middleware","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"EDRFS: An Effective Distributed Replication File System for Small-File and Data-Intensive Application\",\"authors\":\"Bin Cai, C. Xie, Guangxi Zhu\",\"doi\":\"10.1109/COMSWA.2007.382422\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the system scale keeping grown, the key challenge is to mask the failures that arise among the system components and to improve the performance of data-intensive applications. This paper designs and implements a cluster-based distributed replication file system EDRFS to meet these critical demands. EDRFS works with a single metadata server and multiple storage nodes, deploys whole-file replication scheme at the file level, and tracks what storage node a file is replicated on. We use a linear hash algorithm to evenly distribute data and load across multiple storage nodes so as to achieve balancing workload and incremental scalability of throughput and storage capacity as the system scale grows. In addition, we employ metadata caches and file data caches in clients to enhance system performance. Furthermore, we deploy a concurrency lock scheme to avoid namespace operation bottleneck and a replicas consistency method to keep a consistent mutation order among replicas of a file. We provide the initial experimental evaluations of our prototypical system on a small-file and data-intensive workload.\",\"PeriodicalId\":191295,\"journal\":{\"name\":\"2007 2nd International Conference on Communication Systems Software and Middleware\",\"volume\":\"12 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-07-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2007 2nd International Conference on Communication Systems Software and Middleware\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/COMSWA.2007.382422\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 2nd International Conference on Communication Systems Software and Middleware","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMSWA.2007.382422","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

摘要

随着系统规模的不断扩大,关键的挑战是掩盖系统组件之间出现的故障,并提高数据密集型应用程序的性能。为了满足这些需求,本文设计并实现了一个基于集群的分布式复制文件系统EDRFS。EDRFS支持单个元数据服务器和多个存储节点,在文件级部署全文件复制方案,并跟踪文件被复制到哪个存储节点。我们使用线性哈希算法将数据和负载均匀分布在多个存储节点上,从而实现负载均衡,并随着系统规模的增长实现吞吐量和存储容量的增量可扩展性。此外,我们在客户端使用元数据缓存和文件数据缓存来提高系统性能。此外,我们部署了并发锁方案以避免命名空间操作瓶颈,并部署了副本一致性方法以保持文件副本之间的一致突变顺序。我们在一个小文件和数据密集型工作负载上对我们的原型系统进行了初步的实验评估。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
EDRFS: An Effective Distributed Replication File System for Small-File and Data-Intensive Application
With the system scale keeping grown, the key challenge is to mask the failures that arise among the system components and to improve the performance of data-intensive applications. This paper designs and implements a cluster-based distributed replication file system EDRFS to meet these critical demands. EDRFS works with a single metadata server and multiple storage nodes, deploys whole-file replication scheme at the file level, and tracks what storage node a file is replicated on. We use a linear hash algorithm to evenly distribute data and load across multiple storage nodes so as to achieve balancing workload and incremental scalability of throughput and storage capacity as the system scale grows. In addition, we employ metadata caches and file data caches in clients to enhance system performance. Furthermore, we deploy a concurrency lock scheme to avoid namespace operation bottleneck and a replicas consistency method to keep a consistent mutation order among replicas of a file. We provide the initial experimental evaluations of our prototypical system on a small-file and data-intensive workload.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A Fast and Efficient Authentication Protocol for a Seamless Handover between a WLAN and WiBro On Utilizing Directional Antenna in 802.11 Networks: Deafness Study An Architecture and a Programming Interface for Application-Aware Data Dissemination Using Overlay Networks An Efficient Management Method of Access Policies for Hierarchical Virtual Private Networks Real-time End-to-end Network Monitoring in Large Distributed Systems
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1