Improved read performance in a cost-effective, fault-tolerant parallel virtual file system (CEFT-PVFS)

Yifeng Zhu, Hong Jiang, X. Qin, D. Feng, D. Swanson
Published in: CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.
Publication date: 2003-05-12
DOI: 10.1109/CCGRID.2003.1199440 (https://doi.org/10.1109/CCGRID.2003.1199440)
Citations: 30

Abstract

Due to the ever-widening performance gap between processors and disks, I/O operations tend to become the major performance bottleneck of data-intensive applications on modern clusters. If the existing disks on the nodes of a cluster are connected together to form a high-performance parallel storage system, the cluster's overall performance can be boosted at no additional cost. CEFT-PVFS, a RAID 10 style parallel file system that extends the original PVFS, is one such system: it divides the cluster nodes into two groups, stripes the data across one group in a round-robin fashion, and then duplicates the same data to the other group, providing storage service with high performance and high reliability. Previous research has shown that mirroring improves system reliability by a factor of more than 40 while maintaining comparable write performance. This paper presents another benefit of CEFT-PVFS: by exploiting the increased parallelism, the aggregate peak read performance can be improved by as much as 100% over that of the original PVFS. Additionally, when the data servers, which typically are also computational nodes in a cluster environment, are loaded in an unbalanced way by applications running in the cluster, the read performance of PVFS degrades significantly. In CEFT-PVFS, by contrast, a heavily loaded data server can be skipped and all the desired data read from its mirroring node. Thus the performance is not affected unless both the server node and its mirroring node are heavily loaded.
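
The read path described in the abstract can be illustrated with a small sketch. This is not the CEFT-PVFS implementation; it only models the two ideas stated above: round-robin striping across a group, and choosing between a server and its mirror when reading. The names `STRIPE_SIZE`, `plan_read`, and `busy_threshold`, the load values in the range 0 to 1, and the rule of alternating between the two copies when neither is busy are all assumptions made for illustration.

```python
from dataclasses import dataclass

STRIPE_SIZE = 64 * 1024  # hypothetical stripe unit in bytes (assumption)


@dataclass(frozen=True)
class Chunk:
    stripe_index: int  # position of the stripe unit within the file
    server: int        # index of the server pair holding this stripe unit
    group: str         # "primary" or "mirror": which copy is read


def server_for_stripe(stripe_index: int, servers_per_group: int) -> int:
    """Round-robin placement: stripe i is stored on server pair i mod N."""
    return stripe_index % servers_per_group


def plan_read(offset, length, servers_per_group,
              primary_load, mirror_load, busy_threshold=0.8):
    """Decide, per stripe unit, whether to read the primary or mirror copy.

    primary_load / mirror_load are hypothetical per-server load figures
    in [0, 1]; how a real system measures load is not stated in the abstract.
    """
    if length <= 0:
        return []
    first = offset // STRIPE_SIZE
    last = (offset + length - 1) // STRIPE_SIZE
    plan = []
    for i in range(first, last + 1):
        s = server_for_stripe(i, servers_per_group)
        primary_busy = primary_load[s] >= busy_threshold
        mirror_busy = mirror_load[s] >= busy_threshold
        if primary_busy and not mirror_busy:
            group = "mirror"    # skip the heavily loaded primary server
        elif mirror_busy and not primary_busy:
            group = "primary"   # skip the heavily loaded mirror server
        else:
            # Both idle or both busy: alternate successive stripes of the
            # same pair between the two copies so both groups serve data.
            group = "primary" if (i // servers_per_group) % 2 == 0 else "mirror"
        plan.append(Chunk(i, s, group))
    return plan


if __name__ == "__main__":
    # Two server pairs; the primary copy on pair 0 is heavily loaded.
    for chunk in plan_read(0, 4 * STRIPE_SIZE, 2, [0.9, 0.0], [0.0, 0.0]):
        print(chunk)
```

With no loaded servers, the sketch spreads a large read over both groups, which is the source of the extra read parallelism the abstract refers to. When one server of a pair is loaded, every stripe unit of that pair is served by the other copy, so only a pair with both servers loaded slows the read.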