A Generic Framework for Testing Parallel File Systems

Jinrui Cao, Simeng Wang, Dong Dai, Mai Zheng, Yong Chen
Published in: 2016 1st Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW-DISCS)
Publication date: 2016-11-13
DOI: 10.1109/PDSW-DISCS.2016.12 (https://doi.org/10.1109/PDSW-DISCS.2016.12)
Citations: 13

Abstract

Large-scale parallel file systems are of prime importance today. However, despite their importance, their failure-recovery capability is much less studied than that of local storage systems. Recent studies on local storage systems have exposed various vulnerabilities that could lead to data loss under failure events, which raises concerns for the parallel file systems built on top of them. This paper proposes a generic framework for testing the failure handling of large-scale parallel file systems. The framework captures all disk I/O commands on all storage nodes of the target system to emulate realistic failure states, and checks whether the target system can recover to a consistent state without incurring data loss. We have built a prototype for the Lustre file system. Our preliminary results show that the framework is able to uncover the internal I/O behavior of Lustre under different workloads and failure conditions, which provides a solid foundation for further analysis of the failure recovery of parallel file systems.
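
To make the capture-and-replay idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical record format (`WriteCommand`), treats a failure state as "the first N recorded writes reached disk", and replaces the real recovery check with a toy invariant. The node labels `mds` and `oss` are borrowed from Lustre terminology purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class WriteCommand:
    """One recorded disk write (hypothetical record format)."""
    node: str    # storage node that issued the write
    block: int   # block address on that node's disk
    data: bytes  # payload written to the block


Disks = Dict[Tuple[str, int], bytes]


def crash_state(trace: List[WriteCommand], cut: int) -> Disks:
    """Replay only the first `cut` writes to emulate a failure at that point."""
    disks: Disks = {}
    for cmd in trace[:cut]:
        disks[(cmd.node, cmd.block)] = cmd.data
    return disks


def is_consistent(disks: Disks) -> bool:
    """Toy invariant (an assumption): every metadata block on 'mds' must
    reference a data block that already exists on 'oss'."""
    for (node, _block), data in disks.items():
        if node == "mds":
            target = int.from_bytes(data, "big")
            if ("oss", target) not in disks:
                return False
    return True


def inconsistent_cuts(trace: List[WriteCommand]) -> List[int]:
    """Return every crash point whose emulated state violates the invariant."""
    return [
        cut
        for cut in range(len(trace) + 1)
        if not is_consistent(crash_state(trace, cut))
    ]


# Metadata is written before the data it points to, so a failure between
# the two writes leaves a dangling reference.
trace = [
    WriteCommand("mds", 0, (7).to_bytes(4, "big")),
    WriteCommand("oss", 7, b"payload"),
]
print(inconsistent_cuts(trace))  # [1]
```

In this sketch, reversing the two writes removes the vulnerable window. A real test harness would instead run the target system's own recovery procedure on each emulated state and then compare the result against the expected file contents.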