FaultyRank: A Graph-based Parallel File System Checker

Saisha Kamat, Abdullah Al Raqibul Islam, Mai Zheng, Dong Dai
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), published 2023-05-01
DOI: 10.1109/IPDPS54959.2023.00029 (https://doi.org/10.1109/IPDPS54959.2023.00029)
Citations: 1

Abstract

Similar to local file system checkers such as e2fsck for Ext4, a parallel file system (PFS) checker ensures the file system's correctness. The basic idea of file system checkers is straightforward: important metadata are stored redundantly in separate places for cross-checking; inconsistent metadata will be repaired or overwritten by its 'more correct' counterpart, which is defined by the developers. Unfortunately, implementing this idea for PFSes is non-trivial due to the complexity of these systems. Although many popular parallel file systems already contain dedicated checkers (e.g., LFSCK for Lustre, BeeGFS-FSCK for BeeGFS, mmfsck for GPFS), the existing checkers often cannot detect or repair inconsistencies accurately due to one fundamental limitation: they rely on a fixed set of consistency rules predefined by developers, which cannot cover the various failure scenarios that may occur in practice. In this study, we propose a new graph-based method to build PFS checkers. Specifically, we model important PFS metadata as graphs, then generalize the logic of cross-checking and repairing into graph analytic tasks. We design a new graph algorithm, FaultyRank, to quantitatively calculate the correctness of each metadata object. By leveraging the calculated correctness, we are able to recommend the most promising repairs to users. Based on this idea, we implement a prototype of FaultyRank on Lustre, one of the most widely used parallel file systems, and compare it with Lustre's default file system checker, LFSCK. Our experiments show that FaultyRank can achieve the same checking and repairing logic as LFSCK. Moreover, it is capable of detecting and repairing complicated PFS consistency issues that LFSCK cannot handle. We also show the performance advantage of FaultyRank compared with LFSCK. Through this study, we believe FaultyRank opens a new opportunity for building PFS checkers effectively and efficiently.
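
The cross-checking idea described in the abstract can be illustrated with a toy sketch. This is not the FaultyRank algorithm, whose details the abstract does not give. It only assumes, for illustration, that each redundant metadata reference is a directed edge between metadata objects and that consistent metadata produces reciprocal edge pairs. The function name `find_unpaired_edges` and all object names are hypothetical.

```python
def find_unpaired_edges(edges):
    """Return directed edges (src, dst) that lack a matching reverse edge (dst, src).

    Assumption for this sketch: every metadata reference is stored redundantly,
    so a consistent graph contains both directions of each link.
    """
    edge_set = set(edges)
    return sorted((s, d) for (s, d) in edge_set if (d, s) not in edge_set)


# Hypothetical metadata objects: a directory, two files, and one stripe object.
edges = [
    ("dir_A", "file_1"), ("file_1", "dir_A"),
    ("file_1", "stripe_1"), ("stripe_1", "file_1"),
    ("dir_A", "file_2"),  # file_2 has lost its back-pointer to dir_A
]

unpaired = find_unpaired_edges(edges)
print(unpaired)  # [('dir_A', 'file_2')]
```

A check like this only flags where the redundant copies disagree. Deciding which side of a disagreement is the faulty one is the part the abstract assigns to the quantitative correctness score, and the sketch does not attempt it.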