Enabling Practical Transparent Checkpointing for MPI: A Topological Sort Approach

Yao Xu, Gene Cooperman
arXiv - CS - Distributed, Parallel, and Cluster Computing. Published 2024-08-05. DOI: arxiv-2408.02218 (https://doi.org/arxiv-2408.02218)

Abstract

MPI is the de facto standard for parallel computing on a cluster of computers. Checkpointing is an important component in any strategy for software resilience and for long-running jobs that must be executed by chaining together time-bounded resource allocations. This work solves a long-standing problem: finding a practical and general algorithm for transparent checkpointing of MPI that is both efficient and compatible with most of the latest network software. Transparent checkpointing is attractive due to its generality and ease of use for most MPI application developers. Earlier efforts at transparent checkpointing for MPI, a decade ago, suffered from two difficult problems: (i) reliance on a specific MPI implementation tied to a specific network technology; and (ii) failure to demonstrate sufficiently low runtime overhead. Problem (i) (network dependence) was already solved in 2019 by MANA's introduction of split processes. Problem (ii) (runtime overhead) is solved in this work. This paper introduces an approach that avoids these limitations, employing a novel topological sort to algorithmically determine a safe future synchronization point. The algorithm is valid for both blocking and non-blocking collective communication in MPI. We demonstrate the efficacy and scalability of our approach through both micro-benchmarks and a set of five real-world MPI applications, notably including the widely used VASP (Vienna Ab Initio Simulation Package), which is responsible for 11% of the workload on the Perlmutter supercomputer at Lawrence Berkeley National Laboratory. VASP was previously cited as a special challenge for checkpointing, in part due to its multi-algorithm codes.
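
The abstract does not spell out the algorithm itself, so the following is background only and should not be read as the authors' method. A topological sort orders the nodes of a directed acyclic graph so that every dependency comes before the things that depend on it. The sketch below is generic Kahn's algorithm; the node names (collective calls on communicators A, B, and C) and the function name `topological_order` are hypothetical and chosen purely for illustration.

```python
from collections import deque


def topological_order(nodes, edges):
    """Generic Kahn's algorithm (illustration only, not the paper's algorithm).

    Returns the nodes in an order where, for every edge (u, v), u appears
    before v. Raises ValueError if the graph contains a cycle.
    """
    successors = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Nodes with no unmet dependencies can be emitted immediately.
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    if len(order) != len(nodes):
        raise ValueError("dependency graph has a cycle")
    return order


# Hypothetical dependency graph between collective calls (assumed names).
nodes = ["bcast_A", "reduce_B", "barrier_C", "allreduce_A"]
edges = [
    ("bcast_A", "reduce_B"),
    ("bcast_A", "barrier_C"),
    ("reduce_B", "allreduce_A"),
    ("barrier_C", "allreduce_A"),
]
order = topological_order(nodes, edges)
print(order)
```

In this toy graph the last node in the order is one that every other node leads to, which loosely mirrors the idea of picking a point that all earlier operations must reach first. How the paper actually builds its graph and selects the safe synchronization point is not stated in the abstract.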