Compiler-Assisted Application-Level Checkpointing for MPI Programs

Xuejun Yang, Panfeng Wang, Hongyi Fu, Yunfei Du, Zhiyuan Wang, Jia Jia
{"title":"Compiler-Assisted Application-Level Checkpointing for MPI Programs","authors":"Xuejun Yang, Panfeng Wang, Hongyi Fu, Yunfei Du, Zhiyuan Wang, Jia Jia","doi":"10.1109/ICDCS.2008.25","DOIUrl":null,"url":null,"abstract":"Application-level checkpointing can decrease the overhead of fault tolerance by minimizing the amount of checkpoint data. However this technique requires the programmer to manually choose the critical data that should be saved. In this paper, we firstly propose a live-variable analysis method for MPI programs. Then, we provide an optimization method of data saving for application-level checkpointing based on the analysis method. Based on the theoretical foundation, we implement a source-to-source precompiler (ALEC) to automate application-level checkpointing. Finally, we evaluate the performance of five FORTRAN/MPI programs which are transformed and integrated checkpointing features by ALEC on a 512-CPU cluster system. The experimental results show that i) the application-level checkpointing based on live-variable analysis for MPI programs can efficiently reduce the amount of checkpoint data, thereby decrease the overhead of checkpoint and restart; ii) ALEC is capable of automating application-level checkpointing correctly and effectively.","PeriodicalId":240205,"journal":{"name":"2008 The 28th International Conference on Distributed Computing Systems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 The 28th International Conference on Distributed Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS.2008.25","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Application-level checkpointing can decrease the overhead of fault tolerance by minimizing the amount of checkpoint data. However this technique requires the programmer to manually choose the critical data that should be saved. In this paper, we firstly propose a live-variable analysis method for MPI programs. Then, we provide an optimization method of data saving for application-level checkpointing based on the analysis method. Based on the theoretical foundation, we implement a source-to-source precompiler (ALEC) to automate application-level checkpointing. Finally, we evaluate the performance of five FORTRAN/MPI programs which are transformed and integrated checkpointing features by ALEC on a 512-CPU cluster system. The experimental results show that i) the application-level checkpointing based on live-variable analysis for MPI programs can efficiently reduce the amount of checkpoint data, thereby decrease the overhead of checkpoint and restart; ii) ALEC is capable of automating application-level checkpointing correctly and effectively.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
MPI程序的编译器辅助应用程序级检查点
应用程序级检查点可以通过最小化检查点数据量来减少容错的开销。然而,这种技术要求程序员手动选择应该保存的关键数据。本文首先提出了一种MPI程序的活变量分析方法。然后,在分析方法的基础上,提出了一种应用级检查点数据保存的优化方法。在理论基础上,我们实现了一个源到源的预编译器(ALEC)来实现应用级检查点的自动化。最后,我们在一个512 cpu的集群系统上对5个FORTRAN/MPI程序进行了性能评估,这些程序经过ALEC的转换并集成了检查点特性。实验结果表明:1)基于活变量分析的MPI程序应用级检查点可以有效地减少检查点数据量,从而降低检查点和重启的开销;ii) ALEC能够正确有效地自动化应用程序级检查点。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Relative Network Positioning via CDN Redirections Compiler-Assisted Application-Level Checkpointing for MPI Programs Exploring Anti-Spam Models in Large Scale VoIP Systems Correlation-Aware Object Placement for Multi-Object Operations Probing Queries in Wireless Sensor Networks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1