stdchk: A Checkpoint Storage System for Desktop Grid Computing

S. Al-Kiswany, M. Ripeanu, Sudharshan S. Vazhkudai, Abdullah Gharaibeh
Published in: 2008 The 28th International Conference on Distributed Computing Systems
DOI: 10.1109/ICDCS.2008.19 (https://doi.org/10.1109/ICDCS.2008.19)
Listed publication date: 2007-06-24
Citations: 76

Abstract

Checkpointing is an indispensable technique for providing fault tolerance to long-running, high-throughput applications such as those running on desktop grids. This article argues that a checkpoint storage system optimized to operate in these environments can offer multiple benefits: it can reduce the load on a traditional file system, offer high performance through specialization, and optimize data management by taking checkpoint application semantics into account. Such a storage system can present a unifying abstraction to checkpoint operations while hiding the fact that there are no dedicated resources to store the checkpoint data. We prototype stdchk, a checkpoint storage system that uses scavenged disk space from participating desktops to build a low-cost storage system, offering a traditional file system interface for easy integration with applications. This article presents the stdchk architecture, its key performance optimizations, and its support for incremental checkpointing and increased data availability. Our evaluation confirms that the stdchk approach is viable in a desktop grid setting and offers a low-cost storage system with desirable performance characteristics: high write throughput as well as reduced storage space and network effort to save checkpoint images.
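To make the idea of incremental checkpointing more concrete, the following is a minimal sketch of one common way to save storage space and network effort across successive checkpoint images: split each image into fixed-size chunks, identify chunks by a hash of their content, and store only chunks that have not been seen before. The abstract does not specify how stdchk detects similarity between images, so the technique, the `ChunkStore` class, its method names, and the tiny `CHUNK_SIZE` are all assumptions made for illustration, not the stdchk implementation or API.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical chunk size in bytes, kept tiny for illustration


class ChunkStore:
    """Toy in-memory content-addressed store (hypothetical, not stdchk's API)."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def save_checkpoint(self, image: bytes):
        """Store an image; return (list of chunk digests, number of new chunks written)."""
        recipe = []
        new_chunks = 0
        for offset in range(0, len(image), CHUNK_SIZE):
            chunk = image[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only previously unseen content is written to storage.
                self.chunks[digest] = chunk
                new_chunks += 1
            recipe.append(digest)
        return recipe, new_chunks

    def restore(self, recipe):
        """Rebuild an image from its list of chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)


store = ChunkStore()
v1 = b"AAAABBBBCCCCDDDD"  # first checkpoint image
v2 = b"AAAABBBBXXXXDDDD"  # second image: only the third chunk changed
recipe1, written1 = store.save_checkpoint(v1)
recipe2, written2 = store.save_checkpoint(v2)
print(written1, written2)
```

In this sketch the second checkpoint writes a single new chunk, while both images remain fully restorable from their digest lists. A real system would also have to handle data that shifts position between images, replication across unreliable desktops, and garbage collection of old chunks, none of which is modeled here.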