An Adaptive Checkpointing Scheme for Peer-to-Peer Based Volunteer Computing Work Flows

Lei Ni, A. Harwood
Published in: 2008 Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies
DOI: 10.1109/PDCAT.2008.53 (https://doi.org/10.1109/PDCAT.2008.53)
Listed publication date: 2007-11-25
Citations: 4

Abstract

Volunteer computing, sometimes called public resource computing, is an emerging computational model that is well suited to work-pooled parallel processing. As more complex grid applications use work flows in their design and deployment, it is reasonable to consider the impact of deploying work flows over a volunteer computing infrastructure. In this setting, inter-work-flow I/O can significantly increase the I/O demand at the work pool server. A possible solution is a peer-to-peer based parallel computing architecture that off-loads this I/O demand to the workers, which can then take on some aspects of work flow coordination and I/O checking. However, achieving robustness in such a large-scale system is a challenging hurdle for the decentralized execution of work flows and of parallel processes in general. To increase robustness, we propose, and show the merits of, an adaptive checkpoint scheme that efficiently checkpoints the state of the parallel processes according to estimates of relevant network and peer parameters. Based on our proposed mathematical checkpoint model, the scheme uses statistical data observed at runtime to make checkpoint decisions dynamically and in a completely decentralized manner. Simulation results support the proposed approach, showing a reduction in required runtime.
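
The abstract does not state the mathematical checkpoint model itself, so the following is only a minimal sketch of what a runtime-adaptive checkpoint decision could look like on a single peer. It substitutes Young's classical first-order approximation (interval = sqrt(2 × checkpoint cost × mean time between failures)) for the authors' model, and every name in it (`AdaptiveCheckpointer`, `observe_failure_gap`, `observe_checkpoint_cost`, the smoothing weight `alpha`) is hypothetical.

```python
import math


class AdaptiveCheckpointer:
    """Illustrative stand-in, NOT the paper's model.

    Keeps running estimates of two locally observable quantities and
    derives a checkpoint interval from them with Young's approximation.
    """

    def __init__(self, initial_mtbf, initial_cost, alpha=0.2):
        self.mtbf = initial_mtbf  # estimated mean time between peer failures (s)
        self.cost = initial_cost  # estimated time to take one checkpoint (s)
        self.alpha = alpha        # weight given to each new observation

    def _smooth(self, old, new):
        # Exponentially weighted moving average of runtime observations.
        return (1 - self.alpha) * old + self.alpha * new

    def observe_failure_gap(self, seconds):
        """Record the time elapsed between two observed peer failures."""
        self.mtbf = self._smooth(self.mtbf, seconds)

    def observe_checkpoint_cost(self, seconds):
        """Record how long the most recent checkpoint took."""
        self.cost = self._smooth(self.cost, seconds)

    def interval(self):
        # Young's approximation: T = sqrt(2 * C * MTBF).
        return math.sqrt(2 * self.cost * self.mtbf)

    def should_checkpoint(self, seconds_since_last):
        """Local decision; no coordinator is consulted."""
        return seconds_since_last >= self.interval()


if __name__ == "__main__":
    cp = AdaptiveCheckpointer(initial_mtbf=3600.0, initial_cost=2.0)
    print(cp.interval())
    cp.observe_failure_gap(600.0)  # peers turn out to fail more often
    print(cp.interval())           # interval shrinks in response
```

Each peer updates its own estimates and decides locally, which mirrors the decentralized decision-making described in the abstract. The estimator and the formula are assumptions chosen for illustration.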