Rebound: Scalable checkpointing for coherent shared memory

Rishi Agarwal, P. Garg, J. Torrellas
2011 38th Annual International Symposium on Computer Architecture (ISCA), published 2011-06-04. DOI: 10.1145/2000064.2000083 (https://doi.org/10.1145/2000064.2000083). Cited by: 34

Abstract

As we move to large manycores, the hardware-based global checkpointing schemes that have been proposed for small shared-memory machines do not scale. Scalability barriers include global operations, work lost to global rollback, and inefficiencies in imbalanced or I/O-intensive loads. Scalable checkpointing requires tracking inter-thread dependences and building the checkpoint and rollback operations around dynamic groups of communicating processors. To address this problem, this paper introduces Rebound, the first hardware-based scheme for coordinated local checkpointing in multiprocessors with directory-based cache coherence. Rebound leverages the transactions of a directory protocol to track inter-thread dependences. In addition, it boosts checkpointing efficiency by (i) delaying the writeback of data to safe memory at checkpoints, (ii) supporting operation with multiple checkpoints, and (iii) optimizing checkpointing at barrier synchronization. Finally, Rebound introduces distributed algorithms for checkpointing and rolling back sets of processors. Simulations of parallel programs with up to 64 threads show that Rebound is scalable and has very low overhead. For 64 processors, its average performance overhead is only 2%, compared to 15% for global checkpointing.
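The abstract only outlines how coordinated local checkpointing works, so the following is a toy model built on our own assumptions, not Rebound's hardware design. It assumes the directory remembers, per cache line, the last processor to write it in the current interval (a hypothetical `last_writer` map), and that a read of another processor's data records a producer-to-consumer dependence at both ends. From these records, the checkpoint set is taken as the transitive closure over producers: whoever supplied data to the initiator must save state with it. The rollback set is the transitive closure over consumers: whoever read data that will be undone must also roll back. All class, method, and field names are hypothetical.

```python
from collections import deque


class DependenceTracker:
    """Toy model of directory-based inter-thread dependence tracking.

    Hypothetical simplification: the directory remembers, per cache line, the
    last processor that wrote it during the current checkpoint interval.
    """

    def __init__(self, n_procs: int) -> None:
        self.last_writer: dict[int, int] = {}  # line address -> writer id
        self.producers = [set() for _ in range(n_procs)]  # procs I read from
        self.consumers = [set() for _ in range(n_procs)]  # procs that read from me

    def write(self, proc: int, line: int) -> None:
        self.last_writer[line] = proc

    def read(self, proc: int, line: int) -> None:
        # Reading a line last written by another processor creates a
        # producer -> consumer dependence, recorded at both ends.
        writer = self.last_writer.get(line)
        if writer is not None and writer != proc:
            self.producers[proc].add(writer)
            self.consumers[writer].add(proc)

    @staticmethod
    def _closure(start: int, edges: list[set[int]]) -> set[int]:
        # Breadth-first transitive closure over the dependence edges.
        seen = {start}
        work = deque([start])
        while work:
            p = work.popleft()
            for q in edges[p]:
                if q not in seen:
                    seen.add(q)
                    work.append(q)
        return seen

    def checkpoint_set(self, initiator: int) -> set[int]:
        # Producers of the initiator's inputs checkpoint with it, so a later
        # producer rollback cannot undo data the initiator has already saved.
        return self._closure(initiator, self.producers)

    def rollback_set(self, initiator: int) -> set[int]:
        # Consumers of data that will be undone must roll back too.
        return self._closure(initiator, self.consumers)

    def take_checkpoint(self, procs: set[int]) -> None:
        # Start a new interval for `procs`: their earlier writes are now
        # saved, so they no longer create dependences for anyone.
        for p in procs:
            for q in self.producers[p]:
                self.consumers[q].discard(p)
            for q in self.consumers[p]:
                self.producers[q].discard(p)
            self.producers[p].clear()
            self.consumers[p].clear()
        self.last_writer = {
            line: w for line, w in self.last_writer.items() if w not in procs
        }
```

In a short scenario, processor 1 writes a line that processor 0 reads, and processor 2 writes a line that processor 1 reads. Processor 0's checkpoint set is then `{0, 1, 2}`, while an unrelated processor 3 can checkpoint alone. This illustrates why grouping around communicating processors avoids the global operations the abstract identifies as a scalability barrier.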