Reliability algorithms for network swapping systems with page migration

Ben Mitchell, J. Rosse, T. Newhall
Published in: 2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)
Publication date: 2004-09-20
DOI: 10.1109/CLUSTR.2004.1392655 (https://doi.org/10.1109/CLUSTR.2004.1392655)
Citation count: 3

Abstract

Summary form only given. Network swapping systems allow individual cluster nodes with over-committed memory to use the idle memory of remote nodes as their backing store and to swap pages over the network. Without reliability support, a single node crash can affect programs running on other nodes by losing their remotely swapped page data. RAID-based reliability solutions (Patterson et al., 1988; Markatos and Dramitinos, 1996) promise the best alternative in terms of flexibility and performance. However, two important features of our network swapping system, Nswap (Newhall et al., 2003), make direct application of RAID-based schemes impossible. First, Nswap adapts to each node's local memory load by adjusting the amount of RAM it makes available for remote swapping, which results in a variable-capacity "backing store". Second, Nswap supports migration of remotely swapped pages between cluster nodes, which occurs when a node needs to reclaim some of its RAM from Nswap for local processing. Page migration complicates reliability if, for example, two pages in the same parity group end up on the same node, because a single crash of that node would then lose both pages at once. We present novel reliability algorithms that solve these problems. Our Parity algorithm uses dynamic parity group membership to match Nswap's dynamic nature. We show that our algorithms add minimal overhead to remote swapping.
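The abstract does not give the details of the Parity algorithm, so the following is only a minimal sketch of generic RAID-style XOR parity, not Nswap's actual implementation. It illustrates why a parity group can recover one lost page and why migration must not place two pages of the same group on one node. The names xor_pages, recover, and migration_is_safe, and the dictionary layout mapping page IDs to node names, are hypothetical and chosen for illustration.

```python
from functools import reduce


def xor_pages(pages):
    """Return the byte-wise XOR of a list of equal-length pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def recover(surviving_pages, parity):
    """Rebuild the single missing page of a group from the survivors and parity."""
    return xor_pages(surviving_pages + [parity])


def migration_is_safe(group_nodes, page_id, dest_node):
    """Check an assumed placement rule: no two pages of one group share a node.

    group_nodes maps page_id -> node currently hosting that page,
    for the pages of a single parity group (hypothetical layout).
    """
    return all(
        node != dest_node
        for pid, node in group_nodes.items()
        if pid != page_id
    )


# Three tiny "pages" hosted on three different nodes.
pages = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_pages(pages)

# The node holding pages[1] crashes; rebuild it from the rest.
rebuilt = recover([pages[0], pages[2]], parity)
assert rebuilt == pages[1]

# Moving page "p0" onto the node that already holds "p1" would put
# two members of the same group on one node, so it is rejected.
group_nodes = {"p0": "nodeA", "p1": "nodeB", "p2": "nodeC"}
assert not migration_is_safe(group_nodes, "p0", "nodeB")
assert migration_is_safe(group_nodes, "p0", "nodeD")
```

With XOR parity, a group tolerates the loss of exactly one member. If migration co-located two members, one node crash would remove two pages and the parity could no longer reconstruct either of them, which is the complication the abstract points to.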