Load Sharing In Hypercube Multicomputers In The Presence Of Node Failures

Yi-Chieh Chang, K. Shin
Proceedings of the Fifth Distributed Memory Computing Conference, 1990 (published 1990-04-08)
DOI: 10.1109/DMCC.1990.556410
Citations: 2

Abstract

This paper discusses and analyzes two load sharing (LS) issues: adjusting preferred lists and implementing a fault-tolerant mechanism in the presence of node failures. In an earlier paper, we proposed ordering the nodes in each node's proximity into a preferred list for the purpose of load sharing in distributed real-time systems. The preferred list of each node is constructed in such a way that each node is selected as the kth preferred node by one and only one other node. Such lists are proven to allow tasks to be evenly distributed in a system. However, if faulty nodes are simply skipped in the preferred lists, their presence destroys the original structure of those lists. An algorithm is therefore proposed to modify each preferred list so that it retains its original features regardless of the number of faulty nodes in the system. The communication overhead introduced by this algorithm is shown to be minimal. Based on the modified preferred lists, a simple fault-tolerant mechanism is implemented. Each node is equipped with a backup queue which stores and updates the arriving/completing tasks at its most preferred node. Whenever a node becomes faulty, its most preferred node treats the tasks in the backup queue as externally arriving tasks. Our simulation results show that this approach, despite its simplicity, can reduce the number of task losses dramatically compared with approaches without any fault-tolerant mechanism.

The work reported in this paper was supported in part by the Office of Naval Research under contract N0001485-K-0122, and the NSF under grant DMC-8721492. Any opinions, findings, and recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the funding agencies.
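The abstract rests on two claims about preferred lists: every node is the kth preferred node of exactly one other node, and naively skipping faulty nodes breaks that property. The sketch that follows illustrates both on a 3-cube, together with the backup-queue takeover. It is a sketch under stated assumptions, not the authors' implementation: building node i's list as i XOR p_k from node 0's list is one construction that satisfies the one-to-one property (assumed here; the paper's own construction may differ), ordering node 0's list by Hamming distance is a hypothetical choice, node i's backup queue is assumed to be held by its most preferred node, and every function name is illustrative. The paper's list-modification algorithm is not reproduced.

```python
from collections import Counter

# Hypothetical preferred order for node 0 in an n-cube: every other node,
# nearer ones (smaller Hamming distance) first, ties broken by address.
def base_order(n):
    return sorted(range(1, 2 ** n), key=lambda v: (bin(v).count("1"), v))

# Assumed construction: node i's k-th preferred node is i XOR p_k, where p_k is
# the k-th entry of node 0's list. XOR with a fixed p_k permutes the node
# addresses, so every node is the k-th preferred node of exactly one other node.
def preferred_lists(n):
    base = base_order(n)
    return {i: [i ^ p for p in base] for i in range(2 ** n)}

# How many nodes choose each node as their k-th (1-based) preferred node.
def kth_selection_counts(lists, k):
    return Counter(lst[k - 1] for lst in lists.values() if len(lst) >= k)

# Naive handling of failures: surviving nodes simply skip faulty entries.
def skip_faulty(lists, faulty):
    return {i: [j for j in lst if j not in faulty]
            for i, lst in lists.items() if i not in faulty}

# backups[node] mirrors node's pending tasks; it is assumed to live at node's
# most preferred node and is updated on every arrival and completion.
def arrive(queues, backups, node, task):
    queues[node].append(task)
    backups[node].append(task)

def complete(queues, backups, node):
    task = queues[node].pop(0)
    backups[node].remove(task)
    return task

# When `failed` goes down, its most preferred node re-enqueues the backed-up
# tasks as if they had arrived externally; returns that node.
def fail_over(lists, queues, backups, failed):
    heir = lists[failed][0]
    queues.pop(failed)
    for task in backups.pop(failed):
        arrive(queues, backups, heir, task)
    return heir

if __name__ == "__main__":
    lists = preferred_lists(3)
    print(lists[0])  # [1, 2, 4, 3, 5, 6, 7]
    counts = kth_selection_counts(skip_faulty(lists, {1}), 1)
    print(counts[2], counts[0])  # 2 0: node 2 chosen twice, node 0 never
```

With p_1 = 1 the nodes pair up (4 and 5 are each other's most preferred node), so after node 5 fails under the naive lists, node 4's own backup queue has no live holder. That is the kind of gap the paper's modified preferred lists are meant to address.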