{"title":"Load Sharing In Hypercube Multicomputers In The Presence Of Node Failures","authors":"Yi-Chieh Chang, K. Shin","doi":"10.1109/DMCC.1990.556410","DOIUrl":null,"url":null,"abstract":"This paper discusses and analyzes two load sharing (LS) issues: adjusting preferred lists and implementing a fault-tolerant mechanism in the presence of node failures. In an early paper, we have proposed to order the nodes in each node's proximity into a preferred list for the purpose of load sharing in distributed real-time systems. The preferred list of each node is constructed in such a way that each node will be selected as the kth preferred node by one and only one other node. Such lists are proven to allow the tasks to be evenly distributed in a system. However, the presence of faulty nodes will destroy the original structure of a preferred list if the faulty nodes are simply skipped in the preferred list. An algorithm is therefore proposed to modify each preferred list to retain its original features regardless of the number of faulty nodes in the system. The communication overhead introduced by this algorithm is shown to be minimal. Based on the modified preferred lists, a simple fault-tolerant mechanism is implemented. Each node is equipped with a backup queue which 'The work reported in this paper was supported in part by the Office of Naval Research under contract N0001485-K-0122, and the NSF under grant DMC-8721492. Any opinions, findings, and recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the funding agencies. stores and updates the arriving/completing tasks at its most preferrecd node. Whenever a node becomes faulty, its m.ost preferred node will treat the tasks in the baxkup queue as externally axriving tasks. Our simulation results show that this approach, despite of the simplicity, can reduce the number of task losses dramatically, as compared to the approaches without any faulttolerant mechanism.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"337 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DMCC.1990.556410","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
This paper discusses and analyzes two load sharing (LS) issues: adjusting preferred lists and implementing a fault-tolerant mechanism in the presence of node failures. In an earlier paper, we proposed ordering the nodes in each node's proximity into a preferred list for the purpose of load sharing in distributed real-time systems. The preferred list of each node is constructed in such a way that each node is selected as the kth preferred node by one and only one other node. Such lists are proven to allow tasks to be evenly distributed in the system. However, the presence of faulty nodes will destroy the original structure of a preferred list if the faulty nodes are simply skipped. An algorithm is therefore proposed to modify each preferred list so that it retains its original features regardless of the number of faulty nodes in the system. The communication overhead introduced by this algorithm is shown to be minimal. Based on the modified preferred lists, a simple fault-tolerant mechanism is implemented. Each node is equipped with a backup queue which stores and updates the arriving/completing tasks at its most preferred node. Whenever a node becomes faulty, its most preferred node treats the tasks in the backup queue as externally arriving tasks. Our simulation results show that this approach, despite its simplicity, can dramatically reduce the number of task losses compared to approaches without any fault-tolerant mechanism.

Footnote: The work reported in this paper was supported in part by the Office of Naval Research under contract N0001485-K-0122, and the NSF under grant DMC-8721492. Any opinions, findings, and recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the funding agencies.
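The abstract does not give the authors' exact construction, but the stated property (every node is the kth preferred node of one and only one other node) can be illustrated with a simple XOR-based ordering in a hypercube: because XOR with a fixed nonzero offset is a bijection on node addresses, each rank k defines a permutation of the nodes. The sketch below is an illustrative assumption, not the paper's algorithm; the offset ordering by Hamming distance and the function names (preferred_list, skip_faulty) are hypothetical.

```python
# Minimal sketch of a preferred-list construction with the "one and only one
# kth preferred node" property, assuming an XOR-with-offset ordering in a
# d-dimensional hypercube. This is NOT the authors' published construction.

def preferred_list(node: int, dim: int) -> list[int]:
    """Return a preferred list for `node` in a `dim`-dimensional hypercube.

    Offsets are ordered by Hamming weight (hypercube distance), so nearer
    nodes come first. For each rank k, node -> node ^ offsets[k] is a
    bijection, hence every node is the kth preferred node of exactly one node.
    """
    offsets = sorted(range(1, 2 ** dim), key=lambda s: (bin(s).count("1"), s))
    return [node ^ s for s in offsets]


def skip_faulty(pref: list[int], faulty: set[int]) -> list[int]:
    """Naively drop faulty nodes from a preferred list.

    This is the behaviour the paper argues against: skipping faulty nodes
    breaks the one-to-one structure of the lists, which is why a dedicated
    list-modification algorithm is proposed instead.
    """
    return [n for n in pref if n not in faulty]


if __name__ == "__main__":
    DIM = 3  # 8-node hypercube
    lists = {n: preferred_list(n, DIM) for n in range(2 ** DIM)}

    # Verify the stated property: for every rank k, each node appears as the
    # kth preferred node of exactly one other node.
    for k in range(2 ** DIM - 1):
        assert sorted(lst[k] for lst in lists.values()) == list(range(2 ** DIM))

    print(lists[0])  # node 0's list, e.g. [1, 2, 4, 3, 5, 6, 7]
```

Under this assumed construction, the assertion above passes for any dimension, which mirrors the paper's claim that the lists distribute tasks evenly; the fault-tolerant backup-queue mechanism would then sit on top of such lists, with each node's most preferred node replaying the backed-up tasks when the node fails.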