Fault tolerant programming for network based parallel computing
A. Clematis
Microprocessing and Microprogramming, Volume 40, Issue 10, Pages 765-768, December 1994 (Journal Article)
DOI: 10.1016/0165-6074(94)90036-1
Citations: 5
Abstract
Methods and tools for fault tolerance in network-based parallel computing are analyzed. Several communication libraries are now available that make it possible to use a local area network as a parallel computer. These libraries provide different services; however, little attention is devoted to the problem of fault tolerance. Exploiting the fact that most parallel applications exhibit a regular structure, it is shown that providing fault tolerance for this type of computation is much simpler than providing it for a general concurrent program. The primitives needed to support the proposed design, and general implementation issues for the shared virtual memory and message-passing communication models, are briefly considered.
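As an illustration of why regular structure helps, the sketch below assumes one common regular pattern, a task farm in which a master hands independent tasks to workers. It is not the paper's design, which the abstract does not detail. Because each task depends only on its own input and shares no state with other tasks, recovering from a worker crash needs no checkpointing or rollback of other processes: the master simply puts the lost task back in the queue. The function `run_task_farm`, the worker names and the `fails_on` failure model are hypothetical and are used only to simulate crashes in a single process.

```python
from collections import deque
from typing import Callable, Dict, List


def run_task_farm(
    tasks: List[int],
    work: Callable[[int], int],
    workers: List[str],
    fails_on: Dict[str, int],
) -> Dict[int, int]:
    """Hypothetical master loop for a task farm with simulated worker crashes.

    fails_on maps a (hypothetical) worker name to the number of tasks it
    completes before it crashes; workers not listed never fail.
    """
    pending = deque(tasks)              # tasks whose results are not yet known
    results: Dict[int, int] = {}
    alive = list(workers)
    completed = {w: 0 for w in workers}
    while pending:
        if not alive:
            raise RuntimeError("all workers failed")
        # Hand one task to each surviving worker per round.
        for w in list(alive):
            if not pending:
                break
            task = pending.popleft()
            if w in fails_on and completed[w] >= fails_on[w]:
                # Simulated crash while holding `task`. Tasks are independent,
                # so recovery is just re-queueing the task for another worker.
                alive.remove(w)
                pending.append(task)
                continue
            results[task] = work(task)
            completed[w] += 1
    return results
```

For example, `run_task_farm(list(range(10)), lambda x: x * x, ["w0", "w1", "w2"], {"w1": 2})` still returns the square of every task even though worker `w1` crashes partway through. In a general concurrent program, where processes exchange intermediate state, the same crash would require coordinated recovery of the processes that interacted with the failed one.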