{"title":"Distributed computing systems and checkpointing","authors":"Kenneth F. Wong, M. Franklin","doi":"10.1109/HPDC.1993.263838","DOIUrl":null,"url":null,"abstract":"This paper examines the performance of synchronous checkpointing in a distributed computing environment with and without load redistribution. Performance models are developed, and optimum checkpoint intervals are determined. The analysis extends earlier work by allowing for multiple nodes, state dependent checkpoint intervals, and a performance metric which is coupled with failure-free performance and the speedup functions associated with implementation of parallel algorithms. Expressions for the optimum checkpoint intervals for synchronous checkpointing with and without load redistribution are derived and the results are then used to determine when load redistribution is advantageous.<<ETX>>","PeriodicalId":226280,"journal":{"name":"[1993] Proceedings The 2nd International Symposium on High Performance Distributed Computing","volume":"209 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1993-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"[1993] Proceedings The 2nd International Symposium on High Performance Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPDC.1993.263838","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 22
Abstract
This paper examines the performance of synchronous checkpointing in a distributed computing environment with and without load redistribution. Performance models are developed, and optimum checkpoint intervals are determined. The analysis extends earlier work by allowing for multiple nodes, state dependent checkpoint intervals, and a performance metric which is coupled with failure-free performance and the speedup functions associated with implementation of parallel algorithms. Expressions for the optimum checkpoint intervals for synchronous checkpointing with and without load redistribution are derived and the results are then used to determine when load redistribution is advantageous.<>