Scalable Distributed Checkpointing Algorithm
Jinho Ahn
2020 International Conference on Computational Science and Computational Intelligence (CSCI), 2020-12-01
DOI: https://doi.org/10.1109/CSCI51800.2020.00237
Citation count: 0
Abstract
HMNR, a communication-induced checkpointing algorithm, was introduced to minimize the number of forced checkpoints by making effective use of control information about every other process, piggybacked on each message sent. An improved algorithm, Lazy-HMNR, was then presented to reduce the forced checkpoints caused by asymmetry between the checkpointing frequencies of processes. Despite these two minimization techniques, Lazy-HMNR has a shortcoming: under heavy message traffic, it becomes considerably less likely to detect Z-cycle-free patterns. In addition, no prior work has designed such algorithms with the network topology in mind, so that the control information piggybacked on each message keeps the number of forced checkpoints as low as possible. This paper introduces a new Lazy-HMNR algorithm for group-communication-based distributed systems that combines these ideas to reduce the number of forced checkpoints more effectively.
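To make the idea of piggybacked control information concrete, the following is a minimal sketch of the simpler classic index-based rule for communication-induced checkpointing (often called BCS in the literature). It is not HMNR, Lazy-HMNR, or the algorithm proposed in this paper, all of which piggyback richer information in order to take fewer forced checkpoints. The class name `Process`, its fields, and the message layout are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """One process running a simplified index-based CIC rule (illustrative, not HMNR)."""
    pid: int
    sn: int = 0          # index of this process's latest checkpoint
    forced: int = 0      # number of forced checkpoints taken so far
    log: list = field(default_factory=list)

    def take_basic_checkpoint(self):
        # A basic checkpoint is taken on the process's own schedule.
        self.sn += 1
        self.log.append(("basic", self.sn))

    def send(self, payload):
        # Piggyback the sender's current checkpoint index on the message.
        return {"src": self.pid, "sn": self.sn, "payload": payload}

    def receive(self, msg):
        # If the sender is ahead, take a forced checkpoint BEFORE delivery
        # and adopt the sender's index, so equal-index checkpoints stay consistent.
        if msg["sn"] > self.sn:
            self.sn = msg["sn"]
            self.forced += 1
            self.log.append(("forced", self.sn))
        return msg["payload"]


# Hypothetical two-process exchange.
p0, p1 = Process(0), Process(1)
p0.take_basic_checkpoint()          # p0 now has index 1
p1.receive(p0.send("hello"))        # p1 is behind, so it takes a forced checkpoint
p0.receive(p1.send("reply"))        # indices are equal, so no forced checkpoint
```

In this sketch, a process that checkpoints more often than its peers drags them into forced checkpoints on every message that carries a higher index. That asymmetry is the kind of cost the abstract says Lazy-HMNR aims to reduce.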