{"title":"A proposal and evaluation of a coordinated checkpointing technique using incremental snapshots","authors":"Mamoru Ohara, M. Arai, S. Fukumoto, K. Iwasaki","doi":"10.1002/ECJC.20296","DOIUrl":null,"url":null,"abstract":"Coordinated checkpointing techniques ensure that a consistent global state is maintained by means of coordination between processes. The approach requires that application messages temporarily cease to be exchanged but the rollback procedure when recovering from a fault is consequently simplified and the recovery costs are small. With current reductions in communications costs, the importance of coordinated techniques may be seen to be growing. However, in large-scale systems there is a possibility that performance will be seriously impaired due to the frequent halting of the exchange of messages. In this paper we propose a method whereby coordination is performed at only a subset of the checkpoint generation points that are periodically visited while at the remaining points each process independently generates an incremental snapshot. This method aims to both alleviate the performance degradation incurred from coordination and to realize relatively high-speed recovery. In evaluating the effectiveness of this method we estimate the checkpointing overheads and recovery costs using a probabilistic model and simulations and compare it with existing coordination methods. The results show that the proposed method is more effective than existing coordination methods from the perspective of both performance and reliability in environments with a relatively low frequency of messages. In addition, we perform comparisons of two different delta schemes for representing the incremental snapshots and discuss which environments they are each respectively suited to. © 2007 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 90(8): 39–53, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20296","PeriodicalId":100407,"journal":{"name":"Electronics and Communications in Japan (Part III: Fundamental Electronic Science)","volume":"27 1","pages":"39-53"},"PeriodicalIF":0.0000,"publicationDate":"2007-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Electronics and Communications in Japan (Part III: Fundamental Electronic Science)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/ECJC.20296","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1