{"title":"基于主工计算环境下基于日志的两阶段故障恢复机制","authors":"Ting Chen, Yongjian Wang, Yuanqiang Huang, Cheng Luo, D. Qian, Zhongzhi Luan","doi":"10.1109/ISPA.2009.53","DOIUrl":null,"url":null,"abstract":"The Master/worker pattern is widely used to construct the cross-domain, large scale computing infrastructure. The applications supported by this kind of infrastructure usually features long-running, speculative execution etc. Fault recovery mechanism is significant to them especially in the wide area network environment, which consists of error prone components. Inter-node cooperation is urgent to make the recovery process more efficient. The traditional log-based rollback recovery mechanism which features independent recovery cannot fulfill the global cooperation requirement due to the waste of bandwidth and slow application data transfer which is caused by the exchange of a large amount of logs. In this paper, we propose a two-phase log-based recovery mechanism which is of merits such as space saving and global optimization and can be used as a complement of the current log-based rollback recovery approach in some specific situations. We have demonstrated the use of this mechanism in the Drug Discovery Grid environment, which is supported by China National Grid. Experiment results have proved efficiency of this mechanism.","PeriodicalId":346815,"journal":{"name":"2009 IEEE International Symposium on Parallel and Distributed Processing with Applications","volume":"258 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Two-Phase Log-Based Fault Recovery Mechanism in Master/Worker Based Computing Environment\",\"authors\":\"Ting Chen, Yongjian Wang, Yuanqiang Huang, Cheng Luo, D. Qian, Zhongzhi Luan\",\"doi\":\"10.1109/ISPA.2009.53\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Master/worker pattern is widely used to construct the cross-domain, large scale computing infrastructure. The applications supported by this kind of infrastructure usually features long-running, speculative execution etc. Fault recovery mechanism is significant to them especially in the wide area network environment, which consists of error prone components. Inter-node cooperation is urgent to make the recovery process more efficient. The traditional log-based rollback recovery mechanism which features independent recovery cannot fulfill the global cooperation requirement due to the waste of bandwidth and slow application data transfer which is caused by the exchange of a large amount of logs. In this paper, we propose a two-phase log-based recovery mechanism which is of merits such as space saving and global optimization and can be used as a complement of the current log-based rollback recovery approach in some specific situations. We have demonstrated the use of this mechanism in the Drug Discovery Grid environment, which is supported by China National Grid. Experiment results have proved efficiency of this mechanism.\",\"PeriodicalId\":346815,\"journal\":{\"name\":\"2009 IEEE International Symposium on Parallel and Distributed Processing with Applications\",\"volume\":\"258 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-08-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 IEEE International Symposium on Parallel and Distributed Processing with Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISPA.2009.53\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE International Symposium on Parallel and Distributed Processing with Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPA.2009.53","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Two-Phase Log-Based Fault Recovery Mechanism in Master/Worker Based Computing Environment
The Master/worker pattern is widely used to construct the cross-domain, large scale computing infrastructure. The applications supported by this kind of infrastructure usually features long-running, speculative execution etc. Fault recovery mechanism is significant to them especially in the wide area network environment, which consists of error prone components. Inter-node cooperation is urgent to make the recovery process more efficient. The traditional log-based rollback recovery mechanism which features independent recovery cannot fulfill the global cooperation requirement due to the waste of bandwidth and slow application data transfer which is caused by the exchange of a large amount of logs. In this paper, we propose a two-phase log-based recovery mechanism which is of merits such as space saving and global optimization and can be used as a complement of the current log-based rollback recovery approach in some specific situations. We have demonstrated the use of this mechanism in the Drug Discovery Grid environment, which is supported by China National Grid. Experiment results have proved efficiency of this mechanism.