The recovery language approach for software-implemented fault tolerance
V. D. Florio, Geert Deconinck, R. Lauwereins
Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing, 2001-02-07
DOI: 10.1109/EMPDP.2001.905070 (https://doi.org/10.1109/EMPDP.2001.905070)
We describe a novel approach for software-implemented fault tolerance that separates error detection from error recovery and offers a distinct programming and processing context for the latter. This allows the application developer to address the non-functional aspects of error recovery separately from the functional behaviour that the user application is supposed to exhibit in the absence of faults. We conjecture that this way only a limited amount of non-functional code intrudes into the user application, while the bulk of the error-handling strategy is expressed by the user in a "recovery script", conceptually as well as physically distinct from the functional application layer. Such a script is written in what we call a "recovery language", i.e. a specialised linguistic framework devoted to managing fault tolerance strategies, in which scenarios of isolation, reconfiguration, and recovery can be expressed. These scenarios are executed on meta-entities of the application that have physical or logical counterparts (processing nodes, tasks, or user-defined groups of tasks). The developer can therefore modify the fault tolerance strategy with few or no modifications to the application part, or vice versa, and so tackle either of these two fronts more easily and effectively. This can result in better maintainability of the target fault-tolerant application and can support portability of the service when the application is moved to different unfavourable environments. The paper positions and discusses the recovery language approach and a prototype implementation for embedded applications, developed within project TIRAN on a number of distributed platforms.
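The separation described in the abstract can be illustrated with a small Python sketch. It is not the paper's recovery language or the TIRAN prototype: every name here (`Entity`, `Rule`, `faulty`, `isolate`, `restart`, `run_recovery`) is hypothetical, and the rule format is an assumption made only to show the idea. The application side merely records that an entity is faulty, while a separate "script" of guarded actions decides what to do about it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    OK = "ok"
    FAULTY = "faulty"
    ISOLATED = "isolated"


@dataclass
class Entity:
    """Hypothetical meta-entity: a node, a task, or a group of tasks."""
    name: str
    kind: str  # "node", "task", or "group"
    status: Status = Status.OK


Database = Dict[str, Entity]


@dataclass
class Rule:
    """One guarded entry of the (hypothetical) recovery script."""
    guard: Callable[[Database], bool]
    actions: List[Callable[[Database], str]]


def faulty(name: str) -> Callable[[Database], bool]:
    # Guard: true when the named entity has been flagged as faulty.
    return lambda db: db[name].status is Status.FAULTY


def isolate(name: str) -> Callable[[Database], str]:
    # Action: mark the entity as isolated from the rest of the system.
    def act(db: Database) -> str:
        db[name].status = Status.ISOLATED
        return f"isolate {name}"
    return act


def restart(name: str) -> Callable[[Database], str]:
    # Action: model a restart by returning the entity to the OK state.
    def act(db: Database) -> str:
        db[name].status = Status.OK
        return f"restart {name}"
    return act


def run_recovery(script: List[Rule], db: Database) -> List[str]:
    """Evaluate each rule in order and return the log of executed actions."""
    log: List[str] = []
    for rule in script:
        if rule.guard(db):
            for action in rule.actions:
                log.append(action(db))
    return log


# Application layer: error detection only records the fault.
db: Database = {
    "node1": Entity("node1", "node"),
    "task3": Entity("task3", "task"),
}
db["task3"].status = Status.FAULTY

# Recovery layer: the strategy lives in a separate script.
script = [
    Rule(faulty("task3"), [restart("task3")]),
    Rule(faulty("node1"), [isolate("node1")]),
]

log = run_recovery(script, db)
print(log)
```

In this sketch, changing the fault tolerance strategy (for example, isolating `task3` instead of restarting it) means editing only `script`, which mirrors the claim that the strategy can change with few or no modifications to the application part.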