C. E. Salloum, A. Steininger, Peter Tummeltshammer, Werner Harter
{"title":"Recovery Mechanisms for Dual Core Architectures","authors":"C. E. Salloum, A. Steininger, Peter Tummeltshammer, Werner Harter","doi":"10.1109/DFT.2006.52","DOIUrl":null,"url":null,"abstract":"Dual core architectures are commonly used to establish fault tolerance on the node level. Since comparison is usually performed for the outputs only, no precise diagnostic information is available, and error handling comes down to a reset of both cores. The strategy proposed in this paper allows a more fine-grained error handling. It is based on the following steps: (1) Identification of those registers that are actually relevant for recovering the last known correct core state. (2) Protection of these registers by additional comparators. (3) Use of the trap mechanism for recovering a consistent state of the complete core. (4) (Optional) provision of rollback capability for the relevant registers in order to relax the critical path constraints. In the paper these individual steps was discussed and motivated, and put them into context. In many cases the speed-up that was gained for the recovery was sufficient for using a dual core as a fail-operational instead of a fail-silent component with respect to transient faults. Rather than being restricted to a specific processor design our mechanisms can be employed in a wide variety of dual-core architectures","PeriodicalId":113870,"journal":{"name":"2006 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2006 21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DFT.2006.52","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
Dual core architectures are commonly used to establish fault tolerance on the node level. Since comparison is usually performed for the outputs only, no precise diagnostic information is available, and error handling comes down to a reset of both cores. The strategy proposed in this paper allows a more fine-grained error handling. It is based on the following steps: (1) Identification of those registers that are actually relevant for recovering the last known correct core state. (2) Protection of these registers by additional comparators. (3) Use of the trap mechanism for recovering a consistent state of the complete core. (4) (Optional) provision of rollback capability for the relevant registers in order to relax the critical path constraints. In the paper these individual steps was discussed and motivated, and put them into context. In many cases the speed-up that was gained for the recovery was sufficient for using a dual core as a fail-operational instead of a fail-silent component with respect to transient faults. Rather than being restricted to a specific processor design our mechanisms can be employed in a wide variety of dual-core architectures