Moslem Didehban, Aviral Shrivastava, Sai Ram Dheeraj Lokam
{"title":"NEMESIS: A software approach for computing in presence of soft errors","authors":"Moslem Didehban, Aviral Shrivastava, Sai Ram Dheeraj Lokam","doi":"10.1109/ICCAD.2017.8203792","DOIUrl":null,"url":null,"abstract":"Soft errors are considered as the main reliability challenge for sub-nanoscale microprocessors. Software-level soft error resilience schemes are desirable because they require no hardware modifications and their protection can be tuned based on the application requirements. However, existing software-level error tolerant schemes do not provide high-level of protection. In this work, we present NEMESIS — a compiler-level fine-grain soft error detection, diagnosis and recovery technique that can provide high degree of error-resiliency. NEMESIS runs three versions of computations and detects soft errors by checking the results of all memory write and branch operations. In the case of mismatch, NEMESIS recovery routine reverts the effect of error from the architectural state of the program and program resumes its normal execution. Our extensive μ-architectural-level fault injection experiments results show that NEMESIS transformation is able to detect all soft errors and recover from 97% of detected errors.","PeriodicalId":126686,"journal":{"name":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCAD.2017.8203792","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 23
Abstract
Soft errors are considered as the main reliability challenge for sub-nanoscale microprocessors. Software-level soft error resilience schemes are desirable because they require no hardware modifications and their protection can be tuned based on the application requirements. However, existing software-level error tolerant schemes do not provide high-level of protection. In this work, we present NEMESIS — a compiler-level fine-grain soft error detection, diagnosis and recovery technique that can provide high degree of error-resiliency. NEMESIS runs three versions of computations and detects soft errors by checking the results of all memory write and branch operations. In the case of mismatch, NEMESIS recovery routine reverts the effect of error from the architectural state of the program and program resumes its normal execution. Our extensive μ-architectural-level fault injection experiments results show that NEMESIS transformation is able to detect all soft errors and recover from 97% of detected errors.