Silent Data Corruption Estimation and Mitigation Without Fault Injection
Moona Yakhchi, Mahdi Fazeli, Seyyed Amir Asghari
IEEE Canadian Journal of Electrical and Computer Engineering, vol. 45, no. 3, pp. 318-327, published 2022-09-07
DOI: 10.1109/ICJECE.2022.3189043
Abstract
Silent data corruptions (SDCs) have long been regarded as one of the most serious consequences of radiation-induced faults. Traditional redundancy-based solutions are very expensive in terms of chip area, energy consumption, and performance. As a result, low-cost and efficient approaches to coping with SDCs are attracting more research attention than ever. However, identifying the SDC-prone data and instructions in a program is challenging, because it requires time-consuming fault injection campaigns into different parts of the program. In this article, we present a cost-efficient approach to detecting SDCs and mitigating their rate across the whole program in the presence of multibit faults, without any fault injection. The approach combines machine learning with a metaheuristic algorithm to predict the SDC event rate of each instruction. The evaluation results show that the proposed approach achieves a detection accuracy of 99% while incurring a low performance overhead of 58%.
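Once a per-instruction SDC rate has been predicted, the mitigation side comes down to a trade-off: protect the instructions that contribute most to the SDC rate without exceeding an acceptable performance overhead. The sketch below illustrates only that trade-off. It is not the authors' method. It assumes that predicted SDC rates are already available from some model, and it uses made-up rates and hypothetical per-instruction duplication costs (`sdc_rate`, `cost`). Selection is done with a swapped-in greedy knapsack heuristic that ranks instructions by SDC rate per unit cost. This replaces the paper's machine-learning and metaheuristic pipeline.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    name: str        # hypothetical instruction label
    sdc_rate: float  # predicted SDC rate (assumed to come from some prediction model)
    cost: float      # hypothetical extra cycles if duplicated; assumed > 0


def select_for_protection(instrs, budget):
    """Greedy stand-in (not the paper's algorithm): protect instructions with the
    highest predicted SDC rate per unit cost until the overhead budget is used up."""
    ranked = sorted(instrs, key=lambda i: i.sdc_rate / i.cost, reverse=True)
    chosen, spent = [], 0.0
    for ins in ranked:
        if spent + ins.cost <= budget:  # skip anything that would exceed the budget
            chosen.append(ins)
            spent += ins.cost
    return chosen, spent


def coverage(instrs, chosen):
    """Fraction of the total predicted SDC rate covered by the chosen set,
    assuming duplication catches every SDC of a protected instruction."""
    total = sum(i.sdc_rate for i in instrs)
    covered = sum(i.sdc_rate for i in chosen)
    return covered / total if total else 0.0


# Illustrative, made-up numbers; not data from the paper.
program = [
    Instruction("load", 0.30, 2.0),
    Instruction("add", 0.05, 1.0),
    Instruction("cmp", 0.40, 1.0),
    Instruction("store", 0.25, 2.0),
]
chosen, spent = select_for_protection(program, budget=3.0)
print([i.name for i in chosen], spent)     # ['cmp', 'load'] 3.0
print(round(coverage(program, chosen), 2))  # 0.7
```

With a budget of 3 extra cycles, the greedy pass protects `cmp` and `load` and covers about 70% of the predicted SDC rate. The ratio-based greedy rule is a fast heuristic that is not guaranteed to be optimal. A metaheuristic or an exact knapsack solver could find better selections when the budget is tight.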