Bi-Source Verification Against Silent Data Corruption in High Performance Computing
Era Ajdaraga Krluku, M. Gusev, Vladimir Zdraveski
Proceedings of the 9th Balkan Conference on Informatics, 2019-09-26
DOI: 10.1145/3351556.3351567
Citations: 1
Abstract
This paper proposes a continuous health-check approach for detecting Silent Data Corruption (SDC) in High Performance Computing (HPC) environments. The goal is to minimize the impact of hardware errors on the overall reliability and accuracy of the system by monitoring and validating the correctness of data. Our work compares the advantages and shortcomings of two approaches to overcoming SDC: threshold-triggered verification and continuous verification. Our results show that, of these two methods, continuous verification is superior in terms of latency.
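Neither mechanism is detailed in the abstract, so the following is only a minimal Python sketch of why continuous verification can have lower latency, under stated assumptions. Two sources compute the same running state. A silent error is injected into one source at a known step and then propagates. Continuous verification compares the sources at every step. Threshold-triggered verification compares them only when a hypothetical per-step health indicator accumulates to a threshold. The helper names (`running_sums`, `first_detection`, `continuous_policy`, `threshold_policy`) and the indicator itself are hypothetical illustrations, not the paper's implementation.

```python
"""Toy model of two SDC verification policies (illustrative sketch only)."""
from typing import Callable, List, Optional


def running_sums(values: List[float], corrupt_at: Optional[int] = None,
                 delta: float = 0.0) -> List[float]:
    """Return the running state after each step. If `corrupt_at` is given,
    silently add `delta` to the state at that step; the error then propagates."""
    state, out = 0.0, []
    for step, v in enumerate(values):
        state += v
        if step == corrupt_at:
            state += delta  # silent corruption: no exception, just a wrong value
        out.append(state)
    return out


def first_detection(primary: List[float], secondary: List[float],
                    should_check: Callable[[int], bool],
                    tol: float = 1e-9) -> Optional[int]:
    """Compare the two sources only at steps where the policy requests a check.
    Return the first step at which a mismatch is observed, or None."""
    for step, (a, b) in enumerate(zip(primary, secondary)):
        # should_check is evaluated first, so it is consulted exactly once per step
        if should_check(step) and abs(a - b) > tol:
            return step
    return None


def continuous_policy() -> Callable[[int], bool]:
    """Continuous verification: compare the sources at every step."""
    return lambda step: True


def threshold_policy(indicator: List[int], threshold: int) -> Callable[[int], bool]:
    """Hypothetical threshold trigger: accumulate a per-step health indicator
    and run a check only when the total reaches `threshold`, then reset it."""
    acc = 0

    def should_check(step: int) -> bool:
        nonlocal acc
        acc += indicator[step]
        if acc >= threshold:
            acc = 0
            return True
        return False

    return should_check


if __name__ == "__main__":
    values = [1.0] * 20
    secondary = running_sums(values)                          # clean redundant source
    primary = running_sums(values, corrupt_at=5, delta=1e-3)  # corrupted at step 5
    indicator = [1] * 20                                      # assumed: one unit per step
    cont = first_detection(primary, secondary, continuous_policy())
    thr = first_detection(primary, secondary, threshold_policy(indicator, threshold=4))
    print("continuous latency:", cont - 5)  # -> continuous latency: 0
    print("threshold latency:", thr - 5)    # -> threshold latency: 2
```

In this toy setup, continuous verification flags the mismatch at the step where the corruption occurs, while the threshold policy waits for its next trigger. With a constant indicator the trigger reduces to periodic checking, so the size of the gap depends entirely on the assumed indicator and threshold. The model deliberately ignores the cost side of the trade-off: continuous checking pays for a comparison at every step.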