Perturbation-based Fault Screening

2007 IEEE 13th International Symposium on High Performance Computer Architecture. Pub Date: 2007-02-10. DOI: 10.1109/HPCA.2007.346195

Paul Racunas, Kypros Constantinides, Srilatha Manne, Shubhendu S. Mukherjee


Citations: 122

Abstract

Fault screeners are a new breed of fault identification technique that can probabilistically detect whether a transient fault has affected the state of a processor. We demonstrate that fault screeners work because of two key characteristics. First, we show that much of the intermediate data generated by a program inherently falls within certain consistent bounds. Second, we observe that these bounds are often violated by the introduction of a fault. Thus, fault screeners can identify faults by directly watching for data inconsistencies arising in an application's behavior. We present an idealized algorithm capable of identifying over 85% of injected faults on the SpecInt suite and over 75% overall. Further, in a realistic implementation on a simulated Pentium-III-like processor, about half of the errors due to injected faults are identified while still in speculative state. Errors detected this early can be eliminated by a pipeline flush. In this paper, we present several hardware-based versions of this screening algorithm and show that flushing the pipeline every time the hardware screener triggers reduces overall performance by less than 1%.
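
The two characteristics described in the abstract (intermediate data staying within consistent bounds, and faults tending to violate those bounds) can be illustrated with a minimal software sketch. This is not the paper's algorithm or any of its hardware screeners, which the abstract does not specify. The `RangeScreener` class, its min/max bounds policy, the `flip_bit` helper, and the sample values are all hypothetical, chosen only to show why detection is probabilistic.

```python
class RangeScreener:
    """Hypothetical screener: flags any value outside the range seen so far.

    Assumption: bounds are a simple running min/max. The abstract does not
    say how the paper's screeners track bounds.
    """

    def __init__(self):
        self.lo = None
        self.hi = None

    def observe(self, value):
        """Return True if `value` violates the learned bounds, then widen them."""
        if self.lo is None:
            # First value seen: it defines the initial bounds.
            self.lo = self.hi = value
            return False
        triggered = value < self.lo or value > self.hi
        # Widen the bounds even on a trigger, so a legitimate change in
        # program behavior is flagged once rather than on every occurrence.
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)
        return triggered


def flip_bit(value, bit):
    """Model a transient fault as a single-bit flip (an assumed fault model)."""
    return value ^ (1 << bit)


# Warm up on fault-free values in [100, 199] (made-up sample data).
screener = RangeScreener()
for v in range(100, 200):
    screener.observe(v)

# A low-order flip keeps the value inside the bounds, so it goes unnoticed.
low_bit_flagged = screener.observe(flip_bit(150, 0))
# A high-order flip pushes the value far outside the bounds and triggers.
high_bit_flagged = screener.observe(flip_bit(150, 20))
```

In this sketch the low-order flip escapes detection while the high-order flip is caught, which mirrors the abstract's point that screeners detect faults probabilistically rather than exhaustively. A trigger here is only a signal. In the paper's setting the response is a pipeline flush, which this sketch does not model.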


