Assuring application-level correctness against soft errors

J. Cong, Karthik Gururaj
2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 150-157. Published 2011-11-07.
DOI: 10.1109/ICCAD.2011.6105319 (https://doi.org/10.1109/ICCAD.2011.6105319)
Citations: 47

Abstract

Traditionally, research in fault tolerance has required architectural state to be numerically perfect for program execution to be considered correct. However, in many programs, even if execution is not 100% numerically correct, the program can still appear to execute correctly from the user's perspective. To quantify user satisfaction, application-level fidelity metrics (such as PSNR) can be used. The output of such an application is defined to be correct if the fidelity metrics satisfy a certain threshold. However, such applications still contain instructions whose outputs are critical, i.e., their correctness decides whether the overall quality of the program output is acceptable. In this paper, we present an analysis technique for identifying such critical program segments. More importantly, our technique is capable of guaranteeing application-level correctness through a combination of static analysis and runtime monitoring. Our static analysis consists of data flow analysis followed by control flow analysis to find static critical instructions which affect several instructions. Critical instructions are further refined into likely non-critical and likely critical sets in a profiling phase. At runtime, we use a monitoring scheme to monitor likely non-critical instructions and take remedial actions if some likely non-critical instructions become critical. Based on this analysis, we minimize the number of instructions that are duplicated and checked at runtime using a software-based fault detection and recovery technique [20]. Put together, our approach can lead to 22% average energy savings for multimedia applications while guaranteeing application-level correctness, when compared to a recent work [9], which cannot guarantee application-level correctness. Compared to the approach proposed in [20], which guarantees both application-level and numerical correctness, our method achieves a 79% energy reduction.
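The notion of application-level correctness described in the abstract can be illustrated with a small sketch: compute a fidelity metric (here PSNR) between a fault-free output and a possibly corrupted one, and accept the output if the metric meets a threshold. This is only an illustration under stated assumptions, not the authors' tooling. The function names, the 8-sample "image", and the 30 dB threshold are all hypothetical; the abstract does not state the thresholds the paper uses.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical outputs: numerically perfect
    return 10 * math.log10(max_value ** 2 / mse)


def is_application_correct(reference, test, threshold_db=30.0):
    """Treat the output as correct when the fidelity metric meets the threshold.

    threshold_db=30.0 is an assumed value chosen for illustration only.
    """
    return psnr(reference, test) >= threshold_db


# Hypothetical fault-free ("golden") output: eight 8-bit pixel values.
golden = [52, 55, 61, 66, 70, 61, 64, 73]

# Soft error flipping the lowest bit of one pixel: numerically wrong,
# but the error magnitude is 1.
minor = list(golden)
minor[3] ^= 0x01

# Soft error flipping the highest bit of one pixel: error magnitude is 128.
major = list(golden)
major[3] ^= 0x80

print(is_application_correct(golden, minor))
print(is_application_correct(golden, major))
```

In this toy case the low-order bit flip leaves the output acceptable while the high-order flip does not, which mirrors the abstract's distinction between numerically imperfect but acceptable execution and errors in critical values.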