Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection

Cheng Wang, Ho-Seop Kim, Youfeng Wu, V. Ying
International Symposium on Code Generation and Optimization (CGO'07), published 2007-03-11. DOI: 10.1109/CGO.2007.7
Citations: 142

Abstract

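As transistors become smaller and faster with tighter noise margins, modern processors are increasingly susceptible to transient hardware faults. Existing hardware-based redundant multi-threading (HRMT) approaches rely mostly on special-purpose hardware to replicate the program into redundant execution threads and compare their computation results. In this paper, we present a software-based redundant multi-threading (SRMT) approach for transient fault detection. Our SRMT technique uses the compiler to automatically generate redundant threads so that they can run on general-purpose chip multi-processors (CMPs). We exploit high-level program information available at compile time to optimize data communication between the redundant threads. Furthermore, our software-based technique provides a flexible program execution environment in which legacy binary code and reliability-enhanced code can coexist in a mix-and-match fashion, depending on the desired level of reliability and software compatibility. Our experimental results show that compiler analysis and optimization techniques can reduce the data communication requirement by up to 88% relative to HRMT. With general-purpose intra-chip communication mechanisms on a CMP machine, SRMT overhead can be as low as 19%. Moreover, SRMT achieves error coverage rates of 99.98% and 99.6% for the SPEC CPU2000 integer and floating-point benchmarks, respectively. These results demonstrate that SRMT is competitive with HRMT approaches.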
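The core idea of redundant multi-threading, where a leading thread runs the program, a trailing thread re-executes it, and values are exchanged and compared through an inter-thread channel, can be illustrated with the minimal Python sketch below. It is not the compiler-generated code from the paper. The function names `leading`, `trailing` and `run_redundant`, the message tags, and the bit-flip fault injection are all hypothetical. An unbounded `queue.Queue` stands in for the general-purpose intra-chip communication mechanism. In the style of RMT schemes, the sketch assumes that only the leading thread reads memory and forwards loaded values, and that the trailing thread checks every store address and value.

```python
import queue
import threading

# Hypothetical message tags for the inter-thread queue (illustrative only).
LOAD, STORE, DONE = "load", "store", "done"


def leading(data, q, out, fault_at=None):
    """Leading thread: runs the original loop, forwards loaded values
    (so the trailing thread never re-reads memory) and every store's
    address and value (so the trailing thread can check them)."""
    acc = 0
    for i in range(len(data)):
        x = data[i]                  # load from memory
        q.put((LOAD, x))             # forward the loaded value
        acc += x * x
        if i == fault_at:
            acc ^= 1 << 3            # simulated transient bit flip (hypothetical)
        q.put((STORE, i, acc))       # forward store address and value
        out[i] = acc                 # perform the store
    q.put((DONE,))


def trailing(q, errors):
    """Trailing thread: redundantly recomputes the loop from the forwarded
    load values and compares each of its results with the leading store."""
    acc = 0
    i = 0
    while True:
        msg = q.get()
        if msg[0] == DONE:
            return
        if msg[0] == LOAD:
            acc += msg[1] * msg[1]   # redundant computation
        else:
            _, addr, val = msg
            if addr != i or val != acc:
                # Mismatch: a transient fault is detected. This sketch only
                # records it; a real system would signal or stop here.
                errors.append((addr, val, acc))
            i += 1


def run_redundant(data, fault_at=None):
    """Run the leading thread on the caller's thread and the trailing
    thread concurrently; return the stored results and detected mismatches."""
    q = queue.Queue()
    out, errors = {}, []
    t = threading.Thread(target=trailing, args=(q, errors))
    t.start()
    leading(data, q, out, fault_at)
    t.join()
    return out, errors
```

For example, `run_redundant([1, 2, 3, 4])` returns `{0: 1, 1: 5, 2: 14, 3: 30}` with no mismatches. With `fault_at=2`, the trailing thread reports mismatches at addresses 2 and 3, because the corrupted accumulator keeps propagating in the leading thread. Forwarding load values instead of letting both threads read memory ensures that the two threads see identical inputs even if memory changes between their reads. The communication volume this generates is what compiler analysis of the kind described in the abstract would aim to reduce.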