AN编码中交易容错性能研究

Proceedings of the Computing Frontiers Conference Pub Date : 2017-05-15 DOI:10.1145/3075564.3075565

Norman A. Rink, J. Castrillón

{"title":"AN编码中交易容错性能研究","authors":"Norman A. Rink, J. Castrillón","doi":"10.1145/3075564.3075565","DOIUrl":null,"url":null,"abstract":"Increasing rates of transient hardware faults pose a problem for computing applications. Current and future trends are likely to exacerbate this problem. When a transient fault occurs during program execution, data in the output can become corrupted. The severity of output corruptions depends on the application domain. Hence, different applications require different levels of fault tolerance. We present an LLVM-based AN encoder that can equip programs with an error detection mechanism at configurable levels of rigor. Based on our AN encoder, the trade-off between fault tolerance and runtime overhead is analyzed. It is found that, by suitably configuring our AN encoder, the runtime overhead can be reduced from 9.9x to 2.1x. At the same time, however, the probability that a hardware fault in the CPU will result in silent data corruption rises from 0.007 to over 0.022. The same probability for memory faults increases from 0.009 to over 0.032. It is further demonstrated, by applying different configurations of our AN encoder to the components of an arithmetic expression interpreter, that having fine-grained control over levels of fault tolerance can be beneficial.","PeriodicalId":398898,"journal":{"name":"Proceedings of the Computing Frontiers Conference","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Trading Fault Tolerance for Performance in AN Encoding\",\"authors\":\"Norman A. Rink, J. Castrillón\",\"doi\":\"10.1145/3075564.3075565\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Increasing rates of transient hardware faults pose a problem for computing applications. Current and future trends are likely to exacerbate this problem. When a transient fault occurs during program execution, data in the output can become corrupted. The severity of output corruptions depends on the application domain. Hence, different applications require different levels of fault tolerance. We present an LLVM-based AN encoder that can equip programs with an error detection mechanism at configurable levels of rigor. Based on our AN encoder, the trade-off between fault tolerance and runtime overhead is analyzed. It is found that, by suitably configuring our AN encoder, the runtime overhead can be reduced from 9.9x to 2.1x. At the same time, however, the probability that a hardware fault in the CPU will result in silent data corruption rises from 0.007 to over 0.022. The same probability for memory faults increases from 0.009 to over 0.032. It is further demonstrated, by applying different configurations of our AN encoder to the components of an arithmetic expression interpreter, that having fine-grained control over levels of fault tolerance can be beneficial.\",\"PeriodicalId\":398898,\"journal\":{\"name\":\"Proceedings of the Computing Frontiers Conference\",\"volume\":\"33 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Computing Frontiers Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3075564.3075565\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Computing Frontiers Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3075564.3075565","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

摘要

不断增加的暂态硬件故障率给计算应用带来了问题。当前和未来的趋势可能会加剧这一问题。当程序执行过程中发生短暂故障时，输出中的数据可能会损坏。输出损坏的严重程度取决于应用程序域。因此，不同的应用程序需要不同级别的容错。我们提出了一个基于llvm的an编码器，它可以在可配置的严格级别上为程序配备错误检测机制。基于我们的编码器，分析了容错性和运行时开销之间的权衡。通过适当配置我们的AN编码器，可以将运行时开销从9.9倍降低到2.1倍。与此同时，CPU硬件故障导致数据静默损坏的概率从0.007上升到0.022以上。内存故障的相同概率从0.009增加到0.032以上。通过将AN编码器的不同配置应用于算术表达式解释器的组件，进一步证明了对容错级别进行细粒度控制是有益的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Trading Fault Tolerance for Performance in AN Encoding

Increasing rates of transient hardware faults pose a problem for computing applications. Current and future trends are likely to exacerbate this problem. When a transient fault occurs during program execution, data in the output can become corrupted. The severity of output corruptions depends on the application domain. Hence, different applications require different levels of fault tolerance. We present an LLVM-based AN encoder that can equip programs with an error detection mechanism at configurable levels of rigor. Based on our AN encoder, the trade-off between fault tolerance and runtime overhead is analyzed. It is found that, by suitably configuring our AN encoder, the runtime overhead can be reduced from 9.9x to 2.1x. At the same time, however, the probability that a hardware fault in the CPU will result in silent data corruption rises from 0.007 to over 0.022. The same probability for memory faults increases from 0.009 to over 0.032. It is further demonstrated, by applying different configurations of our AN encoder to the components of an arithmetic expression interpreter, that having fine-grained control over levels of fault tolerance can be beneficial.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the Computing Frontiers Conference

自引率

0.00%

发文量

期刊最新文献

Hardware Support for Secure Stream Processing in Cloud Environments Private inter-network routing for Wireless Sensor Networks and the Internet of Things Analytical Performance Modeling and Validation of Intel's Xeon Phi Architecture Design of S-boxes Defined with Cellular Automata Rules Cloud Workload Prediction by Means of Simulations