Compiler extensions towards reliable multicore processors

Y. Nezzari, C. Bridges
2017 IEEE Aerospace Conference, published 2017-03-04. DOI: 10.1109/AERO.2017.7943714 (https://doi.org/10.1109/AERO.2017.7943714)
Citations: 3

Abstract

The current trend in commercial processors is toward multi-core architectures, which pose both an opportunity and a challenge for future space-based processing. The opportunity is to leverage multi-core processors for high-intensity computing applications and thus provide an order-of-magnitude increase in onboard processing capability with less size, mass, and power. The challenge is to provide the requisite safety and reliability in an extremely harsh radiation environment. The objective is to advance from the multiple single-processor systems typically flown to a fault-tolerant multi-core system. We investigate software-based methods that make multi-core processors tolerant to single event effects (SEEs), which cause interrupts or 'bit-flips', and we propose using additional cores and memory resources together with newly developed software protection techniques. This work also assesses the trade space between reliability and performance to find an optimal balance. Our work is based on the modern compiler LLVM, chosen because it has been ported to many architectures. We implement optimization passes that automatically add protection techniques, including N-modular redundancy (NMR) and error detection and correction (EDAC), at the assembly/instruction level for the supported languages. Because the passes modify the intermediate representation of the source code, they can be applied to any high-level language and any processor architecture supported by the LLVM framework. In our initial experiments, we separately implement triple modular redundancy (TMR) and error detection and correction codes (Hamming and BCH) at the instruction level. For critical applications we combine the two methods: instructions are first protected with TMR, and EDAC is then used as a further measure when TMR cannot correct the errors originating from an SEE.
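
The instruction-level TMR idea described above can be illustrated with a small sketch of the voting step. This is not the authors' LLVM pass, which rewrites IR; it is a minimal Python model, assuming a single add instruction executed three times followed by a voter. The names `majority_vote`, `bitwise_vote`, and `tmr_add`, and the `faults` parameter used to simulate SEE bit-flips, are hypothetical and exist only for illustration. The `ok` flag marks the case where TMR alone cannot decide, which is where a combined scheme would fall back to EDAC.

```python
from typing import List, Sequence, Tuple


def majority_vote(a: int, b: int, c: int) -> Tuple[int, bool]:
    """Word-level TMR voter: returns (value, ok); ok is False when no two copies agree."""
    if a == b or a == c:
        return a, True
    if b == c:
        return b, True
    # No majority exists: TMR alone cannot correct this, so the caller must escalate.
    return a, False


def bitwise_vote(a: int, b: int, c: int) -> int:
    """Bitwise TMR voter: each result bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


def tmr_add(x: int, y: int, faults: Sequence[Tuple[int, int]] = ()) -> Tuple[int, bool]:
    """Hypothetical triplicated add followed by a word-level vote.

    `faults` is a list of (copy_index, bit) pairs used to simulate single-event
    bit-flips in individual copies before the vote.
    """
    copies: List[int] = [x + y, x + y, x + y]  # the same instruction, executed three times
    for idx, bit in faults:
        copies[idx] ^= 1 << bit  # inject a single-bit upset into one copy
    return majority_vote(*copies)


if __name__ == "__main__":
    print(tmr_add(2, 3))                           # no fault
    print(tmr_add(2, 3, faults=[(1, 0)]))          # one corrupted copy is outvoted
    print(tmr_add(2, 3, faults=[(0, 0), (1, 1)]))  # all three copies now differ
    print(bitwise_vote(4, 7, 5))                   # the same three copies, voted per bit
```

With two upsets in different copies, the word-level voter reports that no majority exists, while the bitwise voter still recovers 5 because the flips hit different bit positions. Which voter a real pass should emit is a design choice the abstract does not specify.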
Our initial experiments show good performance (about 10% overhead) when protecting code memory with TMR and a single-error-correction, double-error-detection (SEC-DED) Hamming code; further work is needed to improve performance when protecting code memory with the BCH code. As hardware gets smaller and more susceptible to such faults, this work would be valuable not only for satellites and space systems but also for general computing in aircraft, automotive systems, server farms, and medical equipment, or anywhere that needs safety-critical performance.
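
The SEC-DED Hamming code mentioned in the results can be sketched as a standard extended Hamming code: Hamming parity bits at power-of-two positions correct any single-bit error, and one extra overall parity bit lets the decoder detect, but not correct, double-bit errors. This is a generic textbook construction in Python, not the authors' implementation; the bit layout, the word width `k`, and the names `secded_encode` and `secded_decode` are assumptions for illustration, since the abstract does not say how the check bits are stored.

```python
from typing import Optional, Tuple


def _parity_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (enough Hamming parity bits for k data bits)."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def _is_power_of_two(pos: int) -> bool:
    return (pos & (pos - 1)) == 0


def secded_encode(data: int, k: int) -> int:
    """Encode k data bits as an extended Hamming (SEC-DED) codeword.

    Assumed layout: positions 1..n hold the Hamming code with parity bits at
    powers of two; bit 0 is an overall parity bit making the count of 1s even.
    """
    r = _parity_bits(k)
    n = k + r
    code, j = 0, 0
    for pos in range(1, n + 1):
        if not _is_power_of_two(pos):  # data position
            if (data >> j) & 1:
                code |= 1 << pos
            j += 1
    # XOR of the positions of all set data bits; setting the matching
    # power-of-two parity bits drives this syndrome to zero.
    syndrome = 0
    for pos in range(1, n + 1):
        if (code >> pos) & 1:
            syndrome ^= pos
    for i in range(r):
        if (syndrome >> i) & 1:
            code |= 1 << (1 << i)
    if bin(code).count("1") % 2:  # overall parity bit
        code |= 1
    return code


def secded_decode(code: int, k: int) -> Tuple[Optional[int], str]:
    """Return (data, status) with status 'ok', 'corrected' or 'uncorrectable'."""
    n = k + _parity_bits(k)
    syndrome = 0
    for pos in range(1, n + 1):
        if (code >> pos) & 1:
            syndrome ^= pos
    odd_parity = bin(code).count("1") % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Single-bit error: the syndrome is its position (0 = the overall parity bit).
        code ^= 1 << syndrome
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: double-bit error, detect only.
        return None, "uncorrectable"
    data, j = 0, 0
    for pos in range(1, n + 1):
        if not _is_power_of_two(pos):
            if (code >> pos) & 1:
                data |= 1 << j
            j += 1
    return data, status


if __name__ == "__main__":
    cw = secded_encode(0xA5, 8)
    print(secded_decode(cw ^ (1 << 5), 8))             # single flip is corrected
    print(secded_decode(cw ^ (1 << 3) ^ (1 << 6), 8))  # double flip is only detected
```

For a 32-bit word this construction uses 6 Hamming parity bits plus the overall parity bit, giving a (39,32) code with 7 check bits per word. The "uncorrectable" status is the point where a combined TMR and EDAC scheme would need another recovery path.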