GPU Architecture Aware Instruction Scheduling for Improving Soft-Error Reliability

Haeseung Lee, Mohammad Abdullah Al Faruque
IEEE Transactions on Multi-Scale Computing Systems, vol. 3, no. 2, pp. 86-99, published 2017-02-13
DOI: 10.1109/TMSCS.2017.2667661
Citations: 6

Abstract

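The demand for low-power, high-performance computing has driven the semiconductor industry for decades, and semiconductor technology has been scaled down to meet it. At the same time, scaled technology faces severe reliability challenges such as soft errors. Prior research has improved the soft-error reliability of GPUs through various methodologies, such as redundancy. However, the GPU compiler has not yet been considered as a means of improving GPU soft-error reliability. In this paper, we propose a novel GPU architecture aware compilation methodology to improve the soft-error reliability of the GPU. The proposed methodology jointly considers the parallel behavior of the GPU hardware and of the applications, and minimizes the vulnerability of GPU applications during instruction scheduling. In addition, it can complement any hardware-based soft-error reliability improvement technique. We compared our compilation methodology with state-of-the-art soft-error reliability aware techniques and with performance aware instruction scheduling. During the experiments, we injected soft errors and compared the number of correct executions, i.e., executions that produce no erroneous output. In most cases, our methodology incurs less performance and power overhead than the state-of-the-art soft-error reliability methodologies. Its compilation time overhead is 8.13 seconds on average. The experimental results show that our methodology improves soft-error reliability by 23 percent and 12 percent (up to 64 percent and 52 percent) compared to the state-of-the-art soft-error reliability aware and performance aware compilation techniques, respectively. Moreover, we show that the soft-error reliability of a GPU is related not to performance but to the fine-grained timing behavior of an application.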
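The abstract does not spell out the scheduling algorithm, so the following is only a toy illustration of the general idea that instruction order changes how long values sit exposed in registers; it is not the authors' method. It assumes a single thread, one instruction issued per cycle, SSA-form registers, and treats a register value as vulnerable from the cycle it is written until the cycle it is last read. It ignores the warp-level parallelism and GPU architectural details that the paper's methodology models. The instruction format, the helper names (`is_valid`, `vulnerable_cycles`, `best_schedule`) and the example block are all hypothetical.

```python
from itertools import permutations

# Hypothetical toy instruction format: (name, destination register, source registers).
# Registers are assumed to be in SSA form: each is written at most once per block.
Instr = tuple[str, str, tuple[str, ...]]


def is_valid(order: list[Instr]) -> bool:
    """Return True if every source produced inside the block is written before it is read."""
    produced = {dst for _, dst, _ in order}
    written: set[str] = set()
    for _, dst, srcs in order:
        for src in srcs:
            # Registers not produced in the block are treated as kernel inputs.
            if src in produced and src not in written:
                return False
        written.add(dst)
    return True


def vulnerable_cycles(order: list[Instr]) -> int:
    """Sum, over all values, of the cycles between a register's write and its last read.

    Assumes one instruction issues per cycle, so an instruction's cycle is its index.
    Values that are never read (dead values) contribute nothing; kernel inputs are ignored.
    """
    write_cycle: dict[str, int] = {}
    last_read: dict[str, int] = {}
    for cycle, (_, dst, srcs) in enumerate(order):
        for src in srcs:
            last_read[src] = cycle
        write_cycle[dst] = cycle
    return sum(last_read[r] - write_cycle[r] for r in write_cycle if r in last_read)


def best_schedule(block: list[Instr]) -> list[Instr]:
    """Exhaustively pick the dependence-respecting order with the fewest vulnerable cycles."""
    candidates = (list(p) for p in permutations(block) if is_valid(list(p)))
    return min(candidates, key=vulnerable_cycles)


# Hypothetical five-instruction block: two loads, one use of each, then a combine.
block: list[Instr] = [
    ("ld_a", "r1", ()),
    ("ld_b", "r2", ()),
    ("mul", "r3", ("r1",)),
    ("add", "r4", ("r2",)),
    ("sum", "r5", ("r3", "r4")),
]
print(vulnerable_cycles(block))  # 7
best = best_schedule(block)
print([name for name, _, _ in best], vulnerable_cycles(best))  # ['ld_a', 'mul', 'ld_b', 'add', 'sum'] 6
```

In this toy model the original order leaves values exposed for 7 register-cycles, while interleaving each load with its consumer lowers this to 6 without changing the computed result. Exhaustive search over orders is only feasible for tiny blocks; a practical compiler pass would more likely use a heuristic list scheduler with a vulnerability-aware priority.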