GPU Architecture Aware Instruction Scheduling for Improving Soft-Error Reliability

IEEE Transactions on Multi-Scale Computing Systems, Pub Date: 2017-02-13, DOI: 10.1109/TMSCS.2017.2667661

Haeseung Lee; Mohammad Abdullah Al Faruque

Vol. 3, No. 2, pp. 86-99. IEEE Xplore: https://ieeexplore.ieee.org/document/7851010/

Citations: 6

Abstract

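The demand for low-power, high-performance computing has driven the semiconductor industry for decades, and semiconductor technology has been scaled down to meet these demands. At the same time, semiconductor technology faces severe reliability challenges such as soft errors. Prior research has improved the soft-error reliability of GPUs using various techniques, such as redundancy. However, the GPU compiler has not yet been considered as a means of improving GPU soft-error reliability. In this paper, we propose a novel GPU-architecture-aware compilation methodology to improve the soft-error reliability of the GPU. The proposed methodology jointly considers the parallel behavior of the GPU hardware and of the application, and minimizes the vulnerability of GPU applications during instruction scheduling. In addition, it can complement any hardware-based soft-error reliability improvement technique. We compared our compilation methodology with state-of-the-art soft-error-reliability-aware techniques and with performance-aware instruction scheduling. During the experiments we injected soft errors and compared the number of correct executions, that is, executions that produce no erroneous output. In most cases, our methodology incurs lower performance and power overhead than the state-of-the-art soft-error reliability methodologies. Its compilation-time overhead is 8.13 seconds on average. The experimental results show that our methodology improves soft-error reliability by 23 percent and 12 percent (up to 64 percent and 52 percent) compared with state-of-the-art soft-error-reliability-aware and performance-aware compilation techniques, respectively. Moreover, we show that the soft-error reliability of a GPU is related not to performance but to the fine-grained timing behavior of an application.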
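The abstract does not spell out the vulnerability metric or the scheduling algorithm, so the following Python sketch only illustrates the general idea of vulnerability-aware instruction scheduling; it is not the authors' method. It assumes a simple register-lifetime proxy (in the spirit of ACE analysis): a value is exposed to soft errors from the issue slot that writes it until its last read, each instruction takes one slot, and the basic block is in SSA form. `Instr`, `vulnerable_cycles`, `schedule_for_reliability`, and the example block are all hypothetical names. The scheduler is a plain greedy list scheduler that, among ready instructions, prefers the one that ends the most live ranges net of the ones it starts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """One instruction of a basic block in SSA form (hypothetical IR)."""
    name: str
    dest: Optional[str]      # register written, or None (e.g. a store)
    srcs: Tuple[str, ...]    # registers read


def vulnerable_cycles(schedule: List[Instr]) -> int:
    """Assumed vulnerability proxy: for every value defined in the block, count
    the issue slots between its write and its last read (one slot per instr)."""
    defined: Dict[str, int] = {}
    last_use: Dict[str, int] = {}
    for slot, ins in enumerate(schedule):
        for r in ins.srcs:
            last_use[r] = slot
        if ins.dest is not None:
            defined[ins.dest] = slot
    # Values never read are dead and contribute nothing.
    return sum(last_use[r] - defined[r] for r in defined if r in last_use)


def schedule_for_reliability(block: List[Instr]) -> List[Instr]:
    """Greedy list scheduler (hypothetical heuristic): among ready instructions,
    pick the one maximizing (live ranges ended - live ranges started)."""
    producer = {ins.dest: i for i, ins in enumerate(block) if ins.dest is not None}
    remaining_uses: Dict[str, int] = {}
    for ins in block:
        for r in ins.srcs:
            remaining_uses[r] = remaining_uses.get(r, 0) + 1

    done = set()
    order: List[Instr] = []
    while len(order) < len(block):
        # Ready once every in-block producer of its operands has been scheduled.
        ready = [
            i for i, ins in enumerate(block)
            if i not in done
            and all(r not in producer or producer[r] in done for r in ins.srcs)
        ]

        def score(i: int) -> int:
            ins = block[i]
            # Values whose remaining reads all occur in this instruction die here.
            killed = sum(
                1 for r in set(ins.srcs)
                if r in producer and remaining_uses[r] == ins.srcs.count(r)
            )
            # A new value that will be read later starts a live range.
            born = 1 if ins.dest is not None and remaining_uses.get(ins.dest, 0) > 0 else 0
            return killed - born

        # Highest score wins; ties keep the original program order.
        best = max(ready, key=lambda i: (score(i), -i))
        done.add(best)
        order.append(block[best])
        for r in block[best].srcs:
            remaining_uses[r] -= 1
    return order


# Hypothetical block: four loads, a small reduction tree, then a store.
example_block = [
    Instr("ld_a", "a", ()),
    Instr("ld_b", "b", ()),
    Instr("ld_c", "c", ()),
    Instr("ld_d", "d", ()),
    Instr("add_e", "e", ("a", "b")),  # e = a + b
    Instr("add_f", "f", ("c", "d")),  # f = c + d
    Instr("mul_g", "g", ("e", "f")),  # g = e * f
    Instr("st_g", None, ("g",)),      # store g
]

reordered = schedule_for_reliability(example_block)
print(vulnerable_cycles(example_block))  # 16
print([ins.name for ins in reordered])   # ['ld_a', 'ld_b', 'add_e', 'ld_c', 'ld_d', 'add_f', 'mul_g', 'st_g']
print(vulnerable_cycles(reordered))      # 12
```

On this block, hoisting `add_e` right after its operands are loaded shortens the lifetimes of `a` and `b`, lowering the proxy from 16 to 12 slots. The sketch deliberately ignores instruction latencies and warp interleaving: on a real GPU, placing a dependent instruction next to a long-latency load can stall the warp, and the abstract states that the proposed method jointly considers the parallel behavior of the hardware and the application, which this toy model does not.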


