Juan-David Guerrero-Balaguera, Robert Limas Sierra, M. Reorda
2022 IEEE 28th International Symposium on On-Line Testing and Robust System Design (IOLTS). Published 2022-09-12. DOI: 10.1109/IOLTS56730.2022.9897823 (https://doi.org/10.1109/IOLTS56730.2022.9897823)
Effective fault simulation of GPU’s permanent faults for reliability estimation of CNNs
Convolutional Neural Networks (CNNs) and Graphics Processing Units (GPUs) are increasingly adopted in many cutting-edge safety-critical applications. Consequently, it is crucial to evaluate the reliability of these systems, since the hardware can be affected by several phenomena (e.g., device wear-out) that produce permanent defects in the GPU. These defects may induce wrong outcomes in the CNN and thereby endanger the application. Traditionally, the effects of permanent faults on CNNs have been studied through application-level fault injection (e.g., acting on the weights). However, this approach has a restricted scope, and it may not reveal the actual vulnerabilities of the GPU device. Hence, a more accurate evaluation of the fault effects is required, one that considers the device's hardware in greater depth. This work introduces a more elaborate experimental evaluation of the impact of a GPU's permanent faults on the reliability of a CNN by resorting to a Software-Implemented Fault Injection (SWIFI) strategy that considers faults at the hardware level. The results of the fault simulation campaigns we performed on the GPU data-path cores are compared with those obtained at the application level, showing that the latter are generally optimistic.
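The difference between the two injection levels mentioned in the abstract can be illustrated with a toy sketch. It is not the authors' SWIFI framework: the single-bit stuck-at fault model, the float32 encoding, the dot-product workload, and all function names (`stuck_at`, `dot_weight_fault`, `dot_datapath_fault`) are assumptions introduced here for illustration only. An application-level fault corrupts one stored weight, whereas a permanent data-path fault is modeled as corrupting every result produced by the faulty multiplier.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision encoding of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Decode a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def stuck_at(x: float, bit: int, value: int) -> float:
    """Force one bit of the float32 encoding of x to `value` (0 or 1)."""
    bits = float_to_bits(x)
    if value:
        bits |= 1 << bit
    else:
        bits &= ~(1 << bit)
    return bits_to_float(bits)


def dot_fault_free(weights, inputs):
    """Reference dot product with no fault."""
    acc = 0.0
    for w, x in zip(weights, inputs):
        acc += w * x
    return acc


def dot_weight_fault(weights, inputs, index, bit, value):
    """Application-level model (assumed): one stored weight is corrupted."""
    faulty = list(weights)
    faulty[index] = stuck_at(faulty[index], bit, value)
    return dot_fault_free(faulty, inputs)


def dot_datapath_fault(weights, inputs, bit, value):
    """Hardware-level model (assumed): every product leaves the faulty
    multiplier with the same output bit stuck."""
    acc = 0.0
    for w, x in zip(weights, inputs):
        acc += stuck_at(w * x, bit, value)
    return acc


if __name__ == "__main__":
    # Values chosen to be exactly representable in float32.
    weights = [0.5, 0.25, 1.0, 2.0]
    inputs = [1.0, 2.0, 4.0, 0.5]
    bit = 23  # least significant exponent bit of a float32

    print(dot_fault_free(weights, inputs))                  # 6.0
    print(dot_weight_fault(weights, inputs, 0, bit, 1))     # 6.5
    print(dot_datapath_fault(weights, inputs, bit, 1))      # 7.0
```

In this hand-picked case the weight fault perturbs one term of the sum, while the data-path fault can perturb every term that passes through the stuck bit. The numbers only show the mechanism and say nothing about the fault rates or results reported in the paper.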