FT-DeepNets: Fault-Tolerant Convolutional Neural Networks with Kernel-based Duplication

Iljoo Baek, Wei Chen, Zhihao Zhu, Soheil Samii, R. Rajkumar
Published in: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022-01-01
DOI: 10.1109/WACV51458.2022.00194 (https://doi.org/10.1109/WACV51458.2022.00194)
Citations: 2

Abstract

Deep neural network (deepnet) applications play a crucial role in safety-critical systems such as autonomous vehicles (AVs). An AV must drive safely towards its destination, avoid obstacles, and respond quickly when the vehicle must stop. Transient errors in software calculations or hardware memory in these deepnet applications can lead to dramatically incorrect results. Assessing and mitigating transient errors and providing robust results are therefore important for safety-critical systems. Previous research on this subject focused on detecting errors and then recovering from them by re-running the network. Other approaches rely on full network duplication, such as ensemble learning, which boosts system fault tolerance by leveraging each model’s advantages. However, it is hard to detect errors in a deep neural network, and the computational overhead of full redundancy can be substantial. We first study the impact of error types and locations in deepnets. We next focus on selecting which part of the network should be duplicated, using multiple ranking methods to measure the order of importance among neurons. We find that the computation and memory overhead of duplication must be traded off against algorithmic performance and robustness. To achieve higher robustness with less system overhead, we present two error protection mechanisms that duplicate only the parts of the network associated with critical neurons. Finally, we substantiate the practical feasibility of our approach and evaluate the improvement in the accuracy of a deepnet in the presence of errors. We demonstrate these results using a case study with real-world applications on an Nvidia GeForce RTX 2070Ti GPU and an Nvidia Xavier embedded platform used by automotive OEMs.
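The abstract does not say which ranking methods or which two protection mechanisms the authors use, so the following is only a minimal sketch of the general idea of partial, kernel-level duplication. It assumes an L1-norm ranking (a common importance heuristic, not confirmed as the paper's choice) and a hypothetical "keep the smaller-magnitude result" policy on mismatch, motivated by the observation that a flipped exponent bit in a small weight tends to produce a very large value. All function names are illustrative, and kernels are flattened to plain lists to keep the example self-contained.

```python
import struct

def flip_bit(value, bit):
    """Flip one bit of a float's 32-bit IEEE 754 encoding (models a transient memory error)."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

def rank_kernels(kernels):
    """Order kernel indices from most to least important by L1 norm (assumed criterion)."""
    return sorted(range(len(kernels)),
                  key=lambda i: sum(abs(w) for w in kernels[i]),
                  reverse=True)

def duplicate_critical(kernels, k):
    """Copy the weights of the k highest-ranked kernels; the rest stay unprotected."""
    return {i: list(kernels[i]) for i in rank_kernels(kernels)[:k]}

def apply_kernel(weights, patch):
    """Dot product of a flattened kernel with a flattened input patch."""
    return sum(w * x for w, x in zip(weights, patch))

def protected_forward(kernels, copies, patch):
    """Evaluate every kernel; for duplicated kernels, compare primary and copy.

    On a mismatch, keep the smaller-magnitude result (hypothetical policy,
    not taken from the paper) and record the kernel index.
    """
    outputs, mismatches = [], []
    for i, weights in enumerate(kernels):
        y = apply_kernel(weights, patch)
        if i in copies:
            y_copy = apply_kernel(copies[i], patch)
            if y != y_copy:
                mismatches.append(i)
                y = min(y, y_copy, key=abs)
        outputs.append(y)
    return outputs, mismatches

# Three toy kernels; protect only the two with the largest L1 norm.
kernels = [[0.5, -0.25, 0.25], [0.125, 0.125, 0.0], [0.5, 0.75, 0.5]]
patch = [1.0, 2.0, 4.0]
copies = duplicate_critical(kernels, k=2)

# Inject a single exponent-bit flip into a protected kernel's primary weights.
kernels[2][0] = flip_bit(kernels[2][0], 30)
outputs, mismatches = protected_forward(kernels, copies, patch)
```

Here only two of the three kernels carry a copy, so the memory and compute overhead is a fraction of full duplication, while a fault in an unprotected kernel would pass through undetected. That is the trade-off between overhead and robustness that the abstract describes.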