Dependable DPU Architectures on AMD-Xilinx Versal Adaptive SoCs for Space Applications

IEEE Transactions on Aerospace and Electronic Systems | IF 5.7 | Zone 2, Computer Science | JCR Q1, Engineering, Aerospace | Pub Date: 2025-01-10 | DOI: 10.1109/TAES.2025.3527938
Noah Perryman;Sebastian Sabogal;Christopher Wilson;Alan George
Volume 61, Issue 3, pp. 6629-6646. Journal Article. Source: https://ieeexplore.ieee.org/document/10836953/

Abstract

Space-computing platforms have considerable performance restrictions that are imposed by the limited onboard-processing capabilities provided by heritage flight computers. Conversely, there is a growing need for increased system autonomy enabled by deep learning (DL) to maximize performance and minimize the burden of ground-based processing. To address these limitations, domain-specific architectures with specialized acceleration hardware, such as the AMD-Xilinx Versal adaptive System-on-Chip (SoC), have been developed. This heterogeneous platform contains significant energy-efficient compute capabilities, but it is susceptible to radiation-induced effects. Therefore, the dependability of the device must be characterized prior to inclusion on future space-computing platforms. In addition, several popular DL models exist, but each model provides unique accuracy, performance, energy-efficiency, and dependability characteristics that must be thoroughly understood. In this research, we propose a methodology for evaluating and analyzing dependable computing on AMD-Xilinx deep learning processing unit (DPU) architectures on Versal SoCs using simulated radiation-induced single-event effects through memory-mapped data fault injection. Using our proposed methodology, we perform this fault injection on three Versal AI Core and two Versal AI Edge DPU architectures and evaluate system performance, power consumption, energy efficiency, resource utilization, and dependability on three deployed DL models. Due to innate DPU configurability, our analysis also explores adding varying degrees of triple modular redundancy (TMR) through different DPU architectural features for increased dependability. We leveraged our fault-injection methodology to demonstrate a 24.65× average reduction in critical bits of our TMR DPU architectures compared to the unmitigated baseline, showcasing a significant increase in system dependability.
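
The two ideas named in the abstract, fault injection by flipping bits in memory and mitigation by triple modular redundancy, can be illustrated with a small sketch. The code below is a toy model and not the authors' methodology or tooling: the names `flip_bit`, `tmr_vote`, and `count_critical_bits` are hypothetical, the "memory" is an ordinary byte buffer rather than a memory-mapped DPU region, and a bit is counted as critical simply when flipping it changes the observed output. It does not attempt to reproduce the 24.65× figure.

```python
def flip_bit(buf: bytearray, bit_index: int) -> None:
    """Flip one bit in place, emulating a single-event upset in memory."""
    buf[bit_index // 8] ^= 1 << (bit_index % 8)


def tmr_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bitwise 2-of-3 majority vote over three equal-length replicas."""
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


def count_critical_bits(golden: bytes, protected: bool) -> int:
    """Flip each bit of one replica in turn and count the flips that
    change the observed output (assumed definition of a critical bit)."""
    critical = 0
    for bit in range(len(golden) * 8):
        # Three fresh copies per trial; only the first copy is corrupted.
        replicas = [bytearray(golden) for _ in range(3)]
        flip_bit(replicas[0], bit)
        if protected:
            observed = tmr_vote(*replicas)
        else:
            observed = bytes(replicas[0])
        if observed != golden:
            critical += 1
    return critical


# Hypothetical 4-byte "configuration" used as the golden reference.
golden = bytes([0x3C, 0xA5, 0x00, 0xFF])
unmitigated = count_critical_bits(golden, protected=False)
mitigated = count_critical_bits(golden, protected=True)
print(f"critical bits without TMR: {unmitigated}, with TMR: {mitigated}")
```

In this idealized model a single upset never survives the vote, so the mitigated count drops to zero. The sketch leaves out everything that makes a real campaign harder, including faults in the voter itself, upsets that hit more than one replica, and bits whose corruption does not affect the model's output.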
Source journal
CiteScore: 7.80
Self-citation rate: 13.60%
Articles published: 433
Review time: 8.7 months
About the journal: IEEE Transactions on Aerospace and Electronic Systems focuses on the organization, design, development, integration, and operation of complex systems for space, air, ocean, or ground environment. These systems include, but are not limited to, navigation, avionics, spacecraft, aerospace power, radar, sonar, telemetry, defense, transportation, automated testing, and command and control.