Predicting Single Event Effects in DRAM

Donald Kline, Stephen Longofono, R. Melhem, A. Jones
Venue: 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)
Publication date: 2019-10-01
DOI: 10.1109/DFT.2019.8875328 (https://doi.org/10.1109/DFT.2019.8875328)
Citations: 1

Abstract

The ability to use commodity memory in harsh radiation environments has the potential to advance computing capability for aerospace and nuclear applications, among others. In this work, we provide the first demonstration that DDR3 memory exposed to radiation contains a small number of cells that are weak with respect to single event effects. Thus, a high proportion of single event faults are not entirely random and can be predicted with high accuracy. We also classify single event effects into predictable single-cell faults, unpredictable single-cell faults, and correlated multi-cell persistent faults, the last of which are due to latch-up effects. We further show that this classification lets us partition faults, which allows the development of a holistic framework for enhanced protection of DRAM. This framework uses a fault map with bit sparing to protect against faults from weak cells, in conjunction with Chipkill ECC to correct chip-level and random errors. This protection provides a potential path to using commodity DRAM in high-radiation environments with extremely low fault rates. Our results, based on data from a multi-day radiation beam experiment, indicate that the uncorrectable bit error rate for rows containing a weak cell is reduced by a factor of $\geq 10^{7}$ compared to Chipkill alone.
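
The three fault classes named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method: the `FaultEvent` record layout, the `classify` function, and both thresholds (`weak_min_repeats`, `burst_min_cells`) are hypothetical choices made only for illustration. The sketch assumes that a cell faulting repeatedly is "weak" and therefore predictable, and that many cells failing together on one chip form a correlated multi-cell fault.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class FaultEvent(NamedTuple):
    """One observed bit fault (hypothetical record layout)."""
    time: int  # observation interval in which the fault was seen
    chip: int  # DRAM chip index
    row: int
    col: int   # bit position within the row


def classify(events, weak_min_repeats=2, burst_min_cells=4):
    """Partition fault events into the three classes named in the abstract.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    # Group by (time, chip): many cells failing together on one chip is
    # treated here as a correlated multi-cell fault.
    by_burst = defaultdict(list)
    for e in events:
        by_burst[(e.time, e.chip)].append(e)

    correlated, singles = [], []
    for group in by_burst.values():
        target = correlated if len(group) >= burst_min_cells else singles
        target.extend(group)

    # A cell that faults repeatedly across intervals is treated as weak,
    # and therefore predictable.
    repeats = Counter((e.chip, e.row, e.col) for e in singles)
    weak_cells = {cell for cell, n in repeats.items() if n >= weak_min_repeats}

    predictable = [e for e in singles if (e.chip, e.row, e.col) in weak_cells]
    unpredictable = [e for e in singles if (e.chip, e.row, e.col) not in weak_cells]
    return {
        "predictable_single_cell": predictable,
        "unpredictable_single_cell": unpredictable,
        "correlated_multi_cell": correlated,
        "weak_cells": weak_cells,  # candidate entries for a bit-sparing fault map
    }
```

Under these assumptions, the `weak_cells` set plays the role of the fault map: those bit positions would be redirected to spare bits, while the remaining single-cell and chip-level faults would be left to Chipkill ECC.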