Predicting Single Event Effects in DRAM
Donald Kline, Stephen Longofono, R. Melhem, A. Jones
2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), published 2019-10-01
DOI: 10.1109/DFT.2019.8875328 (https://doi.org/10.1109/DFT.2019.8875328)
The ability to use commodity memory in environments made harsh by radiation has the potential to advance computing capability for aerospace and nuclear applications, among others. In this work, we provide the first demonstration that DDR3 memory exposed to radiation contains a small number of cells that are weak with respect to single event effects. Thus, a high proportion of single event faults is not entirely random and can be predicted with high accuracy. We also demonstrate a classification of single event effects into predictable single-cell faults, unpredictable single-cell faults, and correlated multi-cell persistent faults, the last due to latch-up effects. We further show that this classification lets us partition faults, which allows the development of a holistic framework to provide enhanced protection of DRAM. This framework combines a fault map with bit sparing, which protects against faults from weak cells, with Chipkill ECC, which corrects chip-level and random errors. This protection provides a potential path to the use of commodity DRAM in high-radiation environments with extremely low fault rates. Our results, based on data from a multi-day radiation beam experiment, indicate a reduction in uncorrectable bit error rate for rows containing a weak cell by a factor of $\geq 10^{7}$ compared to Chipkill alone.
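The partitioning idea in the abstract can be illustrated with a small sketch. It is not the authors' method: the log format (a list of events, each a list of `(row, bit)` cells that flipped together), the `weak_threshold` rule for calling a cell weak, and the function names `classify_events` and `build_fault_map` are all assumptions made for illustration. In particular, the sketch treats any event touching more than one cell as "multi-cell", which only approximates the correlated persistent faults the abstract attributes to latch-up.

```python
from collections import Counter


def classify_events(events, weak_threshold=2):
    """Split fault events into the three classes named in the abstract.

    Assumption: a cell is "weak" if it appears in at least
    `weak_threshold` single-cell events. The real criterion is not
    given in the abstract.
    """
    # Count how often each cell faults on its own.
    single_counts = Counter(cells[0] for cells in events if len(cells) == 1)
    weak_cells = {cell for cell, n in single_counts.items() if n >= weak_threshold}

    classes = {
        "predictable_single": [],
        "unpredictable_single": [],
        "multi_cell": [],
    }
    for cells in events:
        if not cells:
            continue  # ignore empty records
        if len(cells) > 1:
            classes["multi_cell"].append(cells)
        elif cells[0] in weak_cells:
            classes["predictable_single"].append(cells)
        else:
            classes["unpredictable_single"].append(cells)
    return weak_cells, classes


def build_fault_map(weak_cells):
    """Map each row to the set of bit positions that would be spared."""
    fault_map = {}
    for row, bit in weak_cells:
        fault_map.setdefault(row, set()).add(bit)
    return fault_map


# Made-up example log: cell (10, 3) faults three times on its own.
events = [
    [(10, 3)],
    [(10, 3)],
    [(42, 7)],
    [(5, 0), (5, 1), (5, 2)],
    [(10, 3)],
]
weak_cells, classes = classify_events(events)
fault_map = build_fault_map(weak_cells)
print(fault_map)
```

In the framework the abstract describes, faults in the first class would be handled by bit sparing driven by the fault map, while the other two classes would be left to Chipkill ECC.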