E-ECC: Low Power Erasure and Error Correction Schemes for Increasing Reliability of Commodity DRAM Systems

H. Chen, A. Arunkumar, Carole-Jean Wu, T. Mudge, C. Chakrabarti
{"title":"E-ECC: Low Power Erasure and Error Correction Schemes for Increasing Reliability of Commodity DRAM Systems","authors":"H. Chen, A. Arunkumar, Carole-Jean Wu, T. Mudge, C. Chakrabarti","doi":"10.1145/2818950.2818961","DOIUrl":null,"url":null,"abstract":"Most server-grade memory systems provide Chipkill-Correct error protection at the expense of power and/or performance overhead. In this paper we present low overhead schemes for improving the reliability of commodity DRAM systems with better power and IPC performance compared to Chipkill-Correct solutions. Specifically, we propose two erasure and error correction (E-ECC) schemes for x8 memory systems that have 12.5% storage overhead and do not require any change in the existing memory architecture. Both schemes have superior error performance due to the use of a strong ECC code, namely, RS(36,32) over GF(28). Scheme 1 activates 18 chips per access and has stronger reliability compared to Chipkill-Correct solutions. If the location of the faulty chip is known, Scheme 1 can correct an additional random error in a second chip. Scheme 2 trades off reliability for higher energy efficiency by activating only 9 chips per access. It cannot correct random errors due to a chip failure but can detect them with 99.9986% probability, and once a chip is marked faulty due to persistent errors, it can correct all errors due to that chip. Synthesis results in 28nm node show that the RS (36,32) code results in a very low decoding latency that can be well-hidden in commodity memory systems and, therefore, it has minimal effect on the DRAM access latency. Evaluations based on SPEC CPU 2006 sequential and multi-programmed workloads show that compared to Chipkill-Correct, the proposed Schemes 1 and 2 improve IPC by an average of 3.2% (maximum of 13.8%) and 4.8% (maximum of 31.8%) and reduce the power consumption by an average of 16.2% (maximum of 25%) and 26.8% (maximum of 36%), respectively.","PeriodicalId":389462,"journal":{"name":"Proceedings of the 2015 International Symposium on Memory Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 International Symposium on Memory Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2818950.2818961","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

Most server-grade memory systems provide Chipkill-Correct error protection at the expense of power and/or performance overhead. In this paper we present low overhead schemes for improving the reliability of commodity DRAM systems with better power and IPC performance compared to Chipkill-Correct solutions. Specifically, we propose two erasure and error correction (E-ECC) schemes for x8 memory systems that have 12.5% storage overhead and do not require any change in the existing memory architecture. Both schemes have superior error performance due to the use of a strong ECC code, namely, RS(36,32) over GF(28). Scheme 1 activates 18 chips per access and has stronger reliability compared to Chipkill-Correct solutions. If the location of the faulty chip is known, Scheme 1 can correct an additional random error in a second chip. Scheme 2 trades off reliability for higher energy efficiency by activating only 9 chips per access. It cannot correct random errors due to a chip failure but can detect them with 99.9986% probability, and once a chip is marked faulty due to persistent errors, it can correct all errors due to that chip. Synthesis results in 28nm node show that the RS (36,32) code results in a very low decoding latency that can be well-hidden in commodity memory systems and, therefore, it has minimal effect on the DRAM access latency. Evaluations based on SPEC CPU 2006 sequential and multi-programmed workloads show that compared to Chipkill-Correct, the proposed Schemes 1 and 2 improve IPC by an average of 3.2% (maximum of 13.8%) and 4.8% (maximum of 31.8%) and reduce the power consumption by an average of 16.2% (maximum of 25%) and 26.8% (maximum of 36%), respectively.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
E-ECC:提高商品DRAM系统可靠性的低功耗擦除和纠错方案
大多数服务器级内存系统以牺牲电源和/或性能开销为代价提供Chipkill-Correct错误保护。在本文中,我们提出了一种低开销方案,用于提高商品DRAM系统的可靠性,与Chipkill-Correct解决方案相比,它具有更好的功率和IPC性能。具体来说,我们提出了两种用于x8内存系统的擦除和纠错(E-ECC)方案,它们具有12.5%的存储开销,并且不需要对现有内存架构进行任何更改。由于使用了强ECC码,即RS(36,32) / GF(28),这两种方案都具有优越的误差性能。方案1每次访问激活18个芯片,与Chipkill-Correct方案相比具有更强的可靠性。如果故障芯片的位置已知,方案1可以纠正第二个芯片中的额外随机错误。方案2通过每次访问仅激活9个芯片来换取更高的能源效率。它不能纠正由于芯片故障而导致的随机错误,但可以以99.9986%的概率检测到它们,一旦一个芯片由于持续错误而被标记为故障,它可以纠正由于该芯片导致的所有错误。28nm节点的合成结果表明,RS(36,32)编码导致非常低的解码延迟,可以很好地隐藏在商品存储系统中,因此,它对DRAM访问延迟的影响最小。基于SPEC CPU 2006顺序和多程序工作负载的评估表明,与Chipkill-Correct相比,所提出的方案1和方案2将IPC平均提高3.2%(最高13.8%)和4.8%(最高31.8%),并将功耗平均降低16.2%(最高25%)和26.8%(最高36%)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
MEMST: Cloning Memory Behavior using Stochastic Traces Dynamic Memory Pressure Aware Ballooning Another Trip to the Wall: How Much Will Stacked DRAM Benefit HPC? E-ECC: Low Power Erasure and Error Correction Schemes for Increasing Reliability of Commodity DRAM Systems Inefficiencies in the Cache Hierarchy: A Sensitivity Study of Cacheline Size with Mobile Workloads
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1