HeatWatch: Improving 3D NAND Flash Memory Device Reliability by Exploiting Self-Recovery and Temperature Awareness

Yixin Luo, Saugata Ghose, Yu Cai, E. Haratsch, O. Mutlu
{"title":"HeatWatch: Improving 3D NAND Flash Memory Device Reliability by Exploiting Self-Recovery and Temperature Awareness","authors":"Yixin Luo, Saugata Ghose, Yu Cai, E. Haratsch, O. Mutlu","doi":"10.1109/HPCA.2018.00050","DOIUrl":null,"url":null,"abstract":"NAND flash memory density continues to scale to keep up with the increasing storage demands of data-intensive applications. Unfortunately, as a result of this scaling, the lifetime of NAND flash memory has been decreasing. Each cell in NAND flash memory can endure only a limited number of writes, due to the damage caused by each program and erase operation on the cell. This damage can be partially repaired on its own during the idle time between program or erase operations (known as the dwell time), via a phenomenon known as the self-recovery effect. Prior works study the self-recovery effect for planar (i.e., 2D) NAND flash memory, and propose to exploit it to improve flash lifetime, by applying high temperature to accelerate self-recovery. However, these findings may not be directly applicable to 3D NAND flash memory, due to significant changes in the design and manufacturing process that are required to enable practical 3D stacking for NAND flash memory. In this paper, we perform the first detailed experimental characterization of the effects of self-recovery and temperature on real, state-of-the-art 3D NAND flash memory devices. We show that these effects influence two major factors of NAND flash memory reliability: (1) retention loss speed (i.e., the speed at which a flash cell leaks charge), and (2) program variation (i.e., the difference in programming speed across flash cells). We find that self-recovery and temperature affect 3D NAND flash memory quite differently than they affect planar NAND flash memory, rendering prior models of self-recovery and temperature ineffective for 3D NAND flash memory. Using our characterization results, we develop a new model for 3D NAND flash memory reliability, which predicts how retention, wearout, self-recovery, and temperature affect raw bit error rates and cell threshold voltages. We show that our model is accurate, with an error of only 4.9%. Based on our experimental findings and our model, we propose HeatWatch, a new mechanism to improve 3D NAND flash memory reliability. The key idea of HeatWatch is to optimize the read reference voltage, i.e., the voltage applied to the cell during a read operation, by adapting it to the dwell time of the workload and the current operating temperature. HeatWatch (1) efficiently tracks flash memory temperature and dwell time online, (2) sends this information to our reliability model to predict the current voltages of flash cells, and (3) predicts the optimal read reference voltage based on the current cell voltages. Our detailed experimental evaluations show that HeatWatch improves flash lifetime by 3.85× over a baseline that uses a fixed read reference voltage, averaged across 28 real storage workload traces, and comes within 0.9% of the lifetime of an ideal read reference voltage selection mechanism.","PeriodicalId":154694,"journal":{"name":"2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"102","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2018.00050","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 102

Abstract

NAND flash memory density continues to scale to keep up with the increasing storage demands of data-intensive applications. Unfortunately, as a result of this scaling, the lifetime of NAND flash memory has been decreasing. Each cell in NAND flash memory can endure only a limited number of writes, due to the damage caused by each program and erase operation on the cell. This damage can be partially repaired on its own during the idle time between program or erase operations (known as the dwell time), via a phenomenon known as the self-recovery effect. Prior works study the self-recovery effect for planar (i.e., 2D) NAND flash memory, and propose to exploit it to improve flash lifetime, by applying high temperature to accelerate self-recovery. However, these findings may not be directly applicable to 3D NAND flash memory, due to significant changes in the design and manufacturing process that are required to enable practical 3D stacking for NAND flash memory. In this paper, we perform the first detailed experimental characterization of the effects of self-recovery and temperature on real, state-of-the-art 3D NAND flash memory devices. We show that these effects influence two major factors of NAND flash memory reliability: (1) retention loss speed (i.e., the speed at which a flash cell leaks charge), and (2) program variation (i.e., the difference in programming speed across flash cells). We find that self-recovery and temperature affect 3D NAND flash memory quite differently than they affect planar NAND flash memory, rendering prior models of self-recovery and temperature ineffective for 3D NAND flash memory. Using our characterization results, we develop a new model for 3D NAND flash memory reliability, which predicts how retention, wearout, self-recovery, and temperature affect raw bit error rates and cell threshold voltages. We show that our model is accurate, with an error of only 4.9%. Based on our experimental findings and our model, we propose HeatWatch, a new mechanism to improve 3D NAND flash memory reliability. The key idea of HeatWatch is to optimize the read reference voltage, i.e., the voltage applied to the cell during a read operation, by adapting it to the dwell time of the workload and the current operating temperature. HeatWatch (1) efficiently tracks flash memory temperature and dwell time online, (2) sends this information to our reliability model to predict the current voltages of flash cells, and (3) predicts the optimal read reference voltage based on the current cell voltages. Our detailed experimental evaluations show that HeatWatch improves flash lifetime by 3.85× over a baseline that uses a fixed read reference voltage, averaged across 28 real storage workload traces, and comes within 0.9% of the lifetime of an ideal read reference voltage selection mechanism.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
HeatWatch:通过利用自我恢复和温度感知来提高3D NAND闪存设备的可靠性
NAND闪存密度不断扩大,以跟上数据密集型应用日益增长的存储需求。不幸的是,由于这种缩放,NAND闪存的寿命一直在减少。NAND闪存中的每个单元只能承受有限的写入次数,因为每个程序和对单元的擦除操作都会造成损坏。在程序或擦除操作之间的空闲时间(称为驻留时间),这种损坏可以通过一种称为自我恢复效应的现象自行部分修复。先前的工作研究了平面(即2D) NAND闪存的自恢复效应,并提出利用它来提高闪存寿命,通过施加高温来加速自恢复。然而,这些发现可能并不直接适用于3D NAND闪存,因为在设计和制造过程中需要发生重大变化,才能实现NAND闪存的实际3D堆叠。在本文中,我们进行了第一个详细的实验表征的自我恢复和温度的影响,在真正的,最先进的3D NAND闪存设备。我们表明,这些影响影响NAND闪存可靠性的两个主要因素:(1)保留损耗速度(即闪存单元泄漏电荷的速度),以及(2)程序变化(即跨闪存单元编程速度的差异)。我们发现自恢复和温度对3D NAND闪存的影响与对平面NAND闪存的影响完全不同,这使得先前的自恢复和温度模型对3D NAND闪存无效。利用我们的表征结果,我们开发了3D NAND闪存可靠性的新模型,该模型预测了保留、磨损、自我恢复和温度如何影响原始误码率和单元阈值电压。我们证明我们的模型是准确的,误差只有4.9%。基于我们的实验结果和我们的模型,我们提出了一种新的机制来提高3D NAND闪存的可靠性。HeatWatch的关键思想是优化读取参考电压,即在读取操作期间施加到电池上的电压,通过使其适应工作负载的停留时间和当前的工作温度。HeatWatch(1)有效地跟踪闪存温度和在线停留时间,(2)将这些信息发送给我们的可靠性模型来预测闪存单元的电流电压,(3)根据当前单元电压预测最佳读取参考电压。我们详细的实验评估表明,与使用固定读取参考电压的基线相比,HeatWatch将闪存寿命提高了3.85倍,平均在28个实际存储工作负载跟踪中,并且在理想读取参考电压选择机制的寿命范围内达到0.9%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Record-Replay Architecture as a General Security Framework LATTE-CC: Latency Tolerance Aware Adaptive Cache Compression Management for Energy Efficient GPUs Secure DIMM: Moving ORAM Primitives Closer to Memory OuterSPACE: An Outer Product Based Sparse Matrix Multiplication Accelerator WIR: Warp Instruction Reuse to Minimize Repeated Computations in GPUs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1