ADF2T:一个主动磁盘故障预测和容错软件

Hongzhang Yang, Yahui Yang, Zhengguang Chen, Zongzhao Li, Yaofeng Tu
{"title":"ADF2T:一个主动磁盘故障预测和容错软件","authors":"Hongzhang Yang, Yahui Yang, Zhengguang Chen, Zongzhao Li, Yaofeng Tu","doi":"10.1109/ISSREW51248.2020.00030","DOIUrl":null,"url":null,"abstract":"The reliability of distributed file system is inevitably affected by hard disk failure. This paper proposes an active disk failure forecasting and tolerance software. Firstly, multiple SMART records in the time window are merged into one sample, and after sliding, tens of times of positive samples are created. Secondly, the features are selected by two-stage sorting method, so that the most conducive features are used in machine learning modeling, and the time for model training can be shortened obviously. Thirdly, through two-stage verification, parameters can be adjusted in time for unreasonable proactive reconstruction strategies. Experiments show that modeling and forecast of ZTE data set and Backblaze data set respectively, the recall rate is 95.66% and 84.28%, and the error rate is 0.23% and 2.45%. The work in this paper has been commercially used for more than one year in ZTE data center. The reliability of distributed file system software is significantly improved.","PeriodicalId":202247,"journal":{"name":"2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ADF2T: an Active Disk Failure Forecasting and Tolerance Software\",\"authors\":\"Hongzhang Yang, Yahui Yang, Zhengguang Chen, Zongzhao Li, Yaofeng Tu\",\"doi\":\"10.1109/ISSREW51248.2020.00030\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The reliability of distributed file system is inevitably affected by hard disk failure. This paper proposes an active disk failure forecasting and tolerance software. Firstly, multiple SMART records in the time window are merged into one sample, and after sliding, tens of times of positive samples are created. Secondly, the features are selected by two-stage sorting method, so that the most conducive features are used in machine learning modeling, and the time for model training can be shortened obviously. Thirdly, through two-stage verification, parameters can be adjusted in time for unreasonable proactive reconstruction strategies. Experiments show that modeling and forecast of ZTE data set and Backblaze data set respectively, the recall rate is 95.66% and 84.28%, and the error rate is 0.23% and 2.45%. The work in this paper has been commercially used for more than one year in ZTE data center. The reliability of distributed file system software is significantly improved.\",\"PeriodicalId\":202247,\"journal\":{\"name\":\"2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISSREW51248.2020.00030\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSREW51248.2020.00030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

硬盘故障不可避免地会影响分布式文件系统的可靠性。提出了一种主动磁盘故障预测与容错软件。首先,将时间窗内的多条SMART记录合并为一个样本,滑动后生成数十次正样本。其次,采用两阶段排序的方法选择特征,使最有利于机器学习建模的特征被使用,可以明显缩短模型训练的时间。第三,通过两阶段验证,可以对不合理的主动重构策略及时调整参数。实验表明,分别对ZTE数据集和Backblaze数据集进行建模和预测,召回率为95.66%和84.28%,错误率为0.23%和2.45%。本文的工作已经在中兴通讯数据中心进行了一年多的商业应用。大大提高了分布式文件系统软件的可靠性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
ADF2T: an Active Disk Failure Forecasting and Tolerance Software
The reliability of distributed file system is inevitably affected by hard disk failure. This paper proposes an active disk failure forecasting and tolerance software. Firstly, multiple SMART records in the time window are merged into one sample, and after sliding, tens of times of positive samples are created. Secondly, the features are selected by two-stage sorting method, so that the most conducive features are used in machine learning modeling, and the time for model training can be shortened obviously. Thirdly, through two-stage verification, parameters can be adjusted in time for unreasonable proactive reconstruction strategies. Experiments show that modeling and forecast of ZTE data set and Backblaze data set respectively, the recall rate is 95.66% and 84.28%, and the error rate is 0.23% and 2.45%. The work in this paper has been commercially used for more than one year in ZTE data center. The reliability of distributed file system software is significantly improved.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
BP-IDS: Using business process specification to leverage intrusion detection in critical infrastructures Techniques and Tools for Advanced Software Vulnerability Detection Challenges Faced with Application Performance Monitoring (APM) when Migrating to the Cloud AHPCap: A Framework for Automated Hardware Profiling and Capture of Mobile Application States Unit Lemmas for Detecting Requirement and Specification Flaws
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1