Shujie Han, P. Lee, Zhirong Shen, Cheng He, Yi Liu, Tao Huang
2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS). Publication date: 2020-11-01. DOI: 10.1109/ICDCS47774.2020.00044
Toward Adaptive Disk Failure Prediction via Stream Mining
We explore machine learning for accurately predicting imminent disk failures and hence providing proactive fault tolerance for modern storage systems. Current disk failure prediction approaches are mostly offline and assume that the disk logs required for training learning models are available a priori. However, in large-scale disk deployments, disk logs are often generated continuously as an evolving data stream whose statistical patterns vary over time (a phenomenon known as concept drift). This challenge motivates the need for online techniques that perform training and prediction on the incoming stream of disk logs in real time while adapting to concept drift.

We present StreamDFP, a general stream mining framework for disk failure prediction with concept-drift adaptation. We start with a measurement study that demonstrates the existence of concept drift across various disk models, based on datasets from Backblaze and Alibaba Cloud. Motivated by our study, we design StreamDFP with three key techniques, namely (i) online labeling, (ii) concept-drift-aware training, and (iii) general prediction, with the primary objective of letting StreamDFP support various machine learning algorithms as a general framework. Our evaluation shows that, under various settings, StreamDFP significantly improves prediction accuracy compared with prediction without concept-drift adaptation, and achieves reasonably high stream processing performance.
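The abstract names online labeling as the first technique but does not say how it works. One plausible reading, assumed here for illustration and not taken from StreamDFP itself, is that a daily log sample is positive if its disk fails within a horizon of N days after the sample was recorded, and negative otherwise. Because that outcome is only known later, each sample must be buffered until its label is settled. The minimal Python sketch below follows that assumption; the names `Sample`, `OnlineLabeler`, `horizon`, `add`, `advance`, and `on_failure` are all hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import DefaultDict, Deque, List, Tuple


@dataclass
class Sample:
    disk_id: str
    day: int                      # day index of the log record
    features: Tuple[float, ...]   # e.g. SMART attribute values (assumed)


Labeled = Tuple[Sample, int]      # (sample, label): 1 = fails soon, 0 = healthy


class OnlineLabeler:
    """Hypothetical sketch: buffers per-disk samples until their label is known.

    A sample from day d is labeled 1 if its disk fails on some day f with
    f - d <= horizon, and 0 once `horizon` days have passed with no failure.
    """

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.pending: DefaultDict[str, Deque[Sample]] = defaultdict(deque)

    def add(self, sample: Sample) -> None:
        # Samples for one disk arrive in day order, so each deque stays sorted.
        self.pending[sample.disk_id].append(sample)

    def advance(self, today: int) -> List[Labeled]:
        # Any sample older than `horizon` days can no longer become positive.
        out: List[Labeled] = []
        for queue in self.pending.values():
            while queue and today - queue[0].day > self.horizon:
                out.append((queue.popleft(), 0))
        return out

    def on_failure(self, disk_id: str, fail_day: int) -> List[Labeled]:
        # Settle every buffered sample of the failed disk and stop tracking it.
        out: List[Labeled] = []
        for s in self.pending.pop(disk_id, deque()):
            out.append((s, 1 if fail_day - s.day <= self.horizon else 0))
        return out


if __name__ == "__main__":
    lab = OnlineLabeler(horizon=2)
    emitted: List[Labeled] = []
    for day in range(5):
        lab.add(Sample("A", day, ()))
        lab.add(Sample("B", day, ()))
        emitted += lab.advance(day)
    emitted += lab.on_failure("A", 4)   # disk A is reported failed on day 4
    for s, y in emitted:
        print(s.disk_id, s.day, y)
```

In this sketch, labeled samples leave the buffer as soon as their outcome is certain, so a downstream learner can train on them incrementally. Disk B's samples from days 2 to 4 stay pending because their two-day horizon has not yet elapsed. How StreamDFP actually chooses the horizon or handles disks that stop reporting is not stated in the abstract.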