Adaptive Failure Prediction for Computer Systems: A Framework and a Case Study
Authors: Ivano Irrera, M. Vieira, J. Durães
Published in: 2015 IEEE 16th International Symposium on High Assurance Systems Engineering
Publication date: 2015-01-08
DOI: 10.1109/HASE.2015.29 (https://doi.org/10.1109/HASE.2015.29)
Online Failure Prediction improves system dependability by foreseeing incoming failures at runtime, so that mitigation actions can be taken in advance. Despite advances in recent years, Online Failure Prediction is still not widely adopted because of the complexity and time required by its supporting operations, such as training, testing, and tuning. Moreover, a predictor must be retrained frequently to remain effective as the target system evolves during its operational life, which requires substantial human intervention and effort. In this work, we propose a framework for the automatic deployment and online retraining of failure prediction systems. The framework uses fault injection and virtualization to reduce the cost and impact of retraining, and is driven by configurable events that trigger the entire process. We present a case study using a web server system; the results show that the framework maintains the performance of the failure predictor even when the system is modified, suggesting that it can be useful in real scenarios.
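The event-driven retraining loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `ThresholdPredictor`, `RetrainingController`, `collect_failure_data`, and `simulated_campaign`, the toy threshold model, the F-measure acceptance rule, and the hard-coded data. In particular, `collect_failure_data` merely stands in for a fault-injection campaign run on a virtualized copy of the target system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A labelled sample: (monitored feature values, True if a failure followed).
Sample = Tuple[List[float], bool]


@dataclass
class ThresholdPredictor:
    """Toy stand-in for a failure predictor (not the model used in the paper)."""
    threshold: float = 0.0

    def train(self, data: List[Sample]) -> None:
        # Assumes both classes are present in the training data.
        failing = [x[0] for x, failed in data if failed]
        healthy = [x[0] for x, failed in data if not failed]
        # Midpoint between the two class means, purely illustrative.
        self.threshold = (sum(failing) / len(failing)
                          + sum(healthy) / len(healthy)) / 2

    def predict(self, features: List[float]) -> bool:
        return features[0] > self.threshold


def f_measure(predictor: ThresholdPredictor, data: List[Sample]) -> float:
    """Harmonic mean of precision and recall on labelled data."""
    tp = sum(1 for x, y in data if y and predictor.predict(x))
    fp = sum(1 for x, y in data if not y and predictor.predict(x))
    fn = sum(1 for x, y in data if y and not predictor.predict(x))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


@dataclass
class RetrainingController:
    predictor: ThresholdPredictor
    # Hypothetical hook: would run fault injection on a virtualized copy
    # of the system and return the labelled data it produced.
    collect_failure_data: Callable[[], List[Sample]]
    # Configurable events: each trigger decides whether an event starts retraining.
    triggers: List[Callable[[dict], bool]] = field(default_factory=list)

    def on_event(self, event: dict) -> bool:
        """Retrain if any trigger fires; return True if the predictor was replaced."""
        if not any(trigger(event) for trigger in self.triggers):
            return False
        data = self.collect_failure_data()
        train, test = data[::2], data[1::2]  # simple alternating split
        candidate = ThresholdPredictor()
        candidate.train(train)
        # Assumed acceptance rule: keep the candidate only if it is no worse.
        if f_measure(candidate, test) >= f_measure(self.predictor, test):
            self.predictor = candidate
            return True
        return False


def simulated_campaign() -> List[Sample]:
    # Hard-coded illustrative data, not measurements from any real system.
    return [([0.9], True), ([0.8], True), ([0.2], False), ([0.1], False),
            ([0.85], True), ([0.95], True), ([0.15], False), ([0.3], False)]


controller = RetrainingController(
    predictor=ThresholdPredictor(threshold=5.0),  # stale predictor
    collect_failure_data=simulated_campaign,
    triggers=[lambda e: e.get("type") == "software_update"],
)
print(controller.on_event({"type": "heartbeat"}))        # no trigger fires
print(controller.on_event({"type": "software_update"}))  # retraining runs
```

The sketch trains and evaluates a candidate away from the running predictor and swaps it in only afterwards. This mirrors the stated goal of reducing the impact of retraining on the target system, although the paper's actual mechanisms may differ.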