Adaptive Failure Prediction for Computer Systems: A Framework and a Case Study

Ivano Irrera, M. Vieira, J. Durães
Venue: 2015 IEEE 16th International Symposium on High Assurance Systems Engineering
DOI: 10.1109/HASE.2015.29 (https://doi.org/10.1109/HASE.2015.29)
Published: 2015-01-08
Citations: 17

Abstract

Online Failure Prediction improves system dependability by foreseeing incoming failures at runtime, so that mitigation actions can be taken in advance. Despite advances in recent years, Online Failure Prediction is still not adopted because of the complexity and time required by supporting operations such as training, testing, and tuning. Moreover, a predictor must be retrained frequently to remain effective as the target system evolves during its runtime life, which requires substantial human intervention and effort. In this work, we propose a framework for the automatic deployment and online retraining of failure prediction systems. The framework uses key techniques such as fault injection and virtualization to reduce the cost and impact of retraining, and is driven by configurable events that trigger the entire process. We present a case study using a web server system; our results show that the framework maintains the performance of the fault predictor even when the system is modified, suggesting that it can be useful in real scenarios.
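The event-driven retraining idea in the abstract can be illustrated with a small sketch. This is not the authors' framework: the class `AdaptivePredictor`, the event names `deploy` and `system_update`, the single-metric threshold predictor, and the `make_collector` helper are all hypothetical. In particular, `collect_samples` only stands in for the step the abstract describes as fault injection on a virtualized copy of the system; here it just returns synthetic labelled data.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple

# (monitored metric value, did a failure follow?) -- hypothetical data shape
Sample = Tuple[float, bool]


def train_threshold(samples: List[Sample]) -> float:
    """Pick the alarm threshold with the best accuracy on labelled samples."""
    candidates = sorted({value for value, _ in samples})

    def accuracy(threshold: float) -> float:
        hits = sum((value >= threshold) == failed for value, failed in samples)
        return hits / len(samples)

    return max(candidates, key=accuracy)


@dataclass
class AdaptivePredictor:
    # Stand-in for gathering failure data via fault injection on a
    # virtualized copy of the target system (assumption, synthetic here).
    collect_samples: Callable[[], List[Sample]]
    # Configurable events that trigger (re)training; names are hypothetical.
    triggers: Set[str] = field(default_factory=lambda: {"deploy", "system_update"})
    threshold: Optional[float] = None
    retrain_count: int = 0

    def on_event(self, event: str) -> bool:
        """Retrain if the event is a configured trigger; report whether it was."""
        if event not in self.triggers:
            return False
        self.threshold = train_threshold(self.collect_samples())
        self.retrain_count += 1
        return True

    def predict(self, value: float) -> bool:
        """Return True if a failure is predicted for this metric value."""
        if self.threshold is None:
            raise RuntimeError("predictor has not been trained yet")
        return value >= self.threshold


def make_collector(state: dict) -> Callable[[], List[Sample]]:
    """Build a synthetic data source whose failure level follows `state`."""
    def collect() -> List[Sample]:
        level = state["failure_level"]
        return [(float(v), v >= level) for v in range(0, 101, 5)]
    return collect


state = {"failure_level": 80}
predictor = AdaptivePredictor(make_collector(state))
predictor.on_event("deploy")          # initial training
alarm_before = predictor.predict(70.0)

state["failure_level"] = 60           # the target system is modified
predictor.on_event("heartbeat")       # not a trigger, ignored
predictor.on_event("system_update")   # trigger, predictor is retrained
alarm_after = predictor.predict(70.0)
```

In this toy run the same metric value is treated differently before and after the modification, because the retraining step moved the threshold to match the changed system. A real predictor, data source, and trigger set would be far richer than this.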