{"title":"Proactive Failure Management by Integrated Unsupervised and Semi-Supervised Learning for Dependable Cloud Systems","authors":"Qiang Guan, Ziming Zhang, Song Fu","doi":"10.1109/ARES.2011.20","DOIUrl":null,"url":null,"abstract":"Cloud computing systems continue to grow in their scale and complexity. They are changing dynamically as well due to the addition and removal of system components, changing execution environments, frequent updates and upgrades, online repairs and more. In such large-scale complex and dynamic systems, failures are common. In this paper, we present a failure prediction mechanism exploiting both unsupervised and semi-supervised learning techniques for building dependable cloud computing systems. The unsupervised failure detection method uses an ensemble of Bayesian models. It characterizes normal execution states of the system and detects anomalous behaviors. After the anomalies are verified by system administrators, labeled data are available. Then, we apply supervised learning based on decision tree classier to predict future failure occurrences in the cloud. Experimental results in an institute-wide cloud computing system show that our proposed method can forecast failure dynamics with high accuracy.","PeriodicalId":254443,"journal":{"name":"2011 Sixth International Conference on Availability, Reliability and Security","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"48","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 Sixth International Conference on Availability, Reliability and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ARES.2011.20","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 48
Abstract
Cloud computing systems continue to grow in their scale and complexity. They are changing dynamically as well due to the addition and removal of system components, changing execution environments, frequent updates and upgrades, online repairs and more. In such large-scale complex and dynamic systems, failures are common. In this paper, we present a failure prediction mechanism exploiting both unsupervised and semi-supervised learning techniques for building dependable cloud computing systems. The unsupervised failure detection method uses an ensemble of Bayesian models. It characterizes normal execution states of the system and detects anomalous behaviors. After the anomalies are verified by system administrators, labeled data are available. Then, we apply supervised learning based on decision tree classier to predict future failure occurrences in the cloud. Experimental results in an institute-wide cloud computing system show that our proposed method can forecast failure dynamics with high accuracy.