Authors: Premathas Somasekaram, R. Calinescu
Venue: Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing Companion
Published: 2021-12-06
DOI: https://doi.org/10.1145/3492323.3495583
Towards a Bayesian prognostic framework for high-availability clusters
Critical applications deployed on cloud and in-house information technology infrastructures rely on software solutions known as high-availability clusters (HACs) to achieve higher availability. Our paper introduces a Bayesian prognostic (BP) framework that improves the ability of HACs to (i) predict component failures that can be resolved by reinitialising the failed component and (ii) propagate and predict failures in high-level components when the component failure cannot be resolved through reinitialisation. Preliminary experiments presented in the paper show that, for an enterprise application subjected to a wide range of injected faults, the BP framework can reduce downtime by a factor of 5.5 to 7.9 relative to the open-source HAC ClusterLabs stack (Pacemaker/Corosync).
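The abstract does not describe the model itself, so the following is only an illustrative sketch of the general idea behind cases (i) and (ii), not the authors' method. It assumes a simple Beta-Bernoulli belief about whether reinitialising a component resolves its failure. The class `RestartBelief`, the function `choose_action`, the uniform prior and the 0.5 threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class RestartBelief:
    """Beta(alpha, beta) belief that a restart resolves a component failure."""

    alpha: float = 1.0  # assumed prior pseudo-count of successful restarts
    beta: float = 1.0   # assumed prior pseudo-count of unsuccessful restarts

    def update(self, restart_succeeded: bool) -> None:
        # Conjugate Beta-Bernoulli update: add one count to the matching side.
        if restart_succeeded:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean probability that the next restart succeeds.
        return self.alpha / (self.alpha + self.beta)


def choose_action(belief: RestartBelief, threshold: float = 0.5) -> str:
    # Hypothetical policy: reinitialise locally while success is likely,
    # otherwise treat the failure as propagating to the higher-level component.
    return "restart" if belief.mean >= threshold else "escalate"


# Observed restart outcomes for one component (made-up illustration data).
belief = RestartBelief()
for outcome in [True, False, False, False]:
    belief.update(outcome)

print(round(belief.mean, 3), choose_action(belief))
```

With one successful and three unsuccessful restarts, the posterior mean falls below the threshold, so this sketch would stop reinitialising the component and treat the failure as one that affects the higher-level component.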