Chao Lou, Wang Luo, Dequan Gao, Z. Zhao, Fenggang Lai, Shengya Han, Chao Ma
Published in: 2022 Prognostics and Health Management Conference (PHM-2022 London), 2022-05-01. DOI: 10.1109/PHM2022-London52454.2022.00056 (https://doi.org/10.1109/PHM2022-London52454.2022.00056)
Research on Diagnostic Reasoning of Cloud Data Center Based on Bayesian Network and Knowledge Graph
A Cloud Data Center (CDC) is a complex system with multi-level, multi-domain relations among its components, which makes it difficult to analyze alarm information manually to identify the faulty devices and the fault cause. In this paper, a knowledge graph (KG) is used to track the dynamic changes of the CDC topology, and a Bayesian Network (BN) diagnosis model with probability attributes is dynamically generated through graph search. First, based on the dynamic CDC topology tracked in the KG and the fault symptoms collected from the server log, a graph search is carried out to construct the BN topology, which contains the possible fault devices, fault modes and causes. Then the Conditional Probability Table is calculated from the proposed Causality Strength and Leakage Probability, both of which can be stored in the KG database. Combined with the a priori probabilities, this establishes the Bayesian Network model. Finally, the fault cause with the largest a posteriori probability is obtained by BN inference. If eliminating this cause does not resolve the fault, reasoning is repeated with the remaining causes. During maintenance, the fault symptoms and causes are continually updated to make the BN model more accurate. Two fault diagnosis cases illustrate the value of this method for the operation and maintenance of a CDC.
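The step from Causality Strength and Leakage Probability to a Conditional Probability Table, and then to a ranked list of causes, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the two quantities are combined with a leaky noisy-OR model, which the abstract does not confirm. The cause names, priors, strengths and leak value are hypothetical, and the function names `noisy_or_cpt` and `cause_posteriors` are introduced here only for the example.

```python
from itertools import product


def noisy_or_cpt(strengths, leak):
    """Map each tuple of cause states to P(symptom = 1 | states).

    Assumption: a leaky noisy-OR combination of causality strengths.
    """
    cpt = {}
    for states in product((0, 1), repeat=len(strengths)):
        p_absent = 1.0 - leak  # symptom not triggered by the leak term
        for active, strength in zip(states, strengths):
            if active:
                p_absent *= 1.0 - strength  # this cause also fails to trigger it
        cpt[states] = 1.0 - p_absent
    return cpt


def cause_posteriors(priors, cpt):
    """Return P(cause_i = 1 | symptom = 1) by enumerating all cause states."""
    evidence = 0.0
    marginals = [0.0] * len(priors)
    for states, p_symptom in cpt.items():
        p_states = 1.0
        for active, prior in zip(states, priors):
            p_states *= prior if active else 1.0 - prior
        joint = p_states * p_symptom
        evidence += joint
        for i, active in enumerate(states):
            if active:
                marginals[i] += joint
    return [m / evidence for m in marginals]


# Hypothetical values, for illustration only.
causes = ["disk_failure", "nic_failure"]
priors = [0.01, 0.02]    # a priori probability of each cause
strengths = [0.9, 0.6]   # causality strength of each cause for one symptom
leak = 0.001             # leakage probability (symptom with no modeled cause)

cpt = noisy_or_cpt(strengths, leak)
posteriors = cause_posteriors(priors, cpt)

# Causes ordered by a posteriori probability, most likely first.
ranked = sorted(zip(causes, posteriors), key=lambda kv: kv[1], reverse=True)
for name, p in ranked:
    print(f"{name}: {p:.4f}")
```

The first entry of `ranked` corresponds to the cause that would be checked first. One possible reading of "reasoning is repeated with the remaining causes" is to treat the eliminated cause as absent and recompute the posteriors; the sketch does not implement that step.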