{"title":"基于异常传播图随机游走的并发报警根本原因分析","authors":"Lingyu Zhang, Jiabao Zhao, Min Zhang","doi":"10.1109/ICNSC48988.2020.9238084","DOIUrl":null,"url":null,"abstract":"With the development of Internet technology, IT systems are getting more and more complex, in which there are two main relationships among system components: service call relationship and deployment configuration relationship. Once a local anomaly occurs in the system, it tends to spread, triggering emergent and dense concurrent alarms. Hence, it is important to quickly and precisely locate the root cause of concurrent alarms. In this paper, we first construct an anomaly propagation graph using collected system data. Then, based on the graph, we propose two optional algorithms: random walk and state iteration, to track anomaly propagation process and locate the root cause. Simulation experiments demonstrate that our proposed method can localize root causes correctly and rapidly for scenarios with complex call chains and resource competition, and is robust to alarm error. The proposed method pays more attention to system characteristics and depends little on experience knowledge of IT operators.","PeriodicalId":412290,"journal":{"name":"2020 IEEE International Conference on Networking, Sensing and Control (ICNSC)","volume":"257 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Root Cause Analysis of Concurrent Alarms Based on Random Walk over Anomaly Propagation Graph\",\"authors\":\"Lingyu Zhang, Jiabao Zhao, Min Zhang\",\"doi\":\"10.1109/ICNSC48988.2020.9238084\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the development of Internet technology, IT systems are getting more and more complex, in which there are two main relationships among system components: service call relationship and deployment configuration relationship. Once a local anomaly occurs in the system, it tends to spread, triggering emergent and dense concurrent alarms. Hence, it is important to quickly and precisely locate the root cause of concurrent alarms. In this paper, we first construct an anomaly propagation graph using collected system data. Then, based on the graph, we propose two optional algorithms: random walk and state iteration, to track anomaly propagation process and locate the root cause. Simulation experiments demonstrate that our proposed method can localize root causes correctly and rapidly for scenarios with complex call chains and resource competition, and is robust to alarm error. The proposed method pays more attention to system characteristics and depends little on experience knowledge of IT operators.\",\"PeriodicalId\":412290,\"journal\":{\"name\":\"2020 IEEE International Conference on Networking, Sensing and Control (ICNSC)\",\"volume\":\"257 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Conference on Networking, Sensing and Control (ICNSC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICNSC48988.2020.9238084\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Conference on Networking, Sensing and Control (ICNSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICNSC48988.2020.9238084","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Root Cause Analysis of Concurrent Alarms Based on Random Walk over Anomaly Propagation Graph
With the development of Internet technology, IT systems are getting more and more complex, in which there are two main relationships among system components: service call relationship and deployment configuration relationship. Once a local anomaly occurs in the system, it tends to spread, triggering emergent and dense concurrent alarms. Hence, it is important to quickly and precisely locate the root cause of concurrent alarms. In this paper, we first construct an anomaly propagation graph using collected system data. Then, based on the graph, we propose two optional algorithms: random walk and state iteration, to track anomaly propagation process and locate the root cause. Simulation experiments demonstrate that our proposed method can localize root causes correctly and rapidly for scenarios with complex call chains and resource competition, and is robust to alarm error. The proposed method pays more attention to system characteristics and depends little on experience knowledge of IT operators.