Title: SinkFlow: Fast and traceable root-cause localization for multidimensional anomaly events
Authors: Zhichao Hu, Likun Liu, Lina Ma, Xiangzhan Yu
Journal: Engineering Applications of Artificial Intelligence, vol. 139, Article 109582 (impact factor 7.5; JCR Q1, Automation & Control Systems)
Published: 2024-11-22
Publication type: Journal Article
DOI: 10.1016/j.engappai.2024.109582
URL: https://www.sciencedirect.com/science/article/pii/S0952197624017408
Citations: 0
Abstract
With the development of various artificial intelligence (AI)-based applications, detecting anomalies and analyzing their root causes in massive data are critical to increasing the usability of AI. Fast, accurate root-cause analysis (RCA) that finds the main reason for an anomaly and explains it reasonably helps solve problems effectively. RCA therefore plays an important role in troubleshooting and fault diagnosis, which makes its application in data analysis crucial. Previous root-cause-localization approaches for multidimensional anomaly events use various techniques to reduce the search space and have improved localization performance. However, they do not effectively balance the requirements of performance, compatibility, and interpretability. To solve these problems, we propose a new root-cause-localization method called SinkFlow. It provides a unified framework, the event-aggregation graph (EAG), to describe the constraints of event aggregation and the relations between events, so it can be easily generalized to various domains. SinkFlow introduces a measure-evaluation method applicable to both fundamental and derived measures to quantify the impact of events. It also uses an optimal search strategy that reduces the search space based on anomaly behavioral consistency and deviation significance. Our experimental results on semisynthetic datasets show that SinkFlow outperformed the baselines and ran much faster, achieving a 1.88% increase in F1-score at only 25% of the time cost of the second-best localization method. In addition, SinkFlow offered clear, visible explanations of the localization results, answering why they are root causes and how the anomaly formed.
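To make the problem setting concrete, the following is a minimal Python sketch of root-cause localization over multidimensional events. It is not SinkFlow's EAG or its search strategy, which the abstract does not specify: the sketch enumerates every attribute combination exhaustively, where the paper prunes the search space. The data, the dimension names (`isp`, `region`), the measure names (`success`, `total`), and all function names are hypothetical. It illustrates one point the abstract raises: a derived measure such as a success rate has to be recomputed from aggregated fundamental measures rather than averaged across leaf events.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical leaf events: (attributes, forecast measures, actual measures).
# "success" and "total" are fundamental (additive) measures.
LEAVES = [
    ({"isp": "A", "region": "north"}, {"success": 190, "total": 200}, {"success": 120, "total": 200}),
    ({"isp": "A", "region": "south"}, {"success": 95, "total": 100}, {"success": 60, "total": 100}),
    ({"isp": "B", "region": "north"}, {"success": 95, "total": 100}, {"success": 94, "total": 100}),
    ({"isp": "B", "region": "south"}, {"success": 95, "total": 100}, {"success": 95, "total": 100}),
]


def aggregate(leaves, dims):
    """Sum the fundamental measures of all leaves sharing the same values on `dims`."""
    groups = {}
    for attrs, forecast, actual in leaves:
        key = tuple((d, attrs[d]) for d in dims)
        sums = groups.setdefault(
            key, {"forecast": defaultdict(float), "actual": defaultdict(float)}
        )
        for name, value in forecast.items():
            sums["forecast"][name] += value
        for name, value in actual.items():
            sums["actual"][name] += value
    return groups


def success_rate(measures):
    """Derived measure: computed from aggregated fundamentals, not averaged over leaves."""
    return measures["success"] / measures["total"] if measures["total"] else 0.0


def rank_candidates(leaves, dimensions):
    """Score every attribute combination by its drop in success rate (forecast - actual)."""
    scored = []
    for size in range(1, len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for key, sums in aggregate(leaves, dims).items():
                drop = success_rate(sums["forecast"]) - success_rate(sums["actual"])
                scored.append((drop, key))
    # Largest drop first; on ties prefer the coarser (shorter) combination.
    # This tie-break is an assumption of the sketch, not a rule from the paper.
    scored.sort(key=lambda item: (-item[0], len(item[1])))
    return scored


ranked = rank_candidates(LEAVES, ["isp", "region"])
print(ranked[0])
```

In this toy data both `isp=A` leaves degrade by the same amount, so the coarser candidate `isp=A` explains the anomaly as well as either leaf and is ranked first. Exhaustive enumeration grows exponentially with the number of dimensions, which is why methods in this area, SinkFlow included, focus on reducing the search space.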
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.