{"title":"发现数据流中异常值的离群属性","authors":"Egawati Panjei , Le Gruenwald","doi":"10.1016/j.datak.2024.102349","DOIUrl":null,"url":null,"abstract":"<div><p>Data streams, continuous sequences of timestamped data points, necessitate real-time monitoring due to their time-sensitive nature. In various data stream applications, such as network security and credit card transaction monitoring, real-time detection of outliers is crucial, as these outliers often signify potential threats. Equally important is the real-time explanation of outliers, enabling users to glean insights and thereby shorten their investigation time. The investigation time for outliers is closely tied to their number of attributes, making it essential to provide explanations that detail which attributes are responsible for the abnormality of a data point, referred to as outlying attributes. However, the unbounded volume of data and concept drift of data streams pose challenges for discovering the outlying attributes of outliers in real time. In response, in this paper we propose EXOS, an algorithm designed for discovering the outlying attributes of multi-dimensional outliers in data streams. EXOS leverages cross-correlations among data streams, accommodates varying data stream schemas and arrival rates, and effectively addresses challenges related to the unbounded volume of data and concept drift. The algorithm is model-agnostic for point outlier detection and provides real-time explanations based on the local context of the outlier, derived from time-based tumbling windows. The paper provides a complexity analysis of EXOS and an experimental analysis comparing EXOS with existing algorithms. The evaluation includes an assessment of performance on both real-world and synthetic datasets in terms of average precision, recall, F1-score, and explanation time. The evaluation results show that, on average, EXOS achieves a 45.6% better F1 Score and is 7.3 times lower in explanation time compared to existing outlying attribute algorithms.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"154 ","pages":"Article 102349"},"PeriodicalIF":2.7000,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Discovering outlying attributes of outliers in data streams\",\"authors\":\"Egawati Panjei , Le Gruenwald\",\"doi\":\"10.1016/j.datak.2024.102349\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Data streams, continuous sequences of timestamped data points, necessitate real-time monitoring due to their time-sensitive nature. In various data stream applications, such as network security and credit card transaction monitoring, real-time detection of outliers is crucial, as these outliers often signify potential threats. Equally important is the real-time explanation of outliers, enabling users to glean insights and thereby shorten their investigation time. The investigation time for outliers is closely tied to their number of attributes, making it essential to provide explanations that detail which attributes are responsible for the abnormality of a data point, referred to as outlying attributes. However, the unbounded volume of data and concept drift of data streams pose challenges for discovering the outlying attributes of outliers in real time. In response, in this paper we propose EXOS, an algorithm designed for discovering the outlying attributes of multi-dimensional outliers in data streams. EXOS leverages cross-correlations among data streams, accommodates varying data stream schemas and arrival rates, and effectively addresses challenges related to the unbounded volume of data and concept drift. The algorithm is model-agnostic for point outlier detection and provides real-time explanations based on the local context of the outlier, derived from time-based tumbling windows. The paper provides a complexity analysis of EXOS and an experimental analysis comparing EXOS with existing algorithms. The evaluation includes an assessment of performance on both real-world and synthetic datasets in terms of average precision, recall, F1-score, and explanation time. The evaluation results show that, on average, EXOS achieves a 45.6% better F1 Score and is 7.3 times lower in explanation time compared to existing outlying attribute algorithms.</p></div>\",\"PeriodicalId\":55184,\"journal\":{\"name\":\"Data & Knowledge Engineering\",\"volume\":\"154 \",\"pages\":\"Article 102349\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2024-08-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data & Knowledge Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0169023X24000739\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data & Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169023X24000739","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Discovering outlying attributes of outliers in data streams
Data streams, continuous sequences of timestamped data points, necessitate real-time monitoring due to their time-sensitive nature. In various data stream applications, such as network security and credit card transaction monitoring, real-time detection of outliers is crucial, as these outliers often signify potential threats. Equally important is the real-time explanation of outliers, enabling users to glean insights and thereby shorten their investigation time. The investigation time for outliers is closely tied to their number of attributes, making it essential to provide explanations that detail which attributes are responsible for the abnormality of a data point, referred to as outlying attributes. However, the unbounded volume of data and concept drift of data streams pose challenges for discovering the outlying attributes of outliers in real time. In response, in this paper we propose EXOS, an algorithm designed for discovering the outlying attributes of multi-dimensional outliers in data streams. EXOS leverages cross-correlations among data streams, accommodates varying data stream schemas and arrival rates, and effectively addresses challenges related to the unbounded volume of data and concept drift. The algorithm is model-agnostic for point outlier detection and provides real-time explanations based on the local context of the outlier, derived from time-based tumbling windows. The paper provides a complexity analysis of EXOS and an experimental analysis comparing EXOS with existing algorithms. The evaluation includes an assessment of performance on both real-world and synthetic datasets in terms of average precision, recall, F1-score, and explanation time. The evaluation results show that, on average, EXOS achieves a 45.6% better F1 Score and is 7.3 times lower in explanation time compared to existing outlying attribute algorithms.
期刊介绍:
Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.