{"title":"数据流的在线异常值检测","authors":"Md. Shiblee Sadik, L. Gruenwald","doi":"10.1145/2076623.2076635","DOIUrl":null,"url":null,"abstract":"Outlier detection is a well established area of statistics but most of the existing outlier detection techniques are designed for applications where the entire dataset is available for random access. A typical outlier detection technique constructs a standard data distribution or model and identifies the deviated data points from the model as outliers. Evidently these techniques are not suitable for online data streams where the entire dataset, due to its unbounded volume, is not available for random access. Moreover, the data distribution in data streams change over time which challenges the existing outlier detection techniques that assume a constant standard data distribution for the entire dataset. In addition, data streams are characterized by uncertainty which imposes further complexity. In this paper we propose an adaptive, online outlier detection technique addressing the aforementioned characteristics of data streams, called Adaptive Outlier Detection for Data Streams (A-ODDS), which identifies outliers with respect to all the received data points as well as temporally close data points. The temporally close data points are selected based on time and change of data distribution. We also present an efficient and online implementation of the technique and a performance study showing the superiority of A-ODDS over existing techniques in terms of accuracy and execution time on a real-life dataset collected from meteorological applications.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"47 1","pages":"88-96"},"PeriodicalIF":0.0000,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"Online outlier detection for data streams\",\"authors\":\"Md. Shiblee Sadik, L. Gruenwald\",\"doi\":\"10.1145/2076623.2076635\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Outlier detection is a well established area of statistics but most of the existing outlier detection techniques are designed for applications where the entire dataset is available for random access. A typical outlier detection technique constructs a standard data distribution or model and identifies the deviated data points from the model as outliers. Evidently these techniques are not suitable for online data streams where the entire dataset, due to its unbounded volume, is not available for random access. Moreover, the data distribution in data streams change over time which challenges the existing outlier detection techniques that assume a constant standard data distribution for the entire dataset. In addition, data streams are characterized by uncertainty which imposes further complexity. In this paper we propose an adaptive, online outlier detection technique addressing the aforementioned characteristics of data streams, called Adaptive Outlier Detection for Data Streams (A-ODDS), which identifies outliers with respect to all the received data points as well as temporally close data points. The temporally close data points are selected based on time and change of data distribution. We also present an efficient and online implementation of the technique and a performance study showing the superiority of A-ODDS over existing techniques in terms of accuracy and execution time on a real-life dataset collected from meteorological applications.\",\"PeriodicalId\":93615,\"journal\":{\"name\":\"Proceedings. International Database Engineering and Applications Symposium\",\"volume\":\"47 1\",\"pages\":\"88-96\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-09-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. International Database Engineering and Applications Symposium\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2076623.2076635\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. International Database Engineering and Applications Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2076623.2076635","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Outlier detection is a well established area of statistics but most of the existing outlier detection techniques are designed for applications where the entire dataset is available for random access. A typical outlier detection technique constructs a standard data distribution or model and identifies the deviated data points from the model as outliers. Evidently these techniques are not suitable for online data streams where the entire dataset, due to its unbounded volume, is not available for random access. Moreover, the data distribution in data streams change over time which challenges the existing outlier detection techniques that assume a constant standard data distribution for the entire dataset. In addition, data streams are characterized by uncertainty which imposes further complexity. In this paper we propose an adaptive, online outlier detection technique addressing the aforementioned characteristics of data streams, called Adaptive Outlier Detection for Data Streams (A-ODDS), which identifies outliers with respect to all the received data points as well as temporally close data points. The temporally close data points are selected based on time and change of data distribution. We also present an efficient and online implementation of the technique and a performance study showing the superiority of A-ODDS over existing techniques in terms of accuracy and execution time on a real-life dataset collected from meteorological applications.