A Novel Framework for Context-aware Outlier Detection in Big Data Streams
Hussien Ahmad, S. Dowaji
Journal of Digital Information Management, October 2018. DOI: 10.6025/JDIM/2018/16/5/213-222
Outlier and anomaly detection has long been a critical problem in many fields. Although it has been studied extensively in data mining, the problem has become harder and more critical in the Big Data era, as the volume, velocity, and variety of data have changed drastically and outliers have taken increasingly complex forms. In such an environment, where real-time outlier detection and analysis over data streams is a necessity, existing solutions are no longer effective or sufficient. While many existing algorithms and approaches consider the content of the data stream, few consider the context and conditions under which that content was produced. In this paper, we propose a novel framework for contextual outlier detection in big data streams that injects contextual attributes into the stream content as a primary input for outlier detection, rather than using the stream content alone or applying contextual detection only to content anomalies. The detection algorithm combines two approaches: a supervised detection method and an unsupervised method, the latter allowing the detection process to adapt to normal changes in stream behavior over time. Each detected outlier is either both a content and a contextual outlier, or a contextual outlier only. The proposed contextual detection approach simultaneously prunes false positive outliers and detects outliers that content-based detection alone would miss. Moreover, the detection engine stores each detected outlier together with the context values under which it was detected, and uses them for the engine's self-training and for outlier modeling in order to improve outlier prediction accuracy.

Journal of Digital Information Management Subject Categories and Descriptors: H.2 [Database Management]; H.2.8 [Database Applications]
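The abstract describes the detection logic only in prose. The following is a minimal sketch, not the authors' algorithm, of how a content-level check and a context-level check might be combined over a stream to produce the two outlier kinds named above and to prune content-only alarms. It assumes one numeric value per record and a hashable context key (for example a time-of-day bucket), and it swaps in a plain sliding-window z-score test for the paper's unspecified detectors; the supervised component is not modeled. The class names `WindowStats` and `ContextualDetector`, the labels, and the parameters `window`, `warmup`, and `threshold` are hypothetical.

```python
import statistics
from collections import defaultdict, deque


class WindowStats:
    """Mean/std over the most recent `size` accepted values (older values are forgotten)."""

    def __init__(self, size: int) -> None:
        self.values: deque = deque(maxlen=size)

    def __len__(self) -> int:
        return len(self.values)

    def zscore(self, x: float) -> float:
        vals = list(self.values)
        mean = statistics.fmean(vals)
        std = statistics.pstdev(vals)
        # Floor the spread so a perfectly constant window does not divide by zero.
        return abs(x - mean) / max(std, 1e-9)

    def add(self, x: float) -> None:
        self.values.append(x)


class ContextualDetector:
    """Hypothetical sketch: content check on the whole stream, contextual check per context key."""

    def __init__(self, window: int = 50, warmup: int = 5, threshold: float = 3.0) -> None:
        self.warmup = warmup
        self.threshold = threshold
        self.content = WindowStats(window)
        self.contexts = defaultdict(lambda: WindowStats(window))
        # Detected outliers are kept together with their context (e.g. for later re-training).
        self.outlier_log: list = []

    def observe(self, context, value: float) -> str:
        ctx = self.contexts[context]
        if len(self.content) < self.warmup or len(ctx) < self.warmup:
            label = "warmup"  # not enough history yet to judge this record
        else:
            content_out = self.content.zscore(value) > self.threshold
            context_out = ctx.zscore(value) > self.threshold
            if content_out and context_out:
                label = "content+contextual"
            elif context_out:
                label = "contextual"
            elif content_out:
                # Unusual for the stream as a whole but normal for its context:
                # treated as a false positive and pruned.
                label = "pruned"
            else:
                label = "normal"

        if label in ("content+contextual", "contextual"):
            self.outlier_log.append((context, value, label))
        else:
            # Only accepted values enter the windows, so gradual drift is absorbed
            # while flagged outliers do not distort the baselines.
            self.content.add(value)
            ctx.add(value)
        return label


if __name__ == "__main__":
    det = ContextualDetector()
    for d, n in zip([98, 102, 99, 101, 100, 100], [9, 11, 10, 10, 10, 10]):
        det.observe("day", d)
        det.observe("night", n)
    print(det.observe("night", 100))  # contextual: normal for the stream, not for "night"
    print(det.observe("day", 1000))   # content+contextual
```

Because only accepted values update the windows, slow drift is followed while flagged points leave the baselines untouched; the trade-off is that an abrupt, permanent change in a context would keep being flagged until the new regime is accepted by some other means.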