A Novel Framework for Context-aware Outlier Detection in Big Data Streams

Journal of Digital Information Management, Pub Date: 2018-10-01, DOI: 10.6025/JDIM/2018/16/5/213-222

Hussien Ahmad, S. Dowaji


Citations: 2

Abstract

Outlier and anomaly detection has always been a critical problem in many fields. Although it has been investigated deeply in data mining, the problem has become more difficult and more critical in the Big Data era, as the volume, velocity, and variety of data change drastically and the types of outliers grow more complicated. In such an environment, where real-time outlier detection and analysis over data streams is a necessity, existing solutions are no longer effective or sufficient. While many existing algorithms and approaches consider the content of the data stream, few consider the context and conditions in which that content was produced. In this paper, we propose a novel framework for contextual outlier detection in big data streams that injects the contextual attributes into the stream content as a primary input for outlier detection, rather than using the stream content alone or applying contextual detection to content anomalies only. The detection algorithm incorporates two approaches: the first is a supervised detection method and the second is an unsupervised one, which allows the detection process to adapt to normal changes in stream behavior over time. The detected outliers are either both content and contextual outliers or contextual outliers only. The proposed contextual detection approach prunes the false positive outliers and detects the true negative outliers at the same time. Moreover, in this framework, the detection engine preserves both the outliers and the context values in which those outliers were detected, so that they can be used for engine self-training and outlier modeling to enhance outlier prediction accuracy.

Subject Categories and Descriptors: H.2 [Database Management]; H.2.8 [Database Applications]
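The central idea of the abstract, treating context attributes as a primary input instead of scoring content alone, can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a single numeric content value, a single categorical context key, and a plain per-context z-score maintained with Welford's online update. The names `RunningStats`, `ContextualDetector`, `threshold`, and `warmup` are hypothetical, and the sketch covers only an unsupervised, statistical variant.

```python
import math
from collections import defaultdict


class RunningStats:
    """Welford's online mean/variance, usable in a single pass over a stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x):
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0 else (x - self.mean) / std


class ContextualDetector:
    """Hypothetical detector: scores each value globally and within its context."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # assumed |z| cutoff
        self.warmup = warmup        # assumed minimum samples before flagging
        self.global_stats = RunningStats()
        self.context_stats = defaultdict(RunningStats)
        self.outlier_log = []       # outliers kept together with their context

    def _is_outlier(self, stats, value):
        return stats.n >= self.warmup and abs(stats.zscore(value)) > self.threshold

    def observe(self, context, value):
        """Return (content_outlier, contextual_outlier) for one stream record."""
        ctx = self.context_stats[context]
        content_flag = self._is_outlier(self.global_stats, value)
        context_flag = self._is_outlier(ctx, value)
        if context_flag:
            # Preserve the outlier and its context for later modeling.
            self.outlier_log.append((context, value))
        else:
            # Only normal values update the context model, so it can
            # follow gradual, legitimate drift in the stream.
            ctx.update(value)
        self.global_stats.update(value)
        return content_flag, context_flag


# Toy temperature stream with season as the (assumed) context attribute.
detector = ContextualDetector()
winter = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0, 2.0]
summer = [29.0, 30.0, 31.0, 30.0, 29.0, 31.0, 30.0, 30.0]
for w, s in zip(winter, summer):
    detector.observe("winter", w)
    detector.observe("summer", s)

first = detector.observe("winter", 30.0)   # (False, True)
second = detector.observe("summer", 30.0)  # (False, False)
```

In this toy stream, a reading of 30.0 is unremarkable against the global distribution but far outside the winter distribution, so it is flagged as a contextual outlier only, which is one of the two outlier classes the abstract names. The same reading in the summer context is not flagged at all.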

