A Novel Framework for Context-aware Outlier Detection in Big Data Streams

Hussien Ahmad, S. Dowaji
Journal of Digital Information Management, published 2018-10-01. DOI: https://doi.org/10.6025/JDIM/2018/16/5/213-222
Citations: 2

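Abstract

Outlier and anomaly detection has long been a critical problem in many fields. Although it has been studied in depth in data mining, the problem has become harder and more pressing in the Big Data era: the volume, velocity, and variety of data have changed drastically, and the types of outliers have grown more complicated. In such an environment, where real-time outlier detection and analysis over data streams is a necessity, existing solutions are no longer effective or sufficient. While many existing algorithms and approaches consider the content of the data stream, few consider the context and conditions in which that content was produced. In this paper, we propose a novel framework for contextual outlier detection in big data streams that injects contextual attributes into the stream content as a primary input for outlier detection, rather than using the stream content alone or applying contextual detection only to content anomalies. The detection algorithm combines two approaches: a supervised detection method and an unsupervised one, which allows the detection process to adapt to normal changes in stream behavior over time. The detected outliers are either both content and contextual outliers or contextual outliers only. The proposed approach prunes false-positive outliers and, at the same time, detects contextual outliers that content-only detection would miss. Moreover, the detection engine preserves both the outliers and the context values in which they were detected, for use in the engine's self-training and in outlier modeling, in order to improve outlier prediction accuracy.

Subject Categories and Descriptors: H.2 [Database Management]; H.2.8 [Database Applications]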
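
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of using context as a primary input, not the authors' method. It assumes a single numeric stream value, a single categorical context attribute, and a z-score test over running statistics; the names `RunningStats`, `ContextualDetector`, `threshold`, and `warmup` are hypothetical. Only the unsupervised, adaptive side is illustrated; the supervised detector is not modeled, and the stored outlier log merely stands in for the data the engine would keep for self-training.

```python
import math
from collections import defaultdict


class RunningStats:
    """Online mean/variance (Welford), so the baseline follows the stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x):
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0 else abs(x - self.mean) / std


class ContextualDetector:
    """Hypothetical detector: scores each value against its own context."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # assumed z-score cut-off
        self.warmup = warmup        # points needed before a baseline is trusted
        self.global_stats = RunningStats()
        self.context_stats = defaultdict(RunningStats)
        self.outlier_log = []       # (value, context, label) kept for later training

    def _is_extreme(self, stats, value):
        return stats.n >= self.warmup and stats.zscore(value) > self.threshold

    def process(self, value, context):
        ctx = self.context_stats[context]
        content = self._is_extreme(self.global_stats, value)
        contextual = self._is_extreme(ctx, value)
        if contextual:
            label = "content+contextual" if content else "contextual-only"
            self.outlier_log.append((value, context, label))
            return label
        # A value that is typical for its context is treated as normal, even
        # if it looks extreme globally; only normal points update the baselines.
        self.global_stats.update(value)
        ctx.update(value)
        return "normal"


detector = ContextualDetector()
day = [29, 30, 31, 30, 29, 31]
night = [9, 10, 11, 10, 9, 11]
for d, n in zip(day, night):
    detector.process(d, "day")
    detector.process(n, "night")

print(detector.process(30, "night"))  # typical overall, unusual at night
print(detector.process(100, "day"))   # unusual both overall and for daytime
```

In this toy stream, a reading of 30 is unremarkable across the whole stream but far from the night-time baseline, which is the kind of contextual-only outlier that a content-only test would miss.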