MIR_MAD: An Efficient and On-line Approach for Anomaly Detection in Dynamic Data Stream

2020 International Conference on Data Mining Workshops (ICDMW). Pub Date: 2020-11-01. DOI: 10.1109/ICDMW51313.2020.00065

Chang How Tan, V. C. Lee, Mahsa Salehi


Citations: 3

Abstract

Anomaly detection in a dynamic data stream is a challenging task. Because the stream is unbounded and data arrive at a high rate, anomaly detection models cannot store all observations in memory for processing. In addition, the properties of the data stream change over time, a phenomenon known as concept drift. While recent studies focus on feature extraction for anomaly detection, the majority of them assume that data streams are static, ignoring the possibility of concept drift. Anomaly detection models must operate efficiently to deal with high-volume, high-velocity data; that is, they must have low complexity and learn incrementally from each arriving observation. Incremental learning allows the model to adapt to concept drift. When the drift rate is higher than the adaptation rate, the ability to detect concept drift and retrain a new model is preferable, because it minimizes performance loss. In this paper, we propose MIR_MAD, an approach based on multiple incremental robust Mahalanobis estimators that is efficient, learns incrementally, and can detect concept drift. MIR_MAD is fast, can be initialized with a small amount of data, and can estimate the location of the drift in the data stream. Our empirical results show that MIR_MAD achieves state-of-the-art performance and is significantly faster. We also performed a case study showing that detecting concept drift is critical to minimizing the reduction in the model's performance.
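The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: score each arriving observation by its Mahalanobis distance, sqrt((x - mean)^T Cov^-1 (x - mean)), against a mean and covariance that are updated incrementally, and retrain when drift is suspected. It is not the authors' MIR_MAD method. It uses a single plain (non-robust) estimator, and every name and parameter here (`IncrementalMahalanobis`, `detect`, `ridge`, `warmup`, `threshold`, `drift_run`) is a hypothetical choice made for illustration. The drift rule, a run of consecutive flagged observations, is likewise an assumption.

```python
import math
import random


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


class IncrementalMahalanobis:
    """Running mean and covariance (Welford-style) with Mahalanobis scoring."""

    def __init__(self, dim, ridge=1e-6):
        self.dim = dim
        self.n = 0
        self.mean = [0.0] * dim
        self.comoment = [[0.0] * dim for _ in range(dim)]  # sum of deviation products
        self.ridge = ridge  # assumed regularizer that keeps the covariance invertible

    def update(self, x):
        """Fold one observation into the estimates in O(dim^2) time."""
        self.n += 1
        before = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, before)]
        after = [xi - mi for xi, mi in zip(x, self.mean)]
        for i in range(self.dim):
            for j in range(self.dim):
                self.comoment[i][j] += before[i] * after[j]

    def covariance(self):
        denom = max(self.n - 1, 1)
        return [[self.comoment[i][j] / denom + (self.ridge if i == j else 0.0)
                 for j in range(self.dim)] for i in range(self.dim)]

    def score(self, x):
        """Mahalanobis distance of x from the current mean."""
        diff = [xi - mi for xi, mi in zip(x, self.mean)]
        z = solve(self.covariance(), diff)
        return math.sqrt(max(sum(d * zi for d, zi in zip(diff, z)), 0.0))


def detect(stream, dim, warmup=20, threshold=4.0, drift_run=10):
    """Return (index, score, is_anomaly, drift_detected) for each observation."""
    model = IncrementalMahalanobis(dim)
    run = 0  # number of consecutive flagged observations
    results = []
    for t, x in enumerate(stream):
        if model.n < warmup:  # initialize on a small amount of data
            model.update(x)
            results.append((t, 0.0, False, False))
            continue
        s = model.score(x)
        anomaly = s > threshold
        run = run + 1 if anomaly else 0
        drift = run >= drift_run  # assumed rule: a long run of flags means drift
        if drift:
            model = IncrementalMahalanobis(dim)  # discard the old model and retrain
            model.update(x)
            run = 0
        elif not anomaly:
            model.update(x)  # learn only from observations that look normal
        results.append((t, s, anomaly, drift))
    return results


# Synthetic demo: one injected outlier, then an abrupt shift in the mean.
random.seed(0)
stream = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
stream[100] = (10.0, -10.0)
stream += [(random.gauss(8, 1), random.gauss(8, 1)) for _ in range(100)]
results = detect(stream, dim=2)
```

In this sketch, skipping updates for flagged observations keeps isolated outliers from contaminating the estimates, but it also means the model cannot adapt to an abrupt shift on its own. That is why a drift check followed by retraining is needed, which mirrors the abstract's point about drift that outpaces incremental adaptation.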

