Unsupervised outlier detection in streaming data using weighted clustering

Yogita Thakran, Durga Toshniwal
2012 12th International Conference on Intelligent Systems Design and Applications (ISDA), November 2012
DOI: 10.1109/ISDA.2012.6416666
Citations: 46

Abstract

Outlier detection is an important task in many fields, such as network intrusion detection, credit card fraud detection, stock market analysis, and the detection of outlying cases in medical data. Outlier detection in streaming data is particularly challenging because a stream cannot be scanned multiple times, and new concepts may keep evolving in the incoming data over time. Irrelevant attributes, which can be termed noisy attributes, make working with data streams even harder. In this paper, we propose an unsupervised outlier detection scheme for streaming data. The scheme is based on clustering, because clustering is an unsupervised data mining task and does not require labeled data. The proposed scheme combines density-based and partitioning clustering methods to take advantage of both density-based and distance-based outlier detection. It also assigns adaptive weights to attributes according to their relevance to the mining task, and these weights help reduce or remove the effect of noisy attributes. To meet the challenges of streaming data, the proposed scheme is incremental and adapts to concept evolution. Experimental results on synthetic and real-world data sets show that our approach outperforms an existing approach (CORM) in terms of outlier detection rate and false alarm rate, including as the percentage of outliers increases.
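The abstract names three ingredients: attribute weights, a weighted-distance clustering step, and incremental, one-pass processing. Below is a minimal sketch of how these ideas can fit together. It is an illustration under stated assumptions and is not the authors' algorithm or CORM. The abstract does not give the paper's weighting formula or clustering procedure, so this sketch substitutes its own heuristic. Each attribute's weight is set inversely proportional to its pooled within-cluster variance, so attributes that scatter widely inside clusters (treated here as "noisy") count less. The names `MicroCluster`, `attribute_weights`, `StreamOutlierDetector`, `radius` and `min_size` are all hypothetical. `radius` acts as a distance threshold, and `min_size` acts as a density threshold.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Running summary of one cluster (hypothetical): count, per-attribute sum, sum of squares."""
    n: int
    ls: list[float]  # linear sum of the points, per attribute
    ss: list[float]  # sum of squared values, per attribute

    @classmethod
    def from_point(cls, x):
        return cls(1, list(x), [v * v for v in x])

    def centroid(self):
        return [s / self.n for s in self.ls]

    def variance(self):
        # Per-attribute variance E[x^2] - E[x]^2, clamped at 0 to absorb rounding error
        return [max(q / self.n - (s / self.n) ** 2, 0.0) for s, q in zip(self.ls, self.ss)]

    def absorb(self, x):
        # Incremental update: the point itself is not stored
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v


def attribute_weights(clusters, dim, eps=1e-6):
    """Assumed relevance heuristic (not from the paper): weight ~ 1 / pooled
    within-cluster variance, normalized so the weights sum to 1."""
    pooled = [0.0] * dim
    total_n = 0
    for c in clusters:
        for i, v in enumerate(c.variance()):
            pooled[i] += v * c.n
        total_n += c.n
    if total_n == 0:
        return [1.0 / dim] * dim  # no data yet: equal weights
    inv = [1.0 / (p / total_n + eps) for p in pooled]
    s = sum(inv)
    return [w / s for w in inv]


def weighted_distance(x, y, w):
    # Weighted Euclidean distance; low-weight (noisy) attributes contribute less
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)))


class StreamOutlierDetector:
    """One-pass sketch: each point joins the nearest micro-cluster (partitioning step)
    if it is within `radius` (distance test). Points in clusters smaller than
    `min_size` (density test) are reported as candidate outliers."""

    def __init__(self, dim, radius, min_size):
        self.dim = dim
        self.radius = radius      # hypothetical distance threshold
        self.min_size = min_size  # hypothetical density threshold
        self.clusters = []

    def process(self, x):
        # Weights are recomputed on every arrival, so they adapt as the stream evolves
        w = attribute_weights(self.clusters, self.dim)
        best, best_d = None, math.inf
        for c in self.clusters:
            d = weighted_distance(x, c.centroid(), w)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= self.radius:
            best.absorb(x)
            return best.n < self.min_size  # cluster still sparse: candidate outlier
        # Far from every existing cluster: start a new one and flag the point
        self.clusters.append(MicroCluster.from_point(x))
        return True
```

Take `radius=1.0` and `min_size=3` and feed in the stream (0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (10, 10). The sketch flags the first two points, because their cluster is still below the density threshold (a cold-start effect). It also flags the distant point (10, 10), which opens a new cluster. The third and fourth points are not flagged. Recomputing the weights for every point costs O(k·d) per arrival. A practical variant might refresh them periodically or decay old statistics so that it follows concept drift.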