An Empirical Study on Anomaly Detection Using Density-based and Representative-based Clustering Algorithms

Gerard Shu Fuhnwi, Janet O. Agbaje, K. Oshinubi, O. J. Peter
{"title":"An Empirical Study on Anomaly Detection Using Density-based and Representative-based Clustering Algorithms","authors":"Gerard Shu Fuhnwi, Janet O. Agbaje, K. Oshinubi, O. J. Peter","doi":"10.46481/jnsps.2023.1364","DOIUrl":null,"url":null,"abstract":"In data mining, and statistics, anomaly detection is the process of finding data patterns (outcomes, values, or observations) that deviate from the rest of the other observations or outcomes. Anomaly detection is heavily used in solving real-world problems in many application domains, like medicine, finance , cybersecurity, banking, networking, transportation, and military surveillance for enemy activities, but not limited to only these fields. In this paper, we present an empirical study on unsupervised anomaly detection techniques such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), (DBSCAN++) (with uniform initialization, k-center initialization, uniform with approximate neighbor initialization, and $k$-center with approximate neighbor initialization), and $k$-means$--$ algorithms on six benchmark imbalanced data sets. Findings from our in-depth empirical study show that k-means-- is more robust than DBSCAN, and DBSCAN++, in terms of the different evaluation measures (F1-score, False alarm rate, Adjusted rand index, and Jaccard coefficient), and running time. We also observe that DBSCAN performs very well on data sets with fewer number of data points. Moreover, the results indicate that the choice of clustering algorithm can significantly impact the performance of anomaly detection and that the performance of different algorithms varies depending on the characteristics of the data. Overall, this study provides insights into the strengths and limitations of different clustering algorithms for anomaly detection and can help guide the selection of appropriate algorithms for specific applications.","PeriodicalId":342917,"journal":{"name":"Journal of the Nigerian Society of Physical Sciences","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Nigerian Society of Physical Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.46481/jnsps.2023.1364","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

In data mining, and statistics, anomaly detection is the process of finding data patterns (outcomes, values, or observations) that deviate from the rest of the other observations or outcomes. Anomaly detection is heavily used in solving real-world problems in many application domains, like medicine, finance , cybersecurity, banking, networking, transportation, and military surveillance for enemy activities, but not limited to only these fields. In this paper, we present an empirical study on unsupervised anomaly detection techniques such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), (DBSCAN++) (with uniform initialization, k-center initialization, uniform with approximate neighbor initialization, and $k$-center with approximate neighbor initialization), and $k$-means$--$ algorithms on six benchmark imbalanced data sets. Findings from our in-depth empirical study show that k-means-- is more robust than DBSCAN, and DBSCAN++, in terms of the different evaluation measures (F1-score, False alarm rate, Adjusted rand index, and Jaccard coefficient), and running time. We also observe that DBSCAN performs very well on data sets with fewer number of data points. Moreover, the results indicate that the choice of clustering algorithm can significantly impact the performance of anomaly detection and that the performance of different algorithms varies depending on the characteristics of the data. Overall, this study provides insights into the strengths and limitations of different clustering algorithms for anomaly detection and can help guide the selection of appropriate algorithms for specific applications.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于密度和代表性聚类算法的异常检测实证研究
在数据挖掘和统计中,异常检测是查找偏离其他观察值或结果的数据模式(结果、值或观察值)的过程。异常检测在许多应用领域被大量用于解决现实世界的问题,如医学、金融、网络安全、银行、网络、交通和军事监视敌人的活动,但不仅限于这些领域。在本文中,我们在六个基准不平衡数据集上对无监督异常检测技术进行了实证研究,如基于密度的带噪声应用空间聚类(DBSCAN)、(DBSCAN++)(均匀初始化、k中心初始化、均匀与近似邻居初始化、$k$-中心与近似邻居初始化)和$k$-means$- $算法。通过深入的实证研究发现,在不同的评价指标(f1得分、虚警率、调整后的rand指数和Jaccard系数)和运行时间方面,k-means-比DBSCAN和DBSCAN++更具鲁棒性。我们还观察到,DBSCAN在数据点数量较少的数据集上执行得非常好。此外,结果表明,聚类算法的选择会显著影响异常检测的性能,不同算法的性能取决于数据的特征。总的来说,本研究提供了不同的聚类算法异常检测的优势和局限性的见解,可以帮助指导为特定应用选择合适的算法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Optimal control with the effects of ivermectin and livestock availability on malaria transmission Data safety prediction using YOLOv7+G3HN for traffic roads Novel way to predict stock movements using multiple models and comprehensive analysis: leveraging voting meta-ensemble techniques Analysis of support vector machine and random forest models for predicting the scalability of a broadband network Investigations on the structural, vibrational, optical and photocatalytic behavior of CuO, MnO and CuMnO nanomaterials
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1