An Empirical Study on Anomaly Detection Using Density-based and Representative-based Clustering Algorithms

Journal of the Nigerian Society of Physical Sciences. Pub Date: 2023-04-19. DOI: 10.46481/jnsps.2023.1364

Gerard Shu Fuhnwi, Janet O. Agbaje, K. Oshinubi, O. J. Peter


Citations: 2

Abstract

In data mining and statistics, anomaly detection is the process of finding data patterns (outcomes, values, or observations) that deviate from the rest of the observations or outcomes. Anomaly detection is heavily used to solve real-world problems in many application domains, including, but not limited to, medicine, finance, cybersecurity, banking, networking, transportation, and military surveillance of enemy activities. In this paper, we present an empirical study of unsupervised anomaly detection techniques, namely Density-Based Spatial Clustering of Applications with Noise (DBSCAN), DBSCAN++ (with uniform initialization, $k$-center initialization, uniform with approximate neighbor initialization, and $k$-center with approximate neighbor initialization), and $k$-means$--$, on six benchmark imbalanced data sets. Findings from our in-depth empirical study show that $k$-means$--$ is more robust than DBSCAN and DBSCAN++ in terms of the evaluation measures (F1-score, false alarm rate, adjusted Rand index, and Jaccard coefficient) and running time. We also observe that DBSCAN performs very well on data sets with fewer data points. Moreover, the results indicate that the choice of clustering algorithm can significantly impact anomaly detection performance and that the performance of different algorithms varies with the characteristics of the data. Overall, this study provides insights into the strengths and limitations of different clustering algorithms for anomaly detection and can help guide the selection of appropriate algorithms for specific applications.
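The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind $k$-means$--$ as it is commonly described: run $k$-means, but in every iteration set aside the points farthest from their nearest center and exclude them from the center update. It is not the authors' code. The function names, the `init_centers` parameter, the random initialization, and the definition of false alarm rate as FP / (FP + TN) are all assumptions made for illustration.

```python
import numpy as np


def kmeans_minus_minus(X, k, n_outliers, n_iter=100, init_centers=None, seed=0):
    """Sketch of k-means--: k-means that sets aside the n_outliers farthest points.

    Returns (centers, labels, outlier_mask); flagged outliers get label -1.
    Assumes n_iter >= 1.
    """
    X = np.asarray(X, dtype=float)
    if init_centers is None:
        # Assumption: start from k distinct data points chosen at random.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
    else:
        centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        dist_to_nearest = d2[np.arange(len(X)), nearest]
        # Flag the n_outliers points farthest from their nearest center.
        outlier_mask = np.zeros(len(X), dtype=bool)
        if n_outliers > 0:
            outlier_mask[np.argsort(dist_to_nearest)[-n_outliers:]] = True
        # Recompute each center from its non-outlier members only.
        new_centers = centers.copy()
        for j in range(k):
            members = X[(nearest == j) & ~outlier_mask]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.where(outlier_mask, -1, nearest)
    return centers, labels, outlier_mask


def f1_and_false_alarm_rate(true_outliers, pred_outliers):
    """F1-score of the outlier class and false alarm rate, assumed FP / (FP + TN)."""
    t = np.asarray(true_outliers, dtype=bool)
    p = np.asarray(pred_outliers, dtype=bool)
    tp = np.sum(t & p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    tn = np.sum(~t & ~p)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, far


# Toy data (made up for illustration): two tight groups and one distant point.
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                   [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
                   [100.0, 100.0]])
truth_demo = np.array([0, 0, 0, 0, 0, 0, 1], dtype=bool)
_, labels_demo, mask_demo = kmeans_minus_minus(
    X_demo, k=2, n_outliers=1, init_centers=X_demo[[0, 3]])
print(labels_demo, f1_and_false_alarm_rate(truth_demo, mask_demo))
```

Unlike DBSCAN, which labels low-density points as noise on its own, this sketch needs the number of outliers as an input, so results depend on how `n_outliers` and the initial centers are chosen.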
