Local outlier factor for anomaly detection in HPCC systems
Authors: Arya Adesh, Shobha G, Jyoti Shetty, Lili Xu
Journal: Journal of Parallel and Distributed Computing, vol. 192, Article 104923
Published: 2024-05-23
DOI: 10.1016/j.jpdc.2024.104923
URL: https://www.sciencedirect.com/science/article/pii/S074373152400087X
Citation count: 0
Abstract
Local Outlier Factor (LOF) is an unsupervised anomaly detection algorithm that identifies anomalies by comparing the local density of a data point with that of its neighbors. Anomaly detection is the process of finding such anomalous points in datasets; in real-time data they may indicate critical events such as bank fraud, data compromise, or network threats. This paper presents an implementation of the LOF algorithm on HPCC Systems, an open-source distributed computing platform for big data analytics. An improved LOF is also proposed that efficiently detects anomalies in datasets rich in duplicates. The impact of varying hyperparameters on the performance of LOF is examined on HPCC Systems, and LOF is compared with other algorithms such as COF, LoOP, and kNN over several datasets. Additionally, LOF is evaluated across the big-data frameworks Spark, Hadoop, and HPCC Systems by comparing runtime performance.
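To make the density comparison in the abstract concrete, the following is a minimal single-machine sketch of the standard LOF computation in Python. It is not the paper's HPCC Systems implementation, and it does not reproduce the proposed Improved LOF. The function name `lof_scores`, the brute-force distance matrix, and the choice to reject duplicate-heavy inputs are all assumptions made for illustration.

```python
import math


def lof_scores(points, k):
    """Return one LOF score per point (brute force, O(n^2) distances).

    Illustrative sketch only: ties at the k-th neighbour are broken by
    index order, and inputs where a point has k or more exact duplicates
    are rejected instead of handled.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Pairwise Euclidean distances between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []  # indices of the k nearest neighbours of each point
    k_dist = []     # distance from each point to its k-th nearest neighbour
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability
    # distance from a point to its neighbours.
    lrd = []
    for i in range(n):
        reach_sum = sum(max(k_dist[j], dist[i][j]) for j in neighbors[i])
        if reach_sum == 0:
            # Happens when a point and its neighbours are exact duplicates.
            raise ValueError("duplicate-heavy input is not handled here")
        lrd.append(k / reach_sum)

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i])
            for i in range(n)]


# Four points on a unit square plus one distant point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
for point, score in zip(sample, lof_scores(sample, k=2)):
    print(point, round(score, 3))
```

Scores close to 1 indicate a point whose density matches that of its neighbors, while scores well above 1 indicate a point in a sparser region than its neighbors; here the distant point `(5, 5)` receives the largest score. The zero-sum guard marks the spot where standard LOF becomes undefined on data with many duplicates, which is the situation the abstract says the Improved LOF targets.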
About the journal:
This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing.
The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics; again covering the full range from the design to the use of our targeted systems.