A fast parallelized DBSCAN algorithm based on OpenMp for detection of criminals on streaming services

Frontiers in Big Data, published 2023-10-31. DOI: 10.3389/fdata.2023.1292923
Journal impact factor 2.4; JCR Q3, Computer Science, Information Systems

Lesia Mochurad, Andrii Sydor, Oleh Ratinskiy



Abstract

Introduction: Streaming services are highly popular today. Millions of people watch live streams and videos and listen to music.

Methods: Twitch is one of the most popular streaming platforms, and data from this type of service are a good example for applying the parallel DBSCAN algorithm proposed in this paper. Unlike the classical approach to neighbor search, the proposed one avoids redundancy, i.e., repeating the same calculations. The algorithm is built on the classical DBSCAN method with a full search for all neighbors, parallelization by subtasks, and the OpenMP parallel computing technology.

Results: Without reducing accuracy, we sped up the DBSCAN-based solution when analyzing medium-sized data. The speedup approaches the number of cores of the multicore computer system, and the efficiency approaches one.

Discussion: Theoretical estimates of speedup and efficiency were derived before the numerical experiments, and the measured results agreed with them, confirming their validity. Clustering quality was verified using the silhouette value. All experiments were conducted on different percentages of medium-sized datasets. The proposed algorithm has prospective applications in fields such as advertising, marketing, cybersecurity, and sociology. Notably, datasets of this kind are often used to detect fraud on the Internet, which makes an algorithm that considers all neighbors a useful tool for such research.
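The abstract describes the method only in outline. As an illustration, here is a minimal C++ sketch of one way to combine classical DBSCAN with a full (brute-force) neighbor search parallelized by OpenMP. It assumes 2-D points, Euclidean distance, and that "avoiding redundancy" can be read as computing every point's ε-neighborhood exactly once, up front, and reusing it during cluster expansion. The names `Point`, `all_neighbors`, `dbscan`, and the label constants are hypothetical; this is not the authors' code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical point type for this sketch: 2-D feature vectors.
struct Point { double x, y; };

// Label convention assumed for this sketch (not taken from the paper).
constexpr int kUnvisited = -2;
constexpr int kNoise = -1;

// Step 1: full epsilon-neighborhood search for every point.
// Iteration i writes only to neighbors[i], so the outer loop can be split
// across OpenMP threads without locks. Without -fopenmp the pragma is
// ignored and the loop simply runs serially with identical results.
std::vector<std::vector<int>> all_neighbors(const std::vector<Point>& pts, double eps) {
    const int n = static_cast<int>(pts.size());
    const double eps2 = eps * eps;  // compare squared distances, no sqrt needed
    std::vector<std::vector<int>> neighbors(n);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            const double dx = pts[i].x - pts[j].x;
            const double dy = pts[i].y - pts[j].y;
            if (dx * dx + dy * dy <= eps2) neighbors[i].push_back(j);  // includes i itself
        }
    }
    return neighbors;
}

// Step 2: classical DBSCAN cluster expansion that reuses the precomputed
// lists, so no region query is repeated. Returns a cluster id per point
// (0, 1, ...) or kNoise.
std::vector<int> dbscan(const std::vector<Point>& pts, double eps, std::size_t min_pts) {
    assert(min_pts >= 1);
    const auto neighbors = all_neighbors(pts, eps);
    const int n = static_cast<int>(pts.size());
    std::vector<int> label(n, kUnvisited);
    int cluster = 0;
    for (int i = 0; i < n; ++i) {
        if (label[i] != kUnvisited) continue;
        if (neighbors[i].size() < min_pts) { label[i] = kNoise; continue; }
        // i is a core point: start a new cluster and grow it breadth-first.
        label[i] = cluster;
        std::vector<int> frontier(neighbors[i].begin(), neighbors[i].end());
        for (std::size_t k = 0; k < frontier.size(); ++k) {
            const int q = frontier[k];
            if (label[q] == kNoise) label[q] = cluster;  // former noise becomes a border point
            if (label[q] != kUnvisited) continue;        // already assigned
            label[q] = cluster;
            if (neighbors[q].size() >= min_pts)          // q is also core: expand through it
                frontier.insert(frontier.end(), neighbors[q].begin(), neighbors[q].end());
        }
        ++cluster;
    }
    return label;
}
```

Compiled with `-fopenmp` (GCC or Clang), the O(n²) neighbor search runs across threads, while the expansion stays sequential and is linear in the total size of the neighbor lists. Because the quadratic search dominates run time on medium-sized data, this kind of split is one reason the speedup S_p = T_1/T_p can approach the core count p and the efficiency E_p = S_p/p can approach one. The trade-off is memory: all neighbor lists are held at once.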


