A fast parallelized DBSCAN algorithm based on OpenMp for detection of criminals on streaming services

Frontiers in Big Data (impact factor 2.4; JCR Q3, Computer Science, Information Systems). Published 2023-10-31. DOI: 10.3389/fdata.2023.1292923
Lesia Mochurad, Andrii Sydor, Oleh Ratinskiy
Citations: 0

Abstract

Introduction: Streaming services are highly popular today. Millions of people watch live streams or videos and listen to music.

Methods: One of the most popular streaming platforms is Twitch, and data from this type of service are a good test case for the parallel DBSCAN algorithm proposed in this paper. Unlike the classical approach to neighbor search, the proposed approach avoids redundancy, i.e., repeating the same calculations. At the same time, the algorithm is built on the classical DBSCAN method with a full search for all neighbors, parallelization by subtasks, and the OpenMP parallel computing technology.

Results: Without reducing accuracy, we accelerated the DBSCAN-based solution when analyzing medium-sized data. The speed-up approaches the number of cores of the multicore computer system, and the efficiency approaches one.

Discussion: Before the numerical experiments, theoretical estimates of speed-up and efficiency were derived. The experimental results agreed with them, confirming their validity. Clustering quality was verified using the silhouette value. All experiments were conducted on different percentages of medium-sized datasets. The proposed algorithm has potential applications in fields such as advertising, marketing, cybersecurity, and sociology. Datasets of this kind are also often used to detect fraud on the Internet. This makes an algorithm that considers all neighbors a useful tool for such research.
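The Methods summary says the neighbor search finds every neighbor without repeating the same calculations, and that the work is split into subtasks run in parallel with OpenMP. The Python sketch below only illustrates that idea under our own assumptions; it is not the authors' implementation. The assumptions are:

- Distances are Euclidean.
- Each unordered pair (i, j) is tested once (j > i), and a hit is recorded for both points.
- Rows are dealt round-robin to a `ThreadPoolExecutor` as a stand-in for an OpenMP `parallel for` loop.
- The precomputed neighborhoods feed classical DBSCAN, with `min_pts` counting the point itself.
- Speed-up and efficiency use the standard definitions S = T1/Tp and E = S/p, assumed to match the paper's.

All function and parameter names (`neighbor_lists`, `dbscan`, `workers`, `speedup`, `efficiency`) are hypothetical. Because of CPython's global interpreter lock, the threads show the structure of the decomposition rather than a real speed-up.

```python
"""Illustrative sketch (not the paper's code): DBSCAN on neighborhoods built
once, in parallel, from a symmetric pairwise search.

Assumptions: Euclidean distance, points given as tuples, and a thread pool
standing in for an OpenMP `parallel for` over row subtasks.
"""
from concurrent.futures import ThreadPoolExecutor
from math import dist

NOISE = -1


def _neighbor_chunk(points, eps, rows):
    """Subtask: compare row i only with j > i, so each pair is computed once."""
    pairs = []
    for i in rows:
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                pairs.append((i, j))
    return pairs


def neighbor_lists(points, eps, workers=4):
    """Exact eps-neighborhood of every point (the point itself excluded)."""
    n = len(points)
    # Round-robin rows roughly balance the triangular workload
    # (row i performs n - i - 1 distance checks).
    chunks = [range(k, n, workers) for k in range(workers)]
    neigh = [[] for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda rows: _neighbor_chunk(points, eps, rows), chunks)
        for pairs in results:
            # Merge serially: workers never write to shared lists.
            for i, j in pairs:
                neigh[i].append(j)
                neigh[j].append(i)
    return neigh


def dbscan(points, eps, min_pts, workers=4):
    """Classical DBSCAN over precomputed neighborhoods; -1 marks noise."""
    neigh = neighbor_lists(points, eps, workers)
    labels = [None] * len(points)
    cluster = -1
    for p in range(len(points)):
        if labels[p] is not None:
            continue
        if len(neigh[p]) + 1 < min_pts:  # +1: p belongs to its own neighborhood
            labels[p] = NOISE
            continue
        cluster += 1
        labels[p] = cluster
        queue = list(neigh[p])  # copy so the neighbor lists stay untouched
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # noise reached from a core point becomes border
            if labels[q] is not None:
                continue
            labels[q] = cluster
            if len(neigh[q]) + 1 >= min_pts:  # q is a core point: keep expanding
                queue.extend(neigh[q])
    return labels


def speedup(t_serial, t_parallel):
    """S = T1 / Tp."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, cores):
    """E = S / p; ideally close to 1."""
    return speedup(t_serial, t_parallel) / cores
```

Take `dbscan([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)], eps=1.5, min_pts=3)` as an example. It returns `[0, 0, 0, -1, -1, -1]`: the three nearby points form one cluster and the rest are noise.

The pairwise search collects hits per subtask and merges them afterwards. This mirrors a common way to avoid write conflicts in a parallel loop; with OpenMP one would typically use per-thread buffers or a reduction instead.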
Source journal: Frontiers in Big Data
CiteScore: 5.20
Self-citation rate: 3.20%
Articles published: 122
Review time: 13 weeks