Towards Safe Cyber Practices: Developing a Proactive Cyber-Threat Intelligence System for Dark Web Forum Content by Identifying Cybercrimes

Inf. Comput. Pub Date : 2023-06-18 DOI:10.3390/info14060349
Kanti Singh Sangher, Archana Singh, Hari Mohan Pandey, Vivek Kumar
{"title":"Towards Safe Cyber Practices: Developing a Proactive Cyber-Threat Intelligence System for Dark Web Forum Content by Identifying Cybercrimes","authors":"Kanti Singh Sangher, Archana Singh, Hari Mohan Pandey, Vivek Kumar","doi":"10.3390/info14060349","DOIUrl":null,"url":null,"abstract":"The untraceable part of the Deep Web, also known as the Dark Web, is one of the most used “secretive spaces” to execute all sorts of illegal and criminal activities by terrorists, cybercriminals, spies, and offenders. Identifying actions, products, and offenders on the Dark Web is challenging due to its size, intractability, and anonymity. Therefore, it is crucial to intelligently enforce tools and techniques capable of identifying the activities of the Dark Web to assist law enforcement agencies as a support system. Therefore, this study proposes four deep learning architectures (RNN, CNN, LSTM, and Transformer)-based classification models using the pre-trained word embedding representations to identify illicit activities related to cybercrimes on Dark Web forums. We used the Agora dataset derived from the DarkNet market archive, which lists 109 activities by category. The listings in the dataset are vaguely described, and several data points are untagged, which rules out the automatic labeling of category items as target classes. Hence, to overcome this constraint, we applied a meticulously designed human annotation scheme to annotate the data, taking into account all the attributes to infer the context. In this research, we conducted comprehensive evaluations to assess the performance of our proposed approach. Our proposed BERT-based classification model achieved an accuracy score of 96%. Given the unbalancedness of the experimental data, our results indicate the advantage of our tailored data preprocessing strategies and validate our annotation scheme. Thus, in real-world scenarios, our work can be used to analyze Dark Web forums and identify cybercrimes by law enforcement agencies and can pave the path to develop sophisticated systems as per the requirements.","PeriodicalId":13622,"journal":{"name":"Inf. Comput.","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Inf. Comput.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/info14060349","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

The untraceable part of the Deep Web, also known as the Dark Web, is one of the most used “secretive spaces” to execute all sorts of illegal and criminal activities by terrorists, cybercriminals, spies, and offenders. Identifying actions, products, and offenders on the Dark Web is challenging due to its size, intractability, and anonymity. Therefore, it is crucial to intelligently enforce tools and techniques capable of identifying the activities of the Dark Web to assist law enforcement agencies as a support system. Therefore, this study proposes four deep learning architectures (RNN, CNN, LSTM, and Transformer)-based classification models using the pre-trained word embedding representations to identify illicit activities related to cybercrimes on Dark Web forums. We used the Agora dataset derived from the DarkNet market archive, which lists 109 activities by category. The listings in the dataset are vaguely described, and several data points are untagged, which rules out the automatic labeling of category items as target classes. Hence, to overcome this constraint, we applied a meticulously designed human annotation scheme to annotate the data, taking into account all the attributes to infer the context. In this research, we conducted comprehensive evaluations to assess the performance of our proposed approach. Our proposed BERT-based classification model achieved an accuracy score of 96%. Given the unbalancedness of the experimental data, our results indicate the advantage of our tailored data preprocessing strategies and validate our annotation scheme. Thus, in real-world scenarios, our work can be used to analyze Dark Web forums and identify cybercrimes by law enforcement agencies and can pave the path to develop sophisticated systems as per the requirements.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
迈向安全网络实践:通过识别网络犯罪为暗网论坛内容开发主动网络威胁情报系统
深网中无法追踪的部分,也被称为暗网,是恐怖分子、网络罪犯、间谍和罪犯执行各种非法和犯罪活动的最常用的“秘密空间”之一。由于暗网的规模、难处理性和匿名性,在暗网上识别行动、产品和罪犯是具有挑战性的。因此,智能执行能够识别暗网活动的工具和技术,以协助执法机构作为支持系统是至关重要的。因此,本研究提出了四种基于深度学习架构(RNN、CNN、LSTM和Transformer)的分类模型,使用预训练的词嵌入表示来识别暗网论坛上与网络犯罪相关的非法活动。我们使用了来自暗网市场档案的Agora数据集,它按类别列出了109项活动。数据集中的清单是模糊描述的,并且有几个数据点是未标记的,这就排除了将类别项自动标记为目标类的可能性。因此,为了克服这一限制,我们采用了精心设计的人工注释方案来注释数据,考虑到所有属性来推断上下文。在这项研究中,我们进行了全面的评估,以评估我们提出的方法的性能。我们提出的基于bert的分类模型达到了96%的准确率。考虑到实验数据的不平衡性,我们的结果表明了我们定制的数据预处理策略的优势,并验证了我们的标注方案。因此,在现实世界中,我们的工作可以用来分析暗网论坛和识别执法机构的网络犯罪,并可以为根据要求开发复杂的系统铺平道路。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Traceable Constant-Size Multi-authority Credentials Pspace-Completeness of the Temporal Logic of Sub-Intervals and Suffixes Employee Productivity Assessment Using Fuzzy Inference System Correction of Threshold Determination in Rapid-Guessing Behaviour Detection Combining Classifiers for Deep Learning Mask Face Recognition
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1