The Digital Detective's Discourse - A toolset for forensically sound collaborative dark web content annotation and collection

Journal of Digital Forensics Security and Law (IF 0.6; JCR Q4, Computer Science, Information Systems). Published 2022-01-01. DOI: 10.15394/jdfsl.2022.1740
J. Bergman, O. Popov
Citations: 2

Abstract

In the last decade, the proliferation of machine learning (ML) algorithms and their application to big data sets have benefited many researchers and practitioners across scientific fields. Research in cybercrime and digital forensics has accordingly relied on ML techniques and methods to analyze large quantities of data, such as text, graphics, images, videos, and network traffic scans, in support of criminal investigations. Complete and accurate training data sets are indispensable for efficient and effective ML models, and an essential part of creating such data sets is annotating, or labelling, the data. We present a method that lets law enforcement agency investigators annotate and store specific dark web content. Using a design science strategy, we design and develop tools that enable and extend web content annotation. The annotation tool is implemented as a plugin for the Tor browser; it stores the annotated web content, thereby automatically building a dataset of dark web data pertinent to criminal investigations. Combined with a central storage management server, which enables annotation sharing and collaboration, and a web scraping program, the dataset becomes multifold, dynamic, and extensive while the forensic soundness of the data saved and transmitted is maintained. To demonstrate that the toolset is fit for purpose, we used the dataset as training data for ML-based classification models. Evaluated with five-fold cross-validation, the classifiers reported accuracy scores between 85% and 96%. In the concluding sections, we discuss possible use cases of the proposed method in real-life cybercrime investigations, along with ethical concerns and future extensions.
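The abstract says forensic soundness is maintained for data that is saved and transmitted, but it does not describe the mechanism. One common way to make that property checkable is to record a cryptographic digest of each captured page at annotation time and recompute it wherever the page is later received. The Python sketch below assumes SHA-256 fingerprinting; `AnnotationRecord`, `capture`, `verify`, and the record fields are hypothetical illustrations, not the authors' plugin or server API.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


# Assumption: SHA-256 fingerprinting stands in for whatever integrity
# mechanism the toolset actually uses (the abstract does not specify it).
@dataclass(frozen=True)
class AnnotationRecord:
    """Hypothetical record for one annotated page; field names are illustrative."""
    url: str
    label: str
    investigator: str
    captured_at: str  # ISO 8601 timestamp in UTC
    sha256: str       # hex digest of the raw captured bytes


def capture(url: str, content: bytes, label: str, investigator: str) -> AnnotationRecord:
    """Fingerprint the page content at capture time so later copies can be checked."""
    digest = hashlib.sha256(content).hexdigest()
    now = datetime.now(timezone.utc).isoformat()
    return AnnotationRecord(url, label, investigator, now, digest)


def verify(record: AnnotationRecord, content: bytes) -> bool:
    """Recompute the digest on receipt; any change to the bytes gives a mismatch."""
    return hashlib.sha256(content).hexdigest() == record.sha256


if __name__ == "__main__":
    page = b"<html><body>example onion page</body></html>"  # placeholder content
    rec = capture("http://example.onion/", page, "marketplace", "investigator-01")
    print(json.dumps(asdict(rec), indent=2))
    print(verify(rec, page))         # True: content unchanged
    print(verify(rec, page + b" "))  # False: a single added byte is detected
```

Because the digest is computed over the raw bytes, even a one-byte change is detected, as the last line of the usage example shows.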
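The abstract reports that the classifiers were evaluated with five-fold cross-validation but does not name the models or features. The plain-Python sketch below illustrates only that evaluation procedure: the data is shuffled into five folds, and each fold in turn is held out for testing while the model is trained on the other four, giving one accuracy per fold. The function names and the majority-class baseline are hypothetical stand-ins, not the authors' classifiers; in practice scikit-learn's `KFold` and `cross_val_score` provide the same procedure.

```python
import random
from collections import Counter
from typing import List


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    if n < k:
        raise ValueError("need at least k samples so that no fold is empty")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k: int = 5, seed: int = 0) -> List[float]:
    """For each fold: train on the other k-1 folds, score accuracy on the held-out fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical stand-in classifier: always predicts the most frequent training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


if __name__ == "__main__":
    pages = [f"page text {i}" for i in range(20)]  # placeholder documents
    labels = ["marketplace"] * 12 + ["forum"] * 8  # placeholder labels
    scores = cross_validate(pages, labels, fit_majority, predict_majority, k=5)
    print("per-fold accuracy:", scores)
    print("mean accuracy:", sum(scores) / len(scores))
```

Reporting the per-fold scores alongside the mean shows how much accuracy varies with the particular train/test split, which a single hold-out split would hide.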
Source journal: Journal of Digital Forensics Security and Law (Computer Science, Information Systems)
Self-citation rate: 0.00%
Articles published: 5
Review time: 10 weeks