Video source identification using machine learning: A case study of 16 instant messaging applications

Forensic Science International: Digital Investigation · Impact Factor 2.0 · JCR Q3 (Computer Science, Information Systems) · CAS Region 4 (Medicine) · Pub Date: 2024-10-01 · DOI: 10.1016/j.fsidi.2024.301812
Hyomin Yang, Junho Kim, Jungheum Park
Citations: 0

Abstract

In recent years, there has been a notable increase in the prevalence of cybercrimes related to video content, including the distribution of illegal videos and the sharing of copyrighted material. This has led to the growing importance of identifying the source of video files to trace the owner of the files involved in the incident or identify the distributor. Previous research has concentrated on revealing the device (brand and/or model) that "originally" created a video file. This has been achieved by analyzing the pattern noise generated by the image sensor in the camera, the storage structural features of the file, and the metadata patterns. However, due to the widespread use of mobile environments, instant messaging applications (IMAs) such as Telegram and Wire have been utilized to share illegal videos, which can result in the loss of information from the original file due to re-encoding at the application level, depending on the transmission settings. Consequently, it is necessary to extend the scope of existing research to identify the various applications that are capable of re-encoding video files in transit. Furthermore, it is essential to determine whether there are features that can be leveraged to distinguish them from the source identification perspective. In this paper, we propose a machine learning-based methodology for classifying the source application by extracting various features stored in the storage format and internal metadata of video files. To conduct this study, we analyzed 16 IMAs that are widely used in mobile environments and generated a total of 1974 sample videos, taking into account both the transmission options and encoding settings offered by each IMA. The training and testing results on this dataset indicate that the ExtraTrees model achieved an identification accuracy of approximately 99.96%.
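The "storage structural features" mentioned above come from the container layout of the video file (ISO Base Media File Format for MP4). A minimal sketch of extracting one such feature, the sequence of top-level boxes, is shown below; it assumes a plain 32-bit-size MP4 stream and does not handle 64-bit `largesize` or size-zero boxes:

```python
import struct

def top_level_boxes(data: bytes):
    """Walk the top-level ISO BMFF boxes of an MP4 byte stream.

    Returns a list of (box_type, size) tuples. Simplified illustration:
    only 32-bit size fields are supported, not 64-bit 'largesize' or
    size == 0 ('extends to end of file') boxes.
    """
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, = struct.unpack_from(">I", data, offset)
        box_type = data[offset + 4:offset + 8].decode("ascii", "replace")
        if size < 8:  # malformed or unsupported size field
            break
        boxes.append((box_type, size))
        offset += size
    return boxes

# A synthetic two-box stream: an 'ftyp' box (brand 'isom') and an empty 'moov' box.
sample = (
    struct.pack(">I", 16) + b"ftyp" + b"isom" + struct.pack(">I", 512)
    + struct.pack(">I", 8) + b"moov"
)
print(top_level_boxes(sample))  # [('ftyp', 16), ('moov', 8)]
```

The ordering and presence of such boxes varies between encoders, which is why container structure can discriminate between re-encoding applications.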
Furthermore, we developed a proof-of-concept tool based on the proposed method, which extracts the suggested features from videos and queries a pre-trained model. This tool is released as open-source software for the community.
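The tool's actual feature set and trained ExtraTrees model are not reproduced here. As a simplified stand-in, the sketch below classifies a file by matching its top-level box-type sequence against a table of signatures; all labels and sequences are illustrative placeholders, not data from the paper:

```python
# Hypothetical signature table: top-level box-type order per encoder style.
# These entries are illustrative placeholders, not the paper's data.
KNOWN_SIGNATURES = {
    ("ftyp", "moov", "mdat"): "faststart-style encoder",
    ("ftyp", "mdat", "moov"): "append-style encoder",
}

def classify_by_boxes(box_types):
    """Return the label whose signature matches the box-type sequence,
    or 'unknown'. In the real tool, a trained classifier (e.g. ExtraTrees
    over many container and metadata features) plays this role."""
    return KNOWN_SIGNATURES.get(tuple(box_types), "unknown")

print(classify_by_boxes(["ftyp", "moov", "mdat"]))  # faststart-style encoder
print(classify_by_boxes(["ftyp", "free", "mdat"]))  # unknown
```

An exact-match lookup is brittle; the tree ensemble described in the paper generalizes across many correlated container and metadata features rather than requiring identical sequences.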