Evaluating digital forensic findings in Trojan horse defense cases using Bayesian networks
Pub Date: 2025-11-05 | DOI: 10.1016/j.fsidi.2025.302023
Marouschka Vink, Ruud Schramp, Bas Kokshoorn, Marjan J. Sjerps
Digital forensic scientists primarily rely on individual internal reasoning and categorical conclusions when evaluating evidence in casework. This can make it difficult to maintain structured reasoning that is logically sound, balanced, robust, and transparent. Trojan horse defense cases exemplify these challenges in evaluating digital forensic findings. The key challenge in such cases is combining multiple observations into a logically sound probabilistic evaluation while keeping the forensic report understandable for the court and other recipients. To address these challenges, we propose using the likelihood ratio framework to evaluate digital findings in Trojan horse defense cases, with Bayesian networks serving to visualize the evaluation and derive a likelihood ratio. We illustrate this approach by constructing a Bayesian network for a case example. We show that these networks are well suited to modeling the evaluation of digital evidence in Trojan horse defense cases and can easily be adapted to varying case circumstances. Based on our findings, we strongly recommend broader exploration of Bayesian networks in digital forensic casework.
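For context, the likelihood ratio at the heart of this framework weighs how much more probable the findings E are under the prosecution hypothesis H_p (e.g., the suspect performed the incriminating actions) than under the defense hypothesis H_d (e.g., malware did), given background information I. In standard forensic-statistics notation (not quoted from the paper itself):

```latex
\mathrm{LR} = \frac{\Pr(E \mid H_p, I)}{\Pr(E \mid H_d, I)}
```

An LR above 1 supports H_p and below 1 supports H_d; a Bayesian network lets the examiner combine multiple, possibly dependent observations into a single such ratio.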
{"title":"Evaluating digital forensic findings in Trojan horse defense cases using Bayesian networks","authors":"Marouschka Vink , Ruud Schramp , Bas Kokshoorn , Marjan J. Sjerps","doi":"10.1016/j.fsidi.2025.302023","DOIUrl":"10.1016/j.fsidi.2025.302023","url":null,"abstract":"<div><div>Digital forensic scientists primarily rely on individual internal reasoning and categorical conclusions when evaluating evidence in casework. This can make it difficult to maintain structured reasoning that is logically sound, balanced, robust, and transparent. Trojan horse defense cases exemplify these challenges in evaluating digital forensic findings. The key challenge in such cases is combining multiple observations into a logically sound probabilistic evaluation while maintaining an understandable forensic report for court and other recipients. To address these challenges, we propose using the likelihood ratio framework to evaluate digital findings in Trojan horse defense cases, with Bayesian networks serving to visualize the evaluation and derive a likelihood ratio. We will illustrate this approach by demonstrating the construction of a Bayesian network through a case example. We show that these networks are very suitable to model the evaluation of digital evidence in Trojan horse defense cases and that they can be easily adapted for various case circumstances. Based on our findings, we strongly recommend broader exploration of Bayesian networks in digital forensic casework.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"55 ","pages":"Article 302023"},"PeriodicalIF":2.2,"publicationDate":"2025-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145473733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
REVEAL: A large-scale comprehensive image dataset for steganalysis
Pub Date: 2025-10-29 | DOI: 10.1016/j.fsidi.2025.302006
Meike Kombrink, Stijn van Lierop, Dionne Stolwijk, Marcel Worring, Derk Vrijdag, Zeno Geradts
Detection methodologies for steganography are a topic of study both within academia and in law enforcement. For the development of detection methods and the validation of their use for law enforcement, a large-scale representative dataset is essential. Current datasets fall short of representing real-life steganography: they include only low-resolution images, are taken with only a few different cameras, and are validated with only a minimal number of steganography methods. A new large-scale, comprehensive image steganography dataset is needed, with many typical examples of the steganography one could encounter in casework. To that end, we present the REVEAL dataset, containing 100,006 images taken with more than 50 different cameras. The set contains a rich variety of images whose attributes span a wide distribution; there are, for example, over 200 different sizes, ranging from 256×256 to 7680×4320. All 100,006 images were then subjected to many different chains of image preprocessing steps. After preprocessing, more than 50 different image steganography algorithms were used to hide information in the images. This results in three image sets (original, preprocessed, and stego) totaling more than 300,000 images. This properly annotated dataset can help to achieve accurate detection using supervised machine-learning-based methods. At the same time, it can be used for both forensic evaluation and validation, improving the applicability of detection methods. The dataset, with full annotations, algorithms, and results, is made publicly available.
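As a minimal illustration of the kind of embedding such a dataset must cover, the sketch below implements naive least-significant-bit (LSB) replacement on a flat pixel array. This is an illustrative toy, not one of the 50+ algorithms used for REVEAL:

```python
import numpy as np

def lsb_embed(pixels: np.ndarray, message: bytes) -> np.ndarray:
    """Hide `message` in the least significant bits of a flat uint8 pixel array."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    if bits.size > pixels.size:
        raise ValueError("message too large for cover image")
    stego = pixels.copy()
    stego[:bits.size] = (stego[:bits.size] & 0xFE) | bits  # overwrite lowest bit
    return stego

def lsb_extract(stego: np.ndarray, n_bytes: int) -> bytes:
    """Recover the first n_bytes hidden by lsb_embed."""
    return np.packbits(stego[:n_bytes * 8] & 1).tobytes()

cover = np.random.default_rng(0).integers(0, 256, size=4096, dtype=np.uint8)
assert lsb_extract(lsb_embed(cover, b"hidden"), 6) == b"hidden"
```

Detectors must recognize such subtle perturbations across all the resolutions, cameras, and preprocessing chains the dataset varies, which is precisely why broad coverage matters.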
{"title":"REVEAL: A large-scale comprehensive image dataset for steganalysis","authors":"Meike Kombrink , Stijn van Lierop , Dionne Stolwijk , Marcel Worring , Derk Vrijdag , Zeno Geradts","doi":"10.1016/j.fsidi.2025.302006","DOIUrl":"10.1016/j.fsidi.2025.302006","url":null,"abstract":"<div><div>Detection methodologies for steganography are a topic of study both within academia and in law enforcement. For the development of detection methods and the validation of their use for law enforcement, a large-scale representative dataset is essential. Current datasets are lacking in terms of representing real-life steganography, as they only include low resolution images, are taken with only a few different cameras, and are validated with only a minimal number of steganography methods. A new large-scale comprehensive image steganography dataset is needed with many typical examples of steganography one could encounter in casework. To that end, we present the REVEAL dataset containing 100.006 images taken with more than 50 different cameras. The set contains a rich variety of images, the attributes of which have a wide distribution. There are for example over 200 different sizes, ranging from 256x256 to 7680x4320. All 100.006 images have then been subjected to many different chains of image preprocessing steps. After the preprocessing, a total of more than 50 different image steganography algorithms were used to hide information in the images. This results in three image sets namely: original, preprocessed, and stego, in total more than 300.000 images. This properly annotated dataset can help to achieve accurate detection using supervised machine-learning based methods. At the same time, this dataset can be used for both forensic evaluation and validation, thus improving the applicability of detection methods. The dataset with full annotations, algorithms, and results is made publicly available.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"55 ","pages":"Article 302006"},"PeriodicalIF":2.2,"publicationDate":"2025-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145424629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Complex networks-based anomaly detection for financial transactions in anti-money laundering
Pub Date: 2025-10-07 | DOI: 10.1016/j.fsidi.2025.302005
Rodrigo Marcel Araujo Oliveira, Angelo Marcio Oliveira Sant’Anna, Paulo Henrique Ferreira
Money laundering is a global threat that undermines the integrity of the financial system and the stability of the world economy. This paper proposes an approach based on complex network techniques to support the investigation of financial transactions of individuals suspected of money laundering. The study includes analyses for anomaly detection, community detection, density analysis, and cycle identification, aiming to capture complex patterns of interaction among accounts. Anomaly detection was based on a graph neural network (GNN) model. The results highlight the model’s effectiveness, as indicated by the Silhouette score and Davies-Bouldin index obtained on the test set (0.83 and 1.59, respectively), suggesting that the groups of anomalous and normal accounts are compact and well separated. The study also incorporates various financial indicators, such as moving averages over different time windows of transactions. The K-means algorithm was employed to identify patterns in financial transactions and to determine the number of clusters. Correspondence Analysis was applied to establish similarities among the transactional profiles of the investigated individuals. The findings are relevant to the investigative process, providing analytical support for monitoring and prioritizing cases and for identifying potential transactional patterns and groups of individuals possibly involved in illicit activities such as drug trafficking, fraud, and scams.
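Two of the named building blocks are easy to sketch on toy data: cycle identification on a directed transaction graph, and the cluster-validity metrics reported above. The graph and feature vectors here are invented purely for illustration:

```python
import networkx as nx
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Toy directed transaction graph; a cycle A -> B -> C -> A is a classic
# round-tripping pattern in money-laundering analysis.
g = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
print(list(nx.simple_cycles(g)))  # one cycle over A, B, C

# Toy per-account feature vectors (e.g., moving averages of transaction
# volume over different time windows, as in the paper).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 1, (50, 4))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# A silhouette score near 1 and a low Davies-Bouldin index both indicate
# compact, well-separated clusters.
print(silhouette_score(X, labels), davies_bouldin_score(X, labels))
```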
{"title":"Complex networks-based anomaly detection for financial transactions in anti-money laundering","authors":"Rodrigo Marcel Araujo Oliveira , Angelo Marcio Oliveira Sant’Anna , Paulo Henrique Ferreira","doi":"10.1016/j.fsidi.2025.302005","DOIUrl":"10.1016/j.fsidi.2025.302005","url":null,"abstract":"<div><div>Money laundering is a global threat that undermines the integrity of the financial system and the stability of the world economy. This paper proposes an approach based on complex network techniques to support investigating financial transactions of individuals suspected of money laundering. The study includes analyses for anomaly detection, community detection, density analysis, and cycle identification, aiming to capture complex patterns of interaction among accounts. Anomaly detection was based on a Graph Neural Networks model. The results highlight the model’s effectiveness, as indicated by the Silhouette score and Davies-Bouldin index metrics obtained on the test set, which were 0.83 and 1.59, respectively. This suggests that the groups of anomalous and normal accounts are well represented in terms of similarity and dissimilarity. The study also incorporates various financial indicators, such as moving averages over different time windows of transactions. The K-means algorithm was employed to identify patterns in financial transactions and determine the number of clusters. Correspondence Analysis was applied to establish similarities among the transactional profiles of the investigated individuals. The findings are relevant to the investigative process, providing analytical support for monitoring and prioritizing cases and identifying potential transactional patterns and groups of individuals possibly involved in illicit activities, such as drug trafficking, fraud, and scams.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"55 ","pages":"Article 302005"},"PeriodicalIF":2.2,"publicationDate":"2025-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145267233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI-driven dataset creation in mobile forensics using LLM-based storyboards
Pub Date: 2025-10-03 | DOI: 10.1016/j.fsidi.2025.302002
Dirk Pawlaszczyk, Philipp Engler, Ronny Bodach, Christian Hummert, Margaux Michel, Ralf Zimmermann
The generation of datasets is essential for training and validation tasks in digital forensics. Currently, data generation and provisioning are mainly performed manually. In mobile forensics, only a limited number of tools are available to aid in populating and injecting data into mobile devices. In this paper, we introduce a novel method for automatic data generation using an AI-driven approach. We present a comprehensive toolchain for dataset creation, centered on a dynamic model (storyboard) developed with the assistance of large language model (LLM) agents. The generated sequences of activities are then automatically executed on mobile devices. Our approach has been successfully implemented within the data creation and injection framework AutoPodMobile (APM) as part of a proof-of-concept study, and we also present a validation of the AI-generated data. The paper concludes with a brief discussion of the results and planned next steps.
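The paper's storyboard format is not reproduced here, but the general idea of an LLM-emitted action script replayed on a device can be sketched as follows; the JSON schema and the dispatch function are hypothetical placeholders, not APM's actual interface:

```python
import json

# Hypothetical storyboard: an ordered list of user actions an LLM agent
# might emit. The schema is invented for illustration.
storyboard = json.loads("""
[
  {"t": "2025-01-01T09:00:00", "app": "messenger", "action": "send_text",
   "params": {"to": "alice", "body": "running late"}},
  {"t": "2025-01-01T09:05:00", "app": "browser", "action": "visit",
   "params": {"url": "https://example.org"}}
]
""")

def execute(step: dict) -> None:
    # Placeholder: a real toolchain would drive the handset here, e.g. via
    # UI automation; this sketch only logs the intended action.
    print(f"{step['t']}: {step['app']}.{step['action']}({step['params']})")

for step in sorted(storyboard, key=lambda s: s["t"]):
    execute(step)
```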
{"title":"AI-driven dataset creation in mobile forensics using LLM-based storyboards","authors":"Dirk Pawlaszczyk , Philipp Engler , Ronny Bodach , Christian Hummert , Margaux Michel , Ralf Zimmermann","doi":"10.1016/j.fsidi.2025.302002","DOIUrl":"10.1016/j.fsidi.2025.302002","url":null,"abstract":"<div><div>The generation of datasets is essential for training and validation tasks in digital forensics. Currently, the processes of data generation and provisioning are mainly performed manually. In the field of mobile forensics, there are only a limited number of tools available that aid in populating and injecting data into mobile devices. In this paper, we introduce a novel method for automatic data generation using an AI-driven approach. We present a comprehensive toolchain for dataset creation, focusing on developing a dynamic model (storyboard) with the assistance of large language model (LLM) agents. The generated sequences of activities are then automatically executed on mobile devices. Our proposed approach has been successfully implemented within the data creation and injection framework called AutoPodMobile (APM) as part of a proof-of-concept study. For data generated through AI methods, a validation is presented as well. The paper ends with a brief discussion of the results and the next steps planned.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"55 ","pages":"Article 302002"},"PeriodicalIF":2.2,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AKF: A modern synthesis framework for building datasets in digital forensics
Pub Date: 2025-10-03 | DOI: 10.1016/j.fsidi.2025.302004
Lloyd Gonzales, Nancy LaTourrette, Bill Doherty
The forensic community depends on datasets containing disk images, network captures, and other forensic artifacts for education and research. These datasets must be reflective of the artifacts that real-world analysts encounter, which can evolve rapidly as new software is released. Additionally, these datasets must be free of sensitive data that would limit their distribution. To address the issues of relevance and sensitivity, many researchers and educators develop datasets by hand. While this approach is viable, it is time-consuming and rarely produces datasets that are fully reflective of real-world conditions. As a result, there is ongoing research into forensic synthesizers, which simplify the process of creating complex datasets that are free of legal and logistical concerns.
This work introduces the automated kinetic framework (AKF), a modular synthesizer for creating and interacting with virtualized environments to simulate human activity. AKF makes significant improvements to the approaches and implementations of prior synthesizers used to generate forensic artifacts. AKF also improves the process of documenting these datasets by leveraging the CASE standard to provide human- and machine-readable reporting. Finally, AKF offers several options for using these features to build and document datasets, including a custom scripting language. These contributions aim to streamline the development of forensic datasets and ensure the long-term usefulness of AKF-generated datasets and the framework as a whole.
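CASE (Cyber-investigation Analysis Standard Expression) is a JSON-LD vocabulary built on the Unified Cyber Ontology. The record below sketches the general shape of a CASE-style annotation for one synthesized file; the namespaces follow published CASE/UCO conventions, but the exact properties AKF emits are not reproduced here, so treat this as an illustrative shape only:

```python
import json, uuid

# Schematic CASE/UCO-style JSON-LD record for one synthesized artifact
# (illustrative shape, not actual AKF output).
record = {
    "@context": {
        "kb": "http://example.org/kb/",
        "uco-core": "https://ontology.unifiedcyberontology.org/uco/core/",
        "uco-observable": "https://ontology.unifiedcyberontology.org/uco/observable/",
    },
    "@id": f"kb:file-{uuid.uuid4()}",
    "@type": "uco-observable:File",
    "uco-core:hasFacet": [{
        "@type": "uco-observable:FileFacet",
        "uco-observable:fileName": "notes.txt",
        "uco-observable:filePath": "/home/user/notes.txt",
    }],
}
print(json.dumps(record, indent=2))
```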
{"title":"AKF: A modern synthesis framework for building datasets in digital forensics","authors":"Lloyd Gonzales , Nancy LaTourrette, Bill Doherty","doi":"10.1016/j.fsidi.2025.302004","DOIUrl":"10.1016/j.fsidi.2025.302004","url":null,"abstract":"<div><div>The forensic community depends on datasets containing disk images, network captures, and other forensic artifacts for education and research. These datasets must be reflective of the artifacts that real-world analysts encounter, which can evolve rapidly as new software is released. Additionally, these datasets must be free of sensitive data that would limit their distribution. To address the issues of relevance and sensitivity, many researchers and educators develop datasets by hand. While this approach is viable, it is time-consuming and rarely produces datasets that are fully reflective of real-world conditions. As a result, there is ongoing research into forensic synthesizers, which simplify the process of creating complex datasets that are free of legal and logistical concerns.</div><div>This work introduces the automated kinetic framework (AKF), a modular synthesizer for creating and interacting with virtualized environments to simulate human activity. AKF makes significant improvements to the approaches and implementations of prior synthesizers used to generate forensic artifacts. AKF also improves the process of documenting these datasets by leveraging the CASE standard to provide human- and machine-readable reporting. Finally, AKF offers several options for using these features to build and document datasets, including a custom scripting language. These contributions aim to streamline the development of forensic datasets and ensure the long-term usefulness of AKF-generated datasets and the framework as a whole.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"55 ","pages":"Article 302004"},"PeriodicalIF":2.2,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dial M for Mixer: A methodological approach to forensic analysis of unknown devices using the Thermomix TM6
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301983
Maximilian Eichhorn, Felix Freiling
To forensically examine an unknown digital device, we propose a method that performs experiments on an identical device and systematically derives information from the behaviour observed while specific actions are carried out. We apply this method to the Thermomix TM6 from Vorwerk, a multifunctional kitchen appliance. Using differential forensic analysis together with our method, we identify various forensic artefacts from real-world use, e.g., timestamps of when the system was turned on and logs of specific actions such as dough kneading and cooking. We also observe inadequate data sanitization after a factory reset. Other forensic artefacts we found include Wi-Fi login details and account information for the Cookidoo online recipe service provided by Vorwerk.
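Differential forensic analysis boils down to snapshotting device state before and after a known action and diffing the two. A minimal file-level sketch over an extracted filesystem (illustrative, not the authors' tooling):

```python
import hashlib
from pathlib import Path

def snapshot(root: Path) -> dict[str, str]:
    """Map each file under `root` to the SHA-256 of its content."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*") if p.is_file()
    }

def diff(before: dict[str, str], after: dict[str, str]) -> dict[str, set]:
    """Classify paths as added, removed, or modified between two snapshots."""
    return {
        "added": after.keys() - before.keys(),
        "removed": before.keys() - after.keys(),
        "modified": {p for p in before.keys() & after.keys()
                     if before[p] != after[p]},
    }

# Usage: extract the twin device, snapshot, perform one action (e.g. knead
# dough), re-extract, snapshot again, then attribute the diff to the action.
# changes = diff(snapshot(Path("before")), snapshot(Path("after")))
```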
{"title":"Dial M for Mixer: A methodological approach to forensic analysis of unknown devices using the thermomix TM6","authors":"Maximilian Eichhorn, Felix Freiling","doi":"10.1016/j.fsidi.2025.301983","DOIUrl":"10.1016/j.fsidi.2025.301983","url":null,"abstract":"<div><div>To forensically examine an unknown digital device, a method is proposed that involves to perform experiments on an identical device and systematically derive information from the observed behaviour while performing specific actions. We apply this method to the Thermomix TM6 from Vorwerk, a multifunctional kitchen appliance. Using differential forensic analysis together with our method, we identify various forensic artefacts from real-world use, e.g., timestamps when the system was turned on and logs of specific cooking actions like dough kneading and cooking. We also observe inadequate data sanitization after factory reset. Other forensic artefacts we found include Wi-Fi login details and account information for the Cookidoo online service provided by Vorwerk to exchange recipes.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"54 ","pages":"Article 301983"},"PeriodicalIF":2.2,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145424401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Welcome to the Proceedings of the Fifth Annual DFRWS APAC Conference 2025!
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301989
Mariya Shafat Kirmani
{"title":"Welcome to the Proceedings of the Fifth Annual DFRWS APAC Conference 2025!","authors":"Mariya Shafat Kirmani","doi":"10.1016/j.fsidi.2025.301989","DOIUrl":"10.1016/j.fsidi.2025.301989","url":null,"abstract":"","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"54 ","pages":"Article 301989"},"PeriodicalIF":2.2,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145424486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Creating a standardized corpus for digital stratigraphic methods with fsstratify
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301986
Julian Uthoff, Lisa Marie Dreier, Martin Lambertz, Mariia Rybalka, Felix Freiling
Digital stratigraphic methods aim to infer new information about digital objects using their depositional context. Many such methods have been developed, for example, to interpret file allocation traces and thereby estimate timestamps of file fragments based on their position on disk. Such methods are difficult to compare. We therefore present a corpus of NTFS file system images that can be used to evaluate these methods. The corpus comprises different categories, each extensively employing a small subset of file system operations to display their effect on file allocation traces. We demonstrate the usefulness of this corpus by evaluating the method of Bahjat and Jones (2019) that derives the timestamp of a file fragment from the timestamps of neighboring files. The corpus was generated using a revised version of fsstratify, a software framework to simulate file system usage. The tool is able to log the position of content data during file creation, greatly facilitating research in the realm of digital stratigraphy.
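The cited method infers a fragment's creation time from the timestamps of its allocation neighbors. A simplified linear-interpolation sketch in that spirit (our reading of the general idea, not Bahjat and Jones's exact estimator):

```python
from bisect import bisect_left

def estimate_timestamp(fragment_offset: int,
                       neighbors: list[tuple[int, float]]) -> float:
    """Interpolate a timestamp for a fragment from (disk_offset, unix_time)
    pairs of dated neighbors, assuming roughly sequential allocation."""
    neighbors = sorted(neighbors)
    offsets = [o for o, _ in neighbors]
    i = bisect_left(offsets, fragment_offset)
    if i == 0:                   # before the first dated neighbor
        return neighbors[0][1]
    if i == len(neighbors):      # past the last dated neighbor
        return neighbors[-1][1]
    (o0, t0), (o1, t1) = neighbors[i - 1], neighbors[i]
    return t0 + (fragment_offset - o0) / (o1 - o0) * (t1 - t0)

# Fragment midway between files dated t=1000 and t=2000:
print(estimate_timestamp(150, [(100, 1000.0), (200, 2000.0)]))  # 1500.0
```

A corpus with known ground-truth allocation histories, as produced by fsstratify, is exactly what is needed to measure how far such estimates drift from reality.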
{"title":"Creating a standardized corpus for digital stratigraphic methods with fsstratify","authors":"Julian Uthoff , Lisa Marie Dreier , Martin Lambertz , Mariia Rybalka , Felix Freiling","doi":"10.1016/j.fsidi.2025.301986","DOIUrl":"10.1016/j.fsidi.2025.301986","url":null,"abstract":"<div><div>Digital stratigraphic methods aim to infer new information about digital objects using their depositional context. Many such methods have been developed, for example, to interpret file allocation traces and thereby estimate timestamps of file fragments based on their position on disk. Such methods are difficult to compare. We therefore present a corpus of NTFS file system images that can be used to evaluate these methods. The corpus comprises different categories, each extensively employing a small subset of file system operations to display their effect on file allocation traces. We demonstrate the usefulness of this corpus by evaluating the method of Bahjat and Jones (2019) that derives the timestamp of a file fragment from the timestamps of neighboring files. The corpus was generated using a revised version of <span>fsstratify</span>, a software framework to simulate file system usage. The tool is able to log the position of content data during file creation, greatly facilitating research in the realm of digital stratigraphy.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"54 ","pages":"Article 301986"},"PeriodicalIF":2.2,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145424404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatically generating digital forensic reference data triggered by mobile application updates
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301985
Angelina A. Claij-Swart, Erik Oudsen, Bouke Timbermont, Christopher Hargreaves, Lena L. Voigt
Mobile applications are subject to frequent updates, which poses a challenge for validating digital forensic tools. This paper presents an approach to automating the generation of reference data on an ongoing basis and shows how this can be integrated into the overall validation process of a digital forensic analysis platform. Specifically, it describes the architecture of the mobile data synthesis framework Puma, shares its capabilities via an open-source project, and shows how it can be used in a tool-testing workflow triggered by application updates. The value of this approach is demonstrated with three example use cases, documenting its use over six months and reporting the insights and experiences gained from this integration. Finally, this work highlights additional contributions the proposed approach and tooling could make to the digital forensics community.
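A minimal sketch of the triggering logic: poll installed app versions, detect changes, and dispatch reference-data generation. The version source and the generation hook are hypothetical placeholders, not Puma's real interface:

```python
import json
from pathlib import Path

STATE = Path("last_seen_versions.json")

def changed_apps(current: dict[str, str]) -> list[str]:
    """Return apps whose version differs from the last run, then persist state.
    `current` maps package names to installed version strings."""
    seen = json.loads(STATE.read_text()) if STATE.exists() else {}
    changed = [app for app, v in current.items() if seen.get(app) != v]
    STATE.write_text(json.dumps(current))
    return changed

def on_update(app: str) -> None:
    # Placeholder: here the framework would replay known reference actions
    # for `app` on a device and feed the extraction into tool tests.
    print(f"triggering reference-data generation for {app}")

# Versions as scraped from a device or an app-store feed (illustrative):
for app in changed_apps({"com.whatsapp": "2.25.1", "org.signal": "7.0"}):
    on_update(app)
```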
{"title":"Automatically generating digital forensic reference data triggered by mobile application updates","authors":"Angelina A. Claij-Swart , Erik Oudsen , Bouke Timbermont , Christopher Hargreaves , Lena L. Voigt","doi":"10.1016/j.fsidi.2025.301985","DOIUrl":"10.1016/j.fsidi.2025.301985","url":null,"abstract":"<div><div>Mobile applications are subject to frequent updates, which poses a challenge for validating digital forensic tools. This paper presents an approach to automate the generation of reference data on an ongoing basis, and how this can be integrated into the overall validation process of a digital forensic analysis platform. Specifically, it describes the architecture of the mobile data synthesis framework Puma, shares its capabilities via an open-source project, and shows how it can be used in a tool testing workflow triggered by application updates. The value of this approach is demonstrated with three example use cases, documenting the use of the approach over six months and reporting insights and experiences gained from this integration. Finally, this work highlights additional contributions the proposed approach and tooling could make to the digital forensics community.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"54 ","pages":"Article 301985"},"PeriodicalIF":2.2,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145424403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}