Pub Date: 2025-10-01 | Epub Date: 2025-11-03 | DOI: 10.1016/j.fsidi.2025.301985
Angelina A. Claij-Swart , Erik Oudsen , Bouke Timbermont , Christopher Hargreaves , Lena L. Voigt
Mobile applications are subject to frequent updates, which poses a challenge for validating digital forensic tools. This paper presents an approach to automating the generation of reference data on an ongoing basis and shows how this can be integrated into the overall validation process of a digital forensic analysis platform. Specifically, it describes the architecture of the mobile data synthesis framework Puma, shares its capabilities via an open-source project, and shows how it can be used in a tool testing workflow triggered by application updates. The value of this approach is demonstrated with three example use cases, documenting its use over six months and reporting insights and experiences gained from this integration. Finally, this work highlights additional contributions the proposed approach and tooling could make to the digital forensics community.
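The update-triggered workflow described above can be pictured as a simple version check: when a newly observed application version differs from the version last used to generate reference data, a generation job is queued. This is a generic sketch under assumed names (the function and job structure are hypothetical, not part of Puma's actual API):

```python
def check_for_update(app_id, observed_version, last_tested):
    """Return a reference-data generation job if the app version is new.

    last_tested maps app_id -> the version string last used for
    reference-data generation (hypothetical structure, for illustration).
    """
    if last_tested.get(app_id) != observed_version:
        last_tested[app_id] = observed_version
        return {"app": app_id, "version": observed_version,
                "action": "generate-reference-data"}
    return None
```

Calling this from a periodic poll of an app store feed would yield one generation job per app update, mirroring the trigger mechanism the abstract describes.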
Title: Automatically generating digital forensic reference data triggered by mobile application updates (Forensic Science International: Digital Investigation, vol. 54, Article 301985)
Pub Date: 2025-10-01 | Epub Date: 2025-11-03 | DOI: 10.1016/j.fsidi.2025.301977
Samantha Klier, Harald Baier
Hash functions play a crucial role in digital forensics to mitigate data overload. In addition to traditional cryptographic hash functions, similarity hashes - also known as approximate matching schemes - have emerged as effective tools for identifying media files with similar content. However, despite their relevance in investigative settings, a fast and practical method for identifying files originating from similar sources is still lacking. For example, in Child Sexual Abuse Material (CSAM) investigations, it is critical to distinguish between downloaded and potentially self-produced material. To address this gap, we introduce a Media Source Similarity Hash (MSSH), using JPEG images as a case study. MSSH leverages structural features of media files, converting them efficiently into Similarity Digests using n-gram representations. As such, MSSH constitutes the first syntactic approximate matching scheme. We evaluate MSSH using our publicly available source code across seven datasets. The method achieves AUC scores exceeding 0.90 for native images across device-, model-, and brand-level classifications, though the strong device-level performance likely reflects limitations in existing datasets rather than generalizable capability, and over 0.85 for samples obtained from social media platforms. Despite its lightweight design, MSSH delivers a performance comparable to that of resource-intensive, established Source Camera Identification (SCI) approaches, and surpasses them on a modern dataset, achieving an AUC of 0.97 compared to their AUCs, which range from 0.74 to 0.94. These results underscore MSSH's effectiveness for media source analysis in digital forensics, while preserving the speed and utility advantages typical of hash-based methods.
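To give a rough feel for the idea (this is an illustrative toy, not the authors' MSSH algorithm): structural tokens of a file, such as the sequence of JPEG marker codes, can be turned into n-grams and compared with a Jaccard similarity, so that files sharing source-specific structure score high regardless of pictured content.

```python
def ngrams(tokens, n=3):
    """Return the set of n-grams over a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def structural_similarity(tokens_a, tokens_b, n=3):
    """Jaccard similarity of n-gram sets; 1.0 means identical structure."""
    a, b = ngrams(tokens_a, n), ngrams(tokens_b, n)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For instance, two JPEGs written by the same camera firmware would typically share marker-segment ordering and hence most of their structural n-grams, while a recompressed social media copy would diverge.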
Title: Media source similarity hashing (MSSH): A practical method for large-scale media investigations (Forensic Science International: Digital Investigation, vol. 54, Article 301977)
Pub Date: 2025-10-01 | Epub Date: 2025-11-03 | DOI: 10.1016/j.fsidi.2025.301982
Hudan Studiawan , Frank Breitinger , Mark Scanlon
Large language models (LLMs) have seen widespread adoption in many domains, including digital forensics. While prior research has largely centered on case studies and examples demonstrating how LLMs can assist forensic investigations, deeper explorations remain limited; in particular, a standardized approach for precise performance evaluation is lacking. Inspired by the NIST Computer Forensic Tool Testing Program, this paper proposes a standardized methodology to quantitatively evaluate the application of LLMs for digital forensic tasks, specifically in timeline analysis. The paper describes the components of the methodology, including the dataset, timeline generation, and ground truth development. In addition, the paper recommends the use of BLEU and ROUGE metrics for the quantitative evaluation of LLMs through case studies or tasks involving timeline analysis. Experimental results using ChatGPT demonstrate that the proposed methodology can effectively evaluate LLM-based forensic timeline analysis. Finally, we discuss the limitations of applying LLMs to forensic timeline analysis.
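For readers unfamiliar with the recommended metrics, ROUGE-1 can be computed as unigram overlap between a model's timeline narrative and a ground-truth narrative. The minimal stand-alone computation below illustrates the metric itself, not the paper's evaluation harness:

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: unigram overlap between candidate and reference text."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)
```

An identical candidate scores 1.0 and a fully disjoint one 0.0, which is what makes the metric usable as a quantitative pass/fail signal for generated timelines.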
Title: Towards a standardized methodology and dataset for evaluating LLM-based digital forensic timeline analysis (Forensic Science International: Digital Investigation, vol. 54, Article 301982)
Pub Date: 2025-10-01 | Epub Date: 2025-11-03 | DOI: 10.1016/j.fsidi.2025.301980
Sunbum Song , Hongseok Yang , Eunji Lee , Sangeun Lee , Gibum Kim
To acquire data stored on damaged devices, forensic analysts have conventionally removed the flash memory from the device and extracted the data from it directly. This process, often called the ‘chip-off’ technique, has become difficult to apply as data encryption technologies are widely adopted. Except for rare instances where highly advanced chip transplantation is necessary, analysts generally attempt to repair the damaged modules as much as possible. When critical modules in an iPhone are damaged, the device enters a state known as panic-full, in which it repeatedly reboots, preventing analysts from acquiring the data within. This research reviews the previously disclosed causes and analysis methods of panic-full through experiments. Furthermore, for cases where module replacement does not resolve the panic-full status, this paper provides diagnosis methods to detect damage to logic boards, as well as jumper point information. Lastly, based on these findings, an improved physical recovery process for iPhones in panic-full status is suggested. This study was conducted on a limited set of iPhone models, yet given Apple's unified hardware ecosystem, the findings and methodologies suggested in this paper can be readily extended to other models.
Title: A study on the recovery of damaged iPhone hardware exhibiting panic full phenomena (Forensic Science International: Digital Investigation, vol. 54, Article 301980)
Pub Date: 2025-10-01 | Epub Date: 2025-11-03 | DOI: 10.1016/j.fsidi.2025.301979
Kyungsuk Cho, Kyuyeon Choi, Yunji Park, Minsoo Kim, Seoyoung Kim, Doowon Jeong
Intimate partner violence (IPV), involving abuse by current or former partners, is a growing global concern. Victims often face serious barriers not only in escaping abusive situations but also in securely collecting and preserving evidence, due to the proximity and control exerted by perpetrators. Storing photos, videos, or audio recordings directly on personal devices increases the risk of discovery—especially when abusers have access to the victim's digital environment. While several support services for IPV survivors have been developed, many remain unsuitable for use in high-risk or surveillance-heavy situations. In this study, we propose the Digital Evidence Framework for IPV (DEF-IPV), a technological solution that enables victims to collect and store digital evidence even under surveillance by their abuser. To identify the essential requirements, we conducted expert interviews with IPV support professionals. Based on these insights, DEF-IPV was designed to combine a camouflaged application with steganographic techniques, ensuring that both the evidence and the act of evidence collection remain undetectable. A detailed process model was constructed, and a proof-of-concept prototype was implemented to validate its technical feasibility. This work lays the foundation for future research on real-time and survivor-centered support in high-risk environments.
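The steganographic component can be pictured with a classic least-significant-bit scheme, in which message bits replace the lowest bit of each cover byte (e.g. pixel values), changing each by at most one. This generic sketch is for illustration only; DEF-IPV's actual embedding technique is described in the paper.

```python
def embed_lsb(cover, payload):
    """Hide payload bytes in the least-significant bits of cover bytes.

    Requires len(cover) >= 8 * len(payload); returns a new list.
    """
    bits = [(b >> i) & 1 for b in payload for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for payload")
    stego = list(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego

def extract_lsb(stego, n_bytes):
    """Recover n_bytes hidden by embed_lsb (MSB-first bit order)."""
    out = []
    for j in range(n_bytes):
        byte = 0
        for i in range(8):
            byte = (byte << 1) | (stego[8 * j + i] & 1)
        out.append(byte)
    return bytes(out)
```

Because every cover byte changes by at most 1, the carrier (for example an innocuous photo) remains visually indistinguishable, which is the property that lets evidence collection go unnoticed.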
Title: DEF-IPV: A digital evidence framework for intimate partner violence victims (Forensic Science International: Digital Investigation, vol. 54, Article 301979)
Pub Date: 2025-10-01 | Epub Date: 2025-11-03 | DOI: 10.1016/j.fsidi.2025.301984
Anton Schwietert, Jan-Niclas Hilgert
File systems are a fundamental component of virtually all modern computing devices. While their primary purpose is to manage and organize data on persistent storage, they also offer a range of opportunities for concealing information in unintended ways—a practice commonly referred to as data hiding. Given the challenges these techniques pose to forensic analysis, it becomes essential to understand where and how hidden data may reside within file system structures. In response, this paper systematically examines the current state of research on data hiding techniques in file systems, consolidating known methods across widely used file systems including NTFS, ext, and FAT. Building on this comprehensive survey, we explore how existing methods can be adapted or extended and identify previously unexamined data hiding opportunities, particularly in underexplored file systems. Furthermore, we propose and discuss novel data hiding techniques leveraging unique properties of contemporary file systems such as the misuse of snapshots. To support future research and evaluation, we apply a range of data hiding techniques across multiple file systems and present the first publicly available, scenario-based dataset dedicated to file system data hiding. As no comparable dataset currently exists, this contribution addresses a critical gap by supporting systematic evaluation and encouraging the development of effective detection methods.
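One of the simplest hiding places in this problem space, file slack, arises because allocation is cluster-granular: the bytes between a file's logical end and the end of its last cluster go unused by the file system. The capacity of that gap is a one-line calculation (generic, not tied to any single file system):

```python
def slack_bytes(file_size, cluster_size=4096):
    """Bytes of slack space left in the file's final cluster.

    A file occupying an exact multiple of the cluster size has none.
    """
    remainder = file_size % cluster_size
    return 0 if remainder == 0 else cluster_size - remainder
```

For example, a 5,000-byte file on a 4,096-byte-cluster volume leaves 3,192 hidden-capacity bytes in its second cluster, invisible to ordinary file APIs.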
Title: Data hiding in file systems: Current state, novel methods, and a standardized corpus (Forensic Science International: Digital Investigation, vol. 54, Article 301984)
Pub Date: 2025-10-01 | Epub Date: 2025-11-03 | DOI: 10.1016/j.fsidi.2025.301981
Jeongin Lee , Chaejin Lim , Beomjin Jin , Moohong Min , Hyoungshick Kim
Communication data, such as instant messenger exchanges, SMS records, and emails, plays a critical role in digital forensic investigations by revealing criminal intent, interpersonal dynamics, and the temporal structure of events. However, existing AI-based forensic tools frequently hallucinate unverifiable content, obscure their reasoning paths, and ultimately fail to meet the traceability and legal admissibility standards required in criminal investigations. To overcome these challenges, we propose df-graph, a graph-based retrieval-augmented generation (Graph-RAG) framework designed for forensic question answering over communication data. df-graph constructs structured knowledge graphs from message logs, retrieves query-relevant subgraphs based on semantic and structural cues, and generates answers guided by forensic-specific prompts. It further enhances legal transparency through rule-based reasoning traces and citation of message-level evidence. We comprehensively evaluate df-graph across real-world, public, and synthetic datasets, including a narrative dataset adapted from Crime and Punishment. Our evaluation compares four approaches: (1) a direct generation approach using only a language model without retrieval; (2) a BERT embedding-based selective retrieval approach that identifies relevant messages before generation; (3) a conventional text-based retrieval approach; and (4) our proposed graph-based retrieval approach (df-graph). Empirical results show that df-graph consistently outperforms all baseline approaches in exact match accuracy (57.23 %), semantic similarity (BERTScore F1: 0.8597), and contextual faithfulness. A user study with eight forensic experts confirms that df-graph delivers more explainable, accurate, and legally defensible outputs, making it a practical solution for AI-assisted forensic investigations.
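To make the retrieval step concrete: a minimal Graph-RAG-style lookup over messages might index each message as an edge between its participants, then answer a query by returning the subgraph touching every participant whose messages match the query terms. This toy index is an assumption for illustration, not df-graph's implementation.

```python
def build_graph(messages):
    """messages: list of (sender, recipient, text) tuples.

    Returns adjacency: participant -> list of message indices."""
    adj = {}
    for idx, (sender, recipient, _text) in enumerate(messages):
        adj.setdefault(sender, []).append(idx)
        adj.setdefault(recipient, []).append(idx)
    return adj

def retrieve_subgraph(messages, adj, query):
    """Indices of all messages sent or received by any participant
    whose messages contain the query term (crude structural expansion)."""
    q = query.lower()
    hits = {p for (s, r, t) in messages if q in t.lower() for p in (s, r)}
    return sorted({idx for p in hits for idx in adj.get(p, [])})
```

The structural expansion is what distinguishes this from plain text retrieval: a query matching one message also pulls in the surrounding exchanges of the same participants, giving the generator context with message-level provenance.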
Title: DF-graph: Structured and explainable analysis of communication data for digital forensics (Forensic Science International: Digital Investigation, vol. 54, Article 301981)
Pub Date: 2025-10-01 | Epub Date: 2025-11-03 | DOI: 10.1016/j.fsidi.2025.301975
Daniel Baier, Martin Lambertz
Extracting TLS key material remains a critical challenge in live memory forensics, particularly for forensic investigators and law enforcement seeking to decrypt network traffic for investigative purposes. Existing methods focus on TLS 1.2 and rely on manual processes limited to specific implementations, leaving gaps in scalability and support for TLS 1.3.
This research introduces a novel approach that automates key aspects of identifying and extracting TLS key material across all major TLS implementations. Our approach leverages unique strings defined by TLS standards to identify key derivation functions, eliminating the need for manual identification and ensuring adaptability to evolving libraries.
We validate our methodology using a ground truth dataset of major TLS libraries and real-world applications, dynamically intercepting the identified functions to extract session keys. While initially implemented on Linux, the underlying concept of our approach is platform-agnostic and broadly applicable.
This work bridges a critical gap in live memory forensics by introducing a scalable framework that automatically locates TLS key derivation functions and uses this information in library-specific hooks, enabling efficient decryption of secure communications. These findings offer significant advancements for forensic practitioners, law enforcement, and cybersecurity professionals.
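The string-based identification idea can be illustrated by scanning a memory buffer for the HKDF label strings that RFC 8446 fixes for the TLS 1.3 key schedule (each label carries a "tls13 " prefix); hits mark regions where key-derivation code or data resides. A minimal scanner sketch, not the authors' tooling:

```python
# HKDF-Expand-Label labels fixed by RFC 8446 (TLS 1.3 key schedule).
TLS13_LABELS = [b"tls13 c hs traffic", b"tls13 s hs traffic",
                b"tls13 c ap traffic", b"tls13 s ap traffic",
                b"tls13 key", b"tls13 iv"]

def find_tls13_labels(memory):
    """Return (offset, label) pairs for every RFC 8446 label occurrence."""
    hits = []
    for label in TLS13_LABELS:
        start = 0
        while (pos := memory.find(label, start)) != -1:
            hits.append((pos, label))
            start = pos + 1
    return sorted(hits)
```

Because these labels are mandated by the standard, they appear in every conforming implementation, which is what makes the technique implementation-agnostic; the paper's framework goes further by resolving such hits to the surrounding key derivation functions and hooking them.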
Title: All your TLS keys are belong to Us: A novel approach to live memory forensic key extraction (Forensic Science International: Digital Investigation, vol. 54, Article 301975)
Pub Date: 2025-10-01 | Epub Date: 2025-11-03 | DOI: 10.1016/j.fsidi.2025.301987
Sungjo Jeong, Sangjin Lee, Jungheum Park
A wide variety of applications have been developed to simplify the use of Large Language Models (LLMs), raising the importance of systematically analyzing their forensic artifacts. This study proposes a structured framework for LLM application environments, categorizing applications into backend runtime, client interface, and integrated platform components. Through experimental analysis of representative applications, we identify and classify artifacts such as chat records, uploaded files, generated files, and model setup histories. These artifacts provide valuable insight into user behavior and intent. For instance, LLM-generated files can serve as direct evidence in criminal investigations, particularly in cases involving the creation or distribution of illicit media, such as CSAM. The structured environment model further enables investigators to anticipate artifacts even in applications not directly analyzed. This study lays a foundational methodology for LLM application forensics, offering practical guidance for forensic investigations. To support practical adoption and reproducibility, we also release LangurTrace, an open-source tool that automates the collection and analysis of these artifacts.
{"title":"LangurTrace: Forensic analysis of local LLM applications","authors":"Sungjo Jeong, Sangjin Lee, Jungheum Park","doi":"10.1016/j.fsidi.2025.301987","DOIUrl":"10.1016/j.fsidi.2025.301987","url":null,"abstract":"<div><div>A wide variety of applications have been developed to simplify the use of Large Language Models (LLMs), raising the importance of systematically analyzing their forensic artifacts. This study proposes a structured framework for LLM application environments, categorizing applications into backend runtime, client interface, and integrated platform components. Through experimental analysis of representative applications, we identify and classify artifacts such as chat records, uploaded files, generated files, and model setup histories. These artifacts provide valuable insight into user behavior and intent. For instance, LLM-generated files can serve as direct evidence in criminal investigations, particularly in cases involving the creation or distribution of illicit media, such as CSAM. The structured environment model further enables investigators to anticipate artifacts even in applications not directly analyzed. This study lays a foundational methodology for LLM application forensics, offering practical guidance for forensic investigations.
To support practical adoption and reproducibility, we also release <span>LangurTrace</span>, an open-source tool that automates the collection and analysis of these artifacts.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"54 ","pages":"Article 301987"},"PeriodicalIF":2.2,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145424405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
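The abstract's environment model (backend runtime, client interface, integrated platform) and its artifact classes can be sketched as a small taxonomy. This is a purely illustrative reconstruction from the abstract alone; none of the names below are taken from the LangurTrace implementation, and the example paths are hypothetical.

```python
from dataclasses import dataclass

# Components of the LLM application environment model named in the abstract.
COMPONENTS = ("backend runtime", "client interface", "integrated platform")

# Artifact classes identified in the experimental analysis.
ARTIFACT_CLASSES = (
    "chat records",
    "uploaded files",
    "generated files",
    "model setup histories",
)

@dataclass
class Artifact:
    component: str       # which environment component produced the artifact
    artifact_class: str  # one of ARTIFACT_CLASSES
    path: str            # where the artifact was found on disk (illustrative)

    def __post_init__(self):
        if self.component not in COMPONENTS:
            raise ValueError(f"unknown component: {self.component}")
        if self.artifact_class not in ARTIFACT_CLASSES:
            raise ValueError(f"unknown artifact class: {self.artifact_class}")

def group_by_class(artifacts):
    """Group collected artifacts by class, e.g. for a forensic report."""
    grouped = {}
    for a in artifacts:
        grouped.setdefault(a.artifact_class, []).append(a)
    return grouped

# Hypothetical artifacts from a client-interface application.
found = [
    Artifact("client interface", "chat records", "/home/user/.app/chats.db"),
    Artifact("client interface", "generated files", "/home/user/.app/out/img1.png"),
]
report = group_by_class(found)
```

A taxonomy like this is what lets investigators anticipate artifacts in applications that were not directly analyzed: knowing an application's component type predicts which artifact classes to look for.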
Pub Date : 2025-10-01Epub Date: 2025-11-03DOI: 10.1016/j.fsidi.2025.301978
Julian Geus, Jan Gruber, Jonas Wozar, Felix Freiling
Mobile phone data is crucial for gathering investigative leads and solving cases in most criminal investigations. An increasingly common method for collecting mobile data as evidence is acquiring phone backups stored in manufacturer cloud services. However, the reliability of this evidence source compared to the original device has yet to be thoroughly assessed. In this work, we investigate the accuracy and completeness of iOS backups stored in iCloud. We propose a novel evaluation methodology based on dynamic binary instrumentation, enabling precise tracking of backup contents during the restore process. Using this approach, we compare a full file system extraction and a local backup of an iOS device to a backup downloaded from iCloud and restored on a test device. Our analysis reveals significant discrepancies in timestamp information and minor differences in user data—both critical considerations when analyzing iOS backups in criminal investigations.
{"title":"From sync to seizure: A binary instrumentation-based evaluation of the iCloud backup process","authors":"Julian Geus, Jan Gruber, Jonas Wozar, Felix Freiling","doi":"10.1016/j.fsidi.2025.301978","DOIUrl":"10.1016/j.fsidi.2025.301978","url":null,"abstract":"<div><div>Mobile phone data is crucial for gathering investigative leads and solving cases in most criminal investigations. An increasingly common method for collecting mobile data as evidence is acquiring phone backups stored in manufacturer cloud services. However, the reliability of this evidence source compared to the original device has yet to be thoroughly assessed. In this work, we investigate the accuracy and completeness of iOS backups stored in iCloud. We propose a novel evaluation methodology based on dynamic binary instrumentation, enabling precise tracking of backup contents during the restore process. Using this approach, we compare a full file system extraction and a local backup of an iOS device to a backup downloaded from iCloud and restored on a test device. Our analysis reveals significant discrepancies in timestamp information and minor differences in user data—both critical considerations when analyzing iOS backups in criminal investigations.</div></div>","PeriodicalId":48481,"journal":{"name":"Forensic Science International-Digital Investigation","volume":"54 ","pages":"Article 301978"},"PeriodicalIF":2.2,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145424490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
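The comparison described in the abstract, checking a restored iCloud backup against a local extraction for both timestamp and content discrepancies, can be sketched in miniature. This is not the paper's instrumentation-based method; it is only an illustrative snapshot diff, with hypothetical file names and hash values standing in for real extraction data.

```python
import hashlib
from pathlib import Path

def snapshot(root: Path) -> dict:
    """Map each file's path (relative to root) to its (mtime, sha256)."""
    snap = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            digest = hashlib.sha256(p.read_bytes()).hexdigest()
            snap[str(p.relative_to(root))] = (p.stat().st_mtime, digest)
    return snap

def compare(original: dict, restored: dict) -> dict:
    """Classify discrepancies between an original and a restored snapshot."""
    missing = sorted(set(original) - set(restored))   # lost in the backup
    added = sorted(set(restored) - set(original))     # introduced by restore
    timestamp_only, content = [], []
    for path in set(original) & set(restored):
        o_mtime, o_hash = original[path]
        r_mtime, r_hash = restored[path]
        if o_hash != r_hash:
            content.append(path)        # user data differs
        elif o_mtime != r_mtime:
            timestamp_only.append(path) # metadata differs, content intact
    return {"missing": missing, "added": added,
            "timestamp_only": sorted(timestamp_only),
            "content": sorted(content)}

# Hypothetical snapshots: (mtime, sha256) per file, hashes abbreviated.
original = {"sms.db":    (1700000000.0, "aa"),
            "photo.jpg": (1700000001.0, "bb"),
            "note.txt":  (1700000002.0, "cc")}
restored = {"sms.db":    (1700000500.0, "aa"),   # timestamp shifted by restore
            "photo.jpg": (1700000001.0, "dd")}   # content altered
diff = compare(original, restored)
```

Separating timestamp-only discrepancies from content discrepancies mirrors the abstract's finding that the restored backups diverged significantly in timestamp information but only minorly in user data, which is the distinction that matters when timeline analysis relies on backup evidence.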