Pub Date: 2025-07-01 | Epub Date: 2025-08-01 | DOI: 10.1016/j.fsidi.2025.301922
Jordi Gonzalez
Malware analysts use Trend Micro Locality-Sensitive Hashing (TLSH) for malware similarity computation, nearest-neighbor search, and related tasks like clustering and family classification. Although TLSH scales better than many alternatives, technical limitations have restricted its application to larger datasets. Using the Lean 4 proof assistant, I formalized bounds on the properties of TLSH most relevant to its scalability and identified flaws in prior TLSH nearest-neighbor search algorithms. I leveraged these formal results to design correct acceleration structures for TLSH nearest-neighbor queries. On typical analyst workloads, these structures performed one to two orders of magnitude faster than the prior state of the art, allowing analysts to use datasets at least an order of magnitude larger than what was previously feasible with the same computational resources. I make all code and data publicly available.
{"title":"If at first you don't succeed, trie, trie again: Correcting TLSH scalability claims for large-dataset malware forensics","authors":"Jordi Gonzalez","doi":"10.1016/j.fsidi.2025.301922","journal":{"name":"Forensic Science International-Digital Investigation","volume":"53","pages":"Article 301922"},"PeriodicalIF":2.2,"publicationDate":"2025-07-01","publicationTypes":"Journal Article"}
Pub Date: 2025-07-01 | Epub Date: 2025-08-01 | DOI: 10.1016/j.fsidi.2025.301923
Philgeun Jin, Namjun Kim, Doowon Jeong
Digital forensic investigations in Windows orchestration environments face critical challenges, including the ephemeral nature of containers, dynamic scaling, and limited visibility into low-level system events. Traditional event log-based approaches often fail to capture essential kernel-level artifacts such as process creation, file I/O, and registry modifications. To overcome these limitations, this paper introduces a novel DFIR framework that leverages eBPF to enable real-time kernel-level monitoring in containerized environments. Building on Microsoft's Windows eBPF project, we developed custom eBPF extensions tailored for DFIR. Aligned with NIST SP 800-61 guidelines, the proposed framework integrates unified workflows for preparation, detection, containment, and recovery through a centralized management console. Through case studies of cryptocurrency mining, ransomware, and blue screen of death attacks, we demonstrate our framework's ability to identify malicious processes that traditional event log-based methods might miss, while confirming minimal system overhead and high compatibility with existing orchestration platforms.
{"title":"Enhancing DFIR in orchestration Environments: Real-time forensic framework with eBPF for windows","authors":"Philgeun Jin, Namjun Kim, Doowon Jeong","doi":"10.1016/j.fsidi.2025.301923","journal":{"name":"Forensic Science International-Digital Investigation","volume":"53","pages":"Article 301923"},"PeriodicalIF":2.2,"publicationDate":"2025-07-01","publicationTypes":"Journal Article"}
Pub Date: 2025-07-01 | Epub Date: 2025-08-01 | DOI: 10.1016/j.fsidi.2025.301931
Jieon Kim, Byeongchan Jeong, Seungeun Park, Sangjin Lee, Jungheum Park
The integration of physical and online activities in today's hyper-connected world has blurred previously distinct boundaries. Online actions such as reservations, payments, and logins generate application-to-person (A2P) messages, which serve as valuable datasets for tracking user behavior. Although A2P messages from different service providers may vary in structure, the information within each message can be systematically normalized based on user behavior and service characteristics. However, traditional forensic tools have been unable to effectively identify and extract such forensically valuable information from these A2P messages. In this study, we leverage large language models (LLMs) combined with prompt engineering to analyze A2P messages from multiple service providers, addressing the limitations of existing forensic tools in extracting meaningful insights from unstructured or semi-structured text stored in messages and emails. The proposed methodology employs A2P messages to elaborately reconstruct user activity, enabling digital forensic investigations to identify case-relevant information with enhanced efficiency and accuracy.
{"title":"Your forensic AI-assistant, SERENA: Systematic extraction and reconstruction for enhanced A2P message forensics","authors":"Jieon Kim, Byeongchan Jeong, Seungeun Park, Sangjin Lee, Jungheum Park","doi":"10.1016/j.fsidi.2025.301931","journal":{"name":"Forensic Science International-Digital Investigation","volume":"53","pages":"Article 301931"},"PeriodicalIF":2.2,"publicationDate":"2025-07-01","publicationTypes":"Journal Article"}
Pub Date: 2025-07-01 | Epub Date: 2025-08-01 | DOI: 10.1016/j.fsidi.2025.301935
Yanan Gong, Kam Pui Chow, Siu Ming Yiu
Cryptocurrency-related crimes are on the rise and have a wide-ranging impact across various areas. To effectively combat and prevent such crimes, cryptocurrency forensics, which relies on blockchain analysis, is essential. Despite advancements in Bitcoin de-anonymization techniques, several challenges persist. The absence of authentic data labels introduces uncertainty in de-anonymization results, especially in the context of address clustering. This issue is further compounded by the development of privacy-enhancing technologies that obscure address linkages, thus undermining the reliability of outcomes as forensic evidence. To address these limitations, this study focuses on Bitcoin blockchain analysis and the improvement of address clustering. Specifically, the work presents an enhanced simulation model designed to accurately simulate real Bitcoin transactions, offering a stable platform for evaluating address clustering algorithms that utilize transaction details, thereby facilitating the assessment of the admissibility of clustering results. Meanwhile, we introduce a new heuristic algorithm aimed at identifying one-time change addresses, with experimental results demonstrating that it achieves more precise clustering outcomes than existing heuristic methods. Furthermore, our blockchain analysis reveals overarching patterns and recent changes in the Bitcoin blockchain, particularly following the introduction of the BRC-20 token.
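To make the idea of address clustering concrete, the following is a minimal sketch of the widely used common-input-ownership baseline: all input addresses of one transaction are assumed to belong to the same entity and are merged with a union-find structure. This is not the one-time change address heuristic the abstract introduces, whose details are not given here. The transaction layout (a dict with an `inputs` list) and the address labels are assumptions made purely for illustration.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure over hashable items (here: address strings)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen items as their own root, then walk up with path halving.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_addresses(transactions):
    """Group addresses that appear together as inputs of any transaction.

    `transactions` is a hypothetical list of dicts with an "inputs" list of
    address strings; real blockchain data would need to be parsed into this shape.
    """
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            uf.find(addr)  # make sure single-input addresses are registered
        for other in inputs[1:]:
            uf.union(inputs[0], other)

    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    sample = [
        {"inputs": ["A", "B"]},
        {"inputs": ["B", "C"]},
        {"inputs": ["D"]},
    ]
    print(cluster_addresses(sample))
```

A change-address heuristic would add further `union` calls linking a transaction's inputs to the output judged to be change; privacy-enhancing techniques that mix inputs from several parties break the ownership assumption, which is one source of the uncertainty the abstract mentions.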
{"title":"Improved Bitcoin simulation model and address heuristic method","authors":"Yanan Gong, Kam Pui Chow, Siu Ming Yiu","doi":"10.1016/j.fsidi.2025.301935","journal":{"name":"Forensic Science International-Digital Investigation","volume":"53","pages":"Article 301935"},"PeriodicalIF":2.2,"publicationDate":"2025-07-01","publicationTypes":"Journal Article"}
Pub Date: 2025-07-01 | Epub Date: 2025-08-01 | DOI: 10.1016/j.fsidi.2025.301929
Mahfuzul I. Nissan, James Wagner, Alexander Rasin
The increased use of NoSQL databases to store and manage data has led to a demand to include them in forensic investigations. NoSQL databases use far more diverse storage formats than those handled by file carving and relational database forensics. For example, some NoSQL databases manage key-value pairs using B-Trees, while others maintain hash tables or even binary protocols for serialization. Current research on NoSQL carving focuses on single-database solutions, but developing an individual carver for every NoSQL system is impractical. This necessitates a generalized approach to forensic recovery, enabling the creation of a unified carver that can operate effectively across various NoSQL platforms.
In this research, we introduce Automated NoSQL Carver, ANOC, a novel tool designed to reconstruct database contents from raw database images without relying on the database API or logs. ANOC adapts to the unique storage characteristics of various NoSQL systems, utilizing byte-level reverse engineering to identify and parse data structures. By analyzing storage layouts algorithmically, ANOC identifies and reconstructs key-value pairs, hierarchical storage structures, and associated metadata across multiple NoSQL platforms.
Through extensive experimentation, we demonstrate ANOC's ability to recover data from four representative key-value store NoSQL databases: Berkeley DB, ZODB, etcd, and LMDB. We explore ANOC's limitations in environments where data is corrupted and in RAM snapshots. Our findings establish the feasibility of a generalized carver capable of addressing the challenges posed by the diverse and evolving NoSQL ecosystem.
{"title":"ANOC: Automated NoSQL database carver","authors":"Mahfuzul I. Nissan, James Wagner, Alexander Rasin","doi":"10.1016/j.fsidi.2025.301929","journal":{"name":"Forensic Science International-Digital Investigation","volume":"53","pages":"Article 301929"},"PeriodicalIF":2.2,"publicationDate":"2025-07-01","publicationTypes":"Journal Article"}
Pub Date: 2025-07-01 | Epub Date: 2025-08-01 | DOI: 10.1016/j.fsidi.2025.301925
Hala Ali, Andrew Case, Irfan Ahmed
As 3D printing is widely adopted across critical sectors, malicious users exploit this technology to produce illegal tools for criminal activities. The increasing availability of affordable 3D printers and the limitations of current regulations highlight the urgent need for robust forensic capabilities. While existing research focuses on the physical forensics of printed objects, the digital aspects of 3D printing forensics remain underexplored, resulting in a significant investigative gap. This paper introduces SliceSnap, a novel memory forensics framework that analyzes the volatile memory of slicing software, which is essential for converting 3D models into printer-executable G-code instructions. Our investigation focuses on Ultimaker Cura, the most popular Python-based slicing tool. By leveraging the Python garbage collector and conducting structural analysis of its objects, SliceSnap can extract the mesh data of 3D models, G-code instructions, slicing settings, detailed 3D printer metadata, and logging information. Given the potential for slicing software compromises, our framework extends beyond artifact extraction to include the complementary analysis tool, G-parser. This tool detects malicious G-code manipulations by finding the discrepancies between the original settings and those extracted from the G-code. Evaluation results demonstrated the effectiveness of SliceSnap in recovering design files and G-code of various criminal tools, such as firearms and TSA master keys, with 100% accuracy, in addition to providing detailed information about the slicing software and 3D printer. The evaluation also analyzed the temporal persistence of memory artifacts across critical stages of Cura's lifecycle. Moreover, through G-parser, the framework successfully detected the G-code manipulations conducted by our novel attack vector that targets G-code during inter-process communication within the software. Implemented as Volatility 3 plugins, SliceSnap provides investigators with automated capabilities to detect 3D printing-related criminal activities.
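The discrepancy check attributed to G-parser can be illustrated with a small sketch that compares declared temperature settings against the values actually commanded in a G-code listing. It is an illustration of the general idea only, not the tool's implementation: the setting names `nozzle_temp` and `bed_temp` are hypothetical, only the common temperature commands M104/M109 (hotend) and M140/M190 (bed) are inspected, and a target of 0 is treated as an ordinary cooldown rather than tampering.

```python
import re

# Common temperature commands mapped to hypothetical setting names.
TEMP_COMMANDS = {
    "M104": "nozzle_temp",  # set hotend temperature
    "M109": "nozzle_temp",  # set hotend temperature and wait
    "M140": "bed_temp",     # set bed temperature
    "M190": "bed_temp",     # set bed temperature and wait
}

S_WORD = re.compile(r"[Ss](\d+(?:\.\d+)?)")


def extract_temps(gcode):
    """Collect every temperature target commanded in a G-code listing."""
    found = {}
    for raw in gcode.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split()
        key = TEMP_COMMANDS.get(parts[0].upper())
        if key is None:
            continue
        for word in parts[1:]:
            match = S_WORD.fullmatch(word)
            if match:
                found.setdefault(key, set()).add(float(match.group(1)))
    return found


def find_discrepancies(settings, gcode):
    """Return commanded targets that match neither the declared setting nor 0."""
    temps = extract_temps(gcode)
    report = {}
    for key, expected in settings.items():
        unexpected = sorted(
            value for value in temps.get(key, set()) if value not in (expected, 0.0)
        )
        if unexpected:
            report[key] = unexpected
    return report


if __name__ == "__main__":
    declared = {"nozzle_temp": 200.0, "bed_temp": 60.0}
    listing = "M140 S60\nM104 S200\nG1 X10 Y10 ; move\nM104 S260 ; altered\nM104 S0\n"
    print(find_discrepancies(declared, listing))
```

In a forensic workflow, the declared settings would come from the slicer state recovered from memory and the listing from the recovered G-code; a real check would cover many more parameters than temperature.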
{"title":"Leveraging memory forensics to investigate and detect illegal 3D printing activities","authors":"Hala Ali, Andrew Case, Irfan Ahmed","doi":"10.1016/j.fsidi.2025.301925","journal":{"name":"Forensic Science International-Digital Investigation","volume":"53","pages":"Article 301925"},"PeriodicalIF":2.2,"publicationDate":"2025-07-01","publicationTypes":"Journal Article"}
Pub Date: 2025-07-01 | Epub Date: 2025-08-01 | DOI: 10.1016/j.fsidi.2025.301927
Carlo Jakobs, Axel Mahr, Martin Lambertz, Mariia Rybalka, Daniel Plohmann
This research explores the application of bytewise approximate matching algorithms on executable files, evaluating the effectiveness of ssdeep, sdhash, TLSH, and MRSHv2 across various scenarios in which approximate matching seems to be a natural tool to employ. Previous works have already underlined that approximate matching is often used for tasks for which the algorithms have not been thoroughly and systematically evaluated. Pagani et al. (2018), in particular, highlighted the shortcomings of previous research and tried to improve current knowledge about the applicability of approximate matching in the context of executable files by evaluating typical use cases. We extend their work by taking a closer look at further common scenarios that are not covered in their article. Specifically, we examine use cases such as different versions of the same software and comparisons between on-disk and in-memory representations of the same program, both for malicious and benign software.
Our findings reveal that the considered algorithms’ performance across all evaluated scenarios was generally unsatisfactory. Notably, they struggle with size-related and localized modifications introduced during the loading stage. Furthermore, executables with no functional similarity may be mismatched due to shared byte-level similarity caused by embedded resources or inherent to certain programming languages or runtime environments. Consequently, these algorithms should be used cautiously and regarded as assisting tools rather than reliable methods for indicating similarity between executable files, as both false positives and false negatives can occur, and users should be aware of them.
Moreover, while some of the unfavorable results stem from design decisions, we observed unexpected behavior in some experiments that we could trace back to issues in the reference implementations of the algorithms. After fixing the implementations, the strange effects in our results indeed disappeared. It is still an open question if and to what extent previous experiments and evaluations were affected by these issues.
{"title":"Bytewise approximate matching: Evaluating common scenarios for executable files","authors":"Carlo Jakobs, Axel Mahr, Martin Lambertz, Mariia Rybalka, Daniel Plohmann","doi":"10.1016/j.fsidi.2025.301927","journal":{"name":"Forensic Science International-Digital Investigation","volume":"53","pages":"Article 301927"},"PeriodicalIF":2.2,"publicationDate":"2025-07-01","publicationTypes":"Journal Article"}
Pub Date: 2025-07-01 | Epub Date: 2025-08-01 | DOI: 10.1016/j.fsidi.2025.301928
Roland Nagy
Rootkit infections have plagued IT systems for several decades now. As non-trivial threats often employed by sophisticated adversaries, rootkits have received a large amount of attention from both the industrial and academic communities. Consequently, rootkit detection has a rich literature, but most papers focus only on detecting that an infection happened. They rarely offer mitigation, let alone identify the piece of malware. We aim to solve this by not only detecting rootkit infections but also finding the malware. Our paper has three main goals: extend the state of the art of cross-view-based detection of Loadable Kernel Modules (the de facto delivery method of Linux kernel rootkits), provide a memory forensics tool that implements our detection method and enables further investigation of loaded modules, and publish the dataset we used to evaluate our solution. We implemented our tool in the form of a Volatility plugin and compared it to the already existing module detection capability of Volatility. We tested them on 55 rootkit-infected memory dumps, covering 27 different versions of the Linux kernel. We also provide compatibility analysis with different kernel versions, ranging from the initial release to the latest (6.13, at the time of writing).
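Cross-view detection rests on a simple comparison: enumerate the same objects through several independent sources and flag anything that one source reports but another omits. The sketch below shows only that comparison step, under stated assumptions: the view names and module names are hypothetical, and the abstract does not say which kernel data structures the plugin actually walks.

```python
def cross_view_diff(views):
    """For each view, list the module names that other views report but it lacks.

    `views` maps a view name to the set of module names that view enumerates.
    Views with nothing missing are left out of the result.
    """
    all_names = set().union(*views.values()) if views else set()
    report = {}
    for view, names in views.items():
        missing = all_names - names
        if missing:
            report[view] = sorted(missing)
    return report


if __name__ == "__main__":
    # Hypothetical enumeration results from one memory snapshot.
    snapshot_views = {
        "module_list": {"ext4", "nf_tables"},
        "second_source": {"ext4", "nf_tables", "hidden_mod"},
    }
    print(cross_view_diff(snapshot_views))
```

A module that unlinks itself from one list but remains reachable through another source shows up as missing from the first view, which is what makes it a candidate for further investigation.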
{"title":"Detecting hidden kernel modules in memory snapshots","authors":"Roland Nagy","doi":"10.1016/j.fsidi.2025.301928","journal":{"name":"Forensic Science International-Digital Investigation","volume":"53","pages":"Article 301928"},"PeriodicalIF":2.2,"publicationDate":"2025-07-01","publicationTypes":"Journal Article"}
Pub Date: 2025-06-01 | Epub Date: 2025-03-14 | DOI: 10.1016/j.fsidi.2025.301909
Dario Stabili, Filip Valgimigli, Mirco Marchetti
Modern vehicles are equipped with In-Vehicle Infotainment (IVI) systems that offer different functions, such as typical radio and multimedia services, navigation, and internet browsing. To operate properly, IVI systems have to store different types of data locally, reflecting user preferences and behaviors. If stored and managed insecurely, these data might expose sensitive information and represent a privacy risk. In this paper we address this issue by presenting a methodology for the extraction of privacy-sensitive information from the popular NTG5 COMMAND IVI system (specifically, the NTG5*2 version by Harman), deployed in some Mercedes-Benz vehicles from 2013 to 2019. We show that it is possible to extract information related to geographic locations and various vehicle events (such as ignition and doors opening and closing) dating back to the previous 8 months, and that these data can be cross-referenced to precisely identify the activities and habits of the driver. Moreover, we develop a novel forensic tool to automate this task. Given the past usage of the NTG5 system, our work might have real-life implications for the privacy of millions of drivers, owners, and passengers. As a final contribution, we develop a novel technique for SQLite data carving specifically designed to identify deleted data. Comparison with existing state-of-the-art tools for SQLite3 data recovery demonstrates that our approach is more effective in recovering deleted traces than general-purpose tools.
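As background for the SQLite carving contribution, the sketch below reads the fields of the 100-byte SQLite 3 file header that tell an examiner where freed pages live: the page size (offset 16), the first freelist trunk page (offset 32), and the freelist page count (offset 36). Freed pages are one place where deleted content can linger. This is not the carving technique the abstract describes, whose details are not given here; the function name and the returned dictionary keys are assumptions for illustration.

```python
import struct

SQLITE_MAGIC = b"SQLite format 3\x00"


def parse_sqlite_header(header):
    """Extract page size and freelist fields from the first 100 bytes of a database."""
    if len(header) < 100 or header[:16] != SQLITE_MAGIC:
        raise ValueError("not an SQLite 3 database header")

    # Page size is a big-endian 16-bit value; the value 1 encodes 65536.
    (page_size,) = struct.unpack(">H", header[16:18])
    if page_size == 1:
        page_size = 65536

    # Two big-endian 32-bit values: first freelist trunk page, total freelist pages.
    first_trunk, freelist_pages = struct.unpack(">II", header[32:40])

    return {
        "page_size": page_size,
        "first_freelist_trunk_page": first_trunk,
        "freelist_page_count": freelist_pages,
    }


if __name__ == "__main__":
    # Synthetic header for demonstration: 4096-byte pages, freelist starts at page 3.
    demo = SQLITE_MAGIC + struct.pack(">H", 4096) + bytes(14) + struct.pack(">II", 3, 2) + bytes(60)
    print(parse_sqlite_header(demo))
```

With the page size known, page `n` starts at byte offset `(n - 1) * page_size`, so the freelist trunk page can be located directly in a raw image.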
I know where you have been last summer: Extracting privacy-sensitive information via forensic analysis of the Mercedes-Benz NTG5*2 infotainment system. Forensic Science International-Digital Investigation, vol. 53, Article 301909.
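The abstract above does not describe how the authors' SQLite carving technique works, so the following is only background, not their method. It is a minimal Python sketch of one well-known place where deleted SQLite content can persist: pages on the database freelist, which the documented SQLite file header points to. The function name `freelist_pages` is hypothetical, and the sketch assumes a complete, uncorrupted database file that is not in WAL mode.

```python
import struct

SQLITE_MAGIC = b"SQLite format 3\x00"

def freelist_pages(path):
    """Return the page numbers on the SQLite freelist (trunk and leaf pages).

    Hypothetical helper for illustration; it follows the documented
    SQLite file format and is not the technique from the paper.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data[:16] != SQLITE_MAGIC:
        raise ValueError("not an SQLite 3 database file")

    # Header offset 16: page size, big-endian; the value 1 encodes 65536.
    (page_size,) = struct.unpack(">H", data[16:18])
    if page_size == 1:
        page_size = 65536

    # Header offset 32: first freelist trunk page; offset 36: freelist page count.
    trunk, _total = struct.unpack(">II", data[32:40])

    pages, seen = [], set()
    while trunk and trunk not in seen:       # 0 terminates the trunk chain
        seen.add(trunk)
        base = (trunk - 1) * page_size       # pages are numbered from 1
        if base + page_size > len(data):
            break                            # truncated file: stop walking
        # A trunk page starts with: next trunk page number, then leaf count.
        next_trunk, n_leaves = struct.unpack(">II", data[base:base + 8])
        if n_leaves > (page_size - 8) // 4:
            break                            # implausible count: stop walking
        pages.append(trunk)
        leaf_bytes = data[base + 8:base + 8 + 4 * n_leaves]
        pages.extend(struct.unpack(f">{n_leaves}I", leaf_bytes))
        trunk = next_trunk
    return pages
```

The returned pages are only candidates to scan for residual records. Whether they still hold old content depends on settings such as `secure_delete` and on whether the database has been vacuumed since the deletion.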
Pub Date: 2025-06-01, Epub Date: 2025-03-13, DOI: 10.1016/j.fsidi.2025.301911
Mohammad Abbasi-Azar, Mehdi Teimouri, Mohsen Nikray
Network forensics faces major challenges, including increasingly sophisticated cyberattacks and the difficulty of obtaining labeled datasets for training AI-driven security tools. Blind Protocol Identification (BPI), essential for detecting covert data transfers, is particularly impacted by these data limitations. This paper introduces a novel and inherently scalable method for generating synthetic datasets tailored for BPI in network forensics. Our approach emphasizes feature engineering and a statistical-analytical model of feature distributions to address the scarcity and imbalance of labeled data. We demonstrate the effectiveness of this method through a case study on geographic protocols, where we train Random Forest models using only synthetic datasets and evaluate their performance on real-world traffic. This work presents a promising solution to the data challenges in BPI, enabling reliable protocol identification while maintaining data privacy and overcoming traditional data collection limitations.
Blind protocol identification using synthetic dataset: A case study on geographic protocols. Forensic Science International-Digital Investigation, vol. 53, Article 301911.