Advanced Monero wallet forensics: Demystifying off-chain artifacts to trace privacy-preserving cryptocurrency transactions
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301988
Jeongin Lee, Geunyeong Choi, Jihyo Han, Jungheum Park
Monero, a privacy-preserving cryptocurrency, employs advanced cryptographic techniques to obfuscate transaction participants and amounts, thereby achieving strong untraceability. However, digital forensic approaches can still reveal sensitive information by examining off-chain artifacts such as memory and wallet files. In this work, we conduct an in-depth forensic analysis of Monero's wallet application, focusing on the handling of public and private keys and the wallet's data storage formats. We reveal how these keys are managed in memory and develop a memory scanning algorithm capable of identifying key-related data structures. Furthermore, we analyze the wallet keys and cache files, presenting a method for decrypting and interpreting serialized keys and transaction data encrypted with a user-specified passphrase. Our approach is implemented as an open-source Volatility3 plugin and a set of decryption scripts. Finally, we discuss the applicability of our methodology to multi-cryptocurrency wallets that incorporate Monero components, thereby validating the generalizability of our techniques.
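To make the memory-scanning idea concrete, here is a minimal sketch, not the paper's Volatility3 plugin: it treats a 32-byte window as a candidate Monero private key if it is a canonical Ed25519 scalar whose unclamped public key also appears in the dump. The alignment and pairing heuristic are assumptions for illustration; the plugin reconstructs wallet-specific data structures instead. PyNaCl's libsodium bindings supply the scalar multiplication.

```python
# Hedged sketch: pair candidate private scalars with public keys in a dump.
# Assumes PyNaCl >= 1.4 (libsodium Ed25519 scalar-mult bindings).
import sys
from nacl.bindings import crypto_scalarmult_ed25519_base_noclamp

ED25519_GROUP_ORDER = 2**252 + 27742317777372353535851937790883648493

def candidate_keypairs(dump: bytes, align: int = 8):
    """Yield (offset, secret, public) for scalar/point pairs found in the dump."""
    # Index every aligned 32-byte window for O(1) public-key lookups.
    # (Memory-hungry for large dumps; acceptable for a sketch.)
    windows = {dump[i:i + 32]: i for i in range(0, len(dump) - 31, align)}
    for off in range(0, len(dump) - 31, align):
        sk = dump[off:off + 32]
        s = int.from_bytes(sk, "little")
        if not (0 < s < ED25519_GROUP_ORDER):   # must be a reduced scalar
            continue
        try:
            pk = crypto_scalarmult_ed25519_base_noclamp(sk)  # pub = s * B, unclamped
        except Exception:
            continue
        if pk in windows:                        # matching public key elsewhere in memory
            yield off, sk, pk

if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for off, sk, pk in candidate_keypairs(f.read()):
            print(f"0x{off:08x}  sec={sk.hex()}  pub={pk.hex()}")
```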
Media source similarity hashing (MSSH): A practical method for large-scale media investigations
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301977
Samantha Klier, Harald Baier
Hash functions play a crucial role in digital forensics to mitigate data overload. In addition to traditional cryptographic hash functions, similarity hashes - also known as approximate matching schemes - have emerged as effective tools for identifying media files with similar content. However, despite their relevance in investigative settings, a fast and practical method for identifying files originating from similar sources is still lacking. For example, in Child Sexual Abuse Material (CSAM) investigations, it is critical to distinguish between downloaded and potentially self-produced material. To address this gap, we introduce a Media Source Similarity Hash (MSSH), using JPEG images as a case study. MSSH leverages structural features of media files, converting them efficiently into Similarity Digests using n-gram representations. As such, MSSH constitutes the first syntactic approximate matching scheme. We evaluate MSSH using our publicly available source code across seven datasets. The method achieves AUC scores exceeding 0.90 for native images — across device-, model-, and brand-level classifications, though the strong device-level performance likely reflects limitations in existing datasets rather than generalizable capability — and over 0.85 for samples obtained from social media platforms. Despite its lightweight design, MSSH delivers performance comparable to that of resource-intensive, established Source Camera Identification (SCI) approaches, and surpasses them on a modern dataset, achieving an AUC of 0.97 compared to their AUCs, which range from 0.74 to 0.94. These results underscore MSSH's effectiveness for media source analysis in digital forensics, while preserving the speed and utility advantages typical of hash-based methods.
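The digest construction can be illustrated with a short sketch: extract a JPEG's marker sequence (a syntactic, not pixel-level, feature), hash its n-grams into a token set, and compare sets with Jaccard similarity. The feature choice, n-gram size, and 4-byte token encoding below are assumptions for illustration; the paper defines its own feature set and digest format.

```python
# Hedged sketch of a syntactic similarity digest over JPEG structure.
import hashlib

def marker_sequence(path: str):
    """Return the JPEG marker/length sequence up to the first SOS segment.
    Simplified parser: assumes no 0xFF fill bytes between segments."""
    feats, data = [], open(path, "rb").read()
    i = 2                                   # skip SOI (FF D8)
    while i + 4 <= len(data) and data[i] == 0xFF:
        marker = data[i + 1]
        if marker == 0xDA:                  # SOS: entropy-coded data follows
            feats.append("DA")
            break
        length = int.from_bytes(data[i + 2:i + 4], "big")
        feats.append(f"{marker:02X}:{length}")
        i += 2 + length
    return feats

def digest(feats, n=3):
    """Hash every n-gram of the feature sequence into a set of 4-byte tokens."""
    grams = ("|".join(feats[i:i + n]) for i in range(len(feats) - n + 1))
    return {hashlib.sha1(g.encode()).digest()[:4] for g in grams}

def similarity(d1, d2):
    return len(d1 & d2) / len(d1 | d2) if d1 | d2 else 0.0

# similarity(digest(marker_sequence("a.jpg")), digest(marker_sequence("b.jpg")))
```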
Towards a standardized methodology and dataset for evaluating LLM-based digital forensic timeline analysis
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301982
Hudan Studiawan , Frank Breitinger , Mark Scanlon
Large language models (LLMs) have seen widespread adoption in many domains, including digital forensics. While prior research has largely centered on case studies and examples demonstrating how LLMs can assist forensic investigations, deeper explorations remain limited; in particular, a standardized approach for precise performance evaluation is lacking. Inspired by the NIST Computer Forensic Tool Testing Program, this paper proposes a standardized methodology to quantitatively evaluate the application of LLMs to digital forensic tasks, specifically timeline analysis. The paper describes the components of the methodology, including the dataset, timeline generation, and ground truth development. In addition, the paper recommends the use of BLEU and ROUGE metrics for the quantitative evaluation of LLMs through case studies or tasks involving timeline analysis. Experimental results using ChatGPT demonstrate that the proposed methodology can effectively evaluate LLM-based forensic timeline analysis. Finally, we discuss the limitations of applying LLMs to forensic timeline analysis.
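The recommended metrics are straightforward to compute with off-the-shelf libraries (nltk for BLEU, the rouge-score package for ROUGE). The timeline strings below are invented examples; the paper derives its ground truth from a forensic dataset.

```python
# Hedged sketch: score an LLM-written timeline against a reference.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "user logged in at 09:14 then deleted file report.docx at 09:20"
candidate = "at 09:14 the user logged in and deleted report.docx at 09:20"

# BLEU over tokenized text; smoothing avoids zero scores on short strings.
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-1 F1: {rouge['rouge1'].fmeasure:.3f}")
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```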
DEF-IPV: A digital evidence framework for intimate partner violence victims
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301979
Kyungsuk Cho, Kyuyeon Choi, Yunji Park, Minsoo Kim, Seoyoung Kim, Doowon Jeong
Intimate partner violence (IPV), involving abuse by current or former partners, is a growing global concern. Victims often face serious barriers not only in escaping abusive situations but also in securely collecting and preserving evidence, due to the proximity and control exerted by perpetrators. Storing photos, videos, or audio recordings directly on personal devices increases the risk of discovery—especially when abusers have access to the victim's digital environment. While several support services for IPV survivors have been developed, many remain unsuitable for use in high-risk or surveillance-heavy situations. In this study, we propose the Digital Evidence Framework for IPV (DEF-IPV), a technological solution that enables victims to collect and store digital evidence even under surveillance by their abuser. To identify the essential requirements, we conducted expert interviews with IPV support professionals. Based on these insights, DEF-IPV was designed to combine a camouflaged application with steganographic techniques, ensuring that both the evidence and the act of evidence collection remain undetectable. A detailed process model was constructed, and a proof-of-concept prototype was implemented to validate its technical feasibility. This work lays the foundation for future research on real-time and survivor-centered support in high-risk environments.
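The steganographic half of the design can be illustrated with a textbook least-significant-bit embed using Pillow. This is a hedged sketch of the general technique only, not DEF-IPV's implementation, which pairs embedding with a camouflaged application.

```python
# Hedged sketch: hide an evidence payload in the LSBs of a cover image.
from PIL import Image

def embed(cover_path: str, payload: bytes, out_path: str) -> None:
    img = Image.open(cover_path).convert("RGB")
    pixels = list(img.getdata())
    # 4-byte length prefix so the extractor knows where the payload ends.
    bits = "".join(f"{b:08b}" for b in len(payload).to_bytes(4, "big") + payload)
    if len(bits) > len(pixels) * 3:
        raise ValueError("payload too large for cover image")
    flat = [c for px in pixels for c in px]
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & ~1) | int(bit)        # overwrite the LSB
    img.putdata(list(zip(flat[0::3], flat[1::3], flat[2::3])))
    img.save(out_path, "PNG")                      # lossless format keeps the bits

def extract(stego_path: str) -> bytes:
    flat = [c for px in Image.open(stego_path).convert("RGB").getdata() for c in px]
    bits = "".join(str(c & 1) for c in flat)
    n = int(bits[:32], 2)                          # read the length prefix
    body = bits[32:32 + 8 * n]
    return bytes(int(body[i:i + 8], 2) for i in range(0, len(body), 8))
```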
A study on the recovery of damaged iPhone hardware exhibiting panic full phenomena
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301980
Sunbum Song , Hongseok Yang , Eunji Lee , Sangeun Lee , Gibum Kim
To acquire data stored on damaged devices, forensic analysts have conventionally removed the flash memory from the device and extracted the data from it directly. This process, often called the ‘chip-off’ technique, has become increasingly difficult to apply as data encryption technologies are widely adopted. Except for rare instances where highly advanced chip transplantation is necessary, analysts generally attempt to repair the damaged modules as much as possible. When critical modules in an iPhone are damaged, the device enters a state known as panic-full, in which it repeatedly reboots, preventing analysts from acquiring the data within. This research reviews the previously disclosed causes of panic-full and the associated analysis methods through experiments. Furthermore, for cases where module replacement does not resolve the panic-full state, this paper provides diagnostic methods to detect damage to logic boards, as well as jumper point information. Lastly, based on the above findings, an improved physical recovery process for iPhones in the panic-full state is suggested. This study was conducted on a limited set of iPhone models; however, given Apple's unified hardware ecosystem, the findings and methodologies suggested in this paper can readily be extended to other models.
Data hiding in file systems: Current state, novel methods, and a standardized corpus
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301984
Anton Schwietert, Jan-Niclas Hilgert
File systems are a fundamental component of virtually all modern computing devices. While their primary purpose is to manage and organize data on persistent storage, they also offer a range of opportunities for concealing information in unintended ways—a practice commonly referred to as data hiding. Given the challenges these techniques pose to forensic analysis, it becomes essential to understand where and how hidden data may reside within file system structures. In response, this paper systematically examines the current state of research on data hiding techniques in file systems, consolidating known methods across widely used file systems including NTFS, ext, and FAT. Building on this comprehensive survey, we explore how existing methods can be adapted or extended and identify previously unexamined data hiding opportunities, particularly in underexplored file systems. Furthermore, we propose and discuss novel data hiding techniques leveraging unique properties of contemporary file systems such as the misuse of snapshots. To support future research and evaluation, we apply a range of data hiding techniques across multiple file systems and present the first publicly available, scenario-based dataset dedicated to file system data hiding. As no comparable dataset currently exists, this contribution addresses a critical gap by supporting systematic evaluation and encouraging the development of effective detection methods.
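As a concrete instance of one classic technique such surveys consolidate, file slack, the arithmetic is simple: the bytes between a file's logical end and the end of its last allocated cluster go unused and can conceal data. The 4 KiB cluster size below is an assumed common default (NTFS/FAT); actually writing into slack requires raw device access and is not shown.

```python
# Hedged sketch: how much file slack a given file leaves on disk.
import os

def slack_bytes(path: str, cluster_size: int = 4096) -> int:
    size = os.path.getsize(path)
    allocated = -(-size // cluster_size) * cluster_size  # round up to cluster
    return allocated - size

# e.g. a 10,000-byte file on 4 KiB clusters occupies 12,288 bytes, leaving
# 2,288 bytes of slack that a hiding tool could fill without changing the
# file's apparent size or content.
```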
All your TLS keys are belong to Us: A novel approach to live memory forensic key extraction
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301975
Daniel Baier, Martin Lambertz
Extracting TLS key material remains a critical challenge in live memory forensics, particularly for forensic investigators and law enforcement seeking to decrypt network traffic for investigative purposes. Existing methods focus on TLS 1.2 and rely on manual processes limited to specific implementations, leaving gaps in scalability and support for TLS 1.3.
This research introduces a novel approach that automates key aspects of identifying and extracting TLS key material across all major TLS implementations. Our approach leverages unique strings defined by TLS standards to identify key derivation functions, eliminating the need for manual identification and ensuring adaptability to evolving libraries.
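The string-based identification stage can be sketched as a simple binary scan for labels fixed by the standards, such as the TLS 1.2 PRF labels and TLS 1.3 HKDF-Expand-Label inputs. The exact byte patterns each library embeds vary, so the literals below are illustrative assumptions; the subsequent library-specific hooking stage is not shown.

```python
# Hedged sketch: locate standard-defined KDF label strings in a TLS library.
import sys

TLS12_LABELS = [b"master secret", b"extended master secret", b"key expansion"]
TLS13_LABELS = [b"tls13 derived", b"c hs traffic", b"s hs traffic",
                b"c ap traffic", b"s ap traffic", b"exp master"]

def find_labels(library_path: str):
    """Yield (label, offset) for every occurrence of a known KDF label."""
    blob = open(library_path, "rb").read()
    for label in TLS12_LABELS + TLS13_LABELS:
        off = blob.find(label)
        while off != -1:
            yield label.decode(), off
            off = blob.find(label, off + 1)

if __name__ == "__main__":
    # e.g. python find_labels.py /usr/lib/x86_64-linux-gnu/libssl.so.3
    for label, off in find_labels(sys.argv[1]):
        print(f"0x{off:08x}  {label!r}")
```

Cross-references to these literals then serve as anchor points for identifying the key derivation functions to hook.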
We validate our methodology using a ground truth dataset of major TLS libraries and real-world applications, dynamically intercepting the identified functions to extract session keys. While initially implemented on Linux, the underlying concept of our approach is platform-agnostic and broadly applicable.
This work bridges a critical gap in live memory forensics by introducing a scalable framework that automatically locates TLS key derivation functions and uses this information in library-specific hooks, enabling efficient decryption of secure communications. These findings offer significant advancements for forensic practitioners, law enforcement, and cybersecurity professionals.
DF-graph: Structured and explainable analysis of communication data for digital forensics
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301981
Jeongin Lee , Chaejin Lim , Beomjin Jin , Moohong Min , Hyoungshick Kim
Communication data, such as instant messenger exchanges, SMS records, and emails, plays a critical role in digital forensic investigations by revealing criminal intent, interpersonal dynamics, and the temporal structure of events. However, existing AI-based forensic tools frequently hallucinate unverifiable content, obscure their reasoning paths, and ultimately fail to meet the traceability and legal admissibility standards required in criminal investigations. To overcome these challenges, we propose df-graph, a graph-based retrieval-augmented generation (Graph-RAG) framework designed for forensic question answering over communication data. df-graph constructs structured knowledge graphs from message logs, retrieves query-relevant subgraphs based on semantic and structural cues, and generates answers guided by forensic-specific prompts. It further enhances legal transparency through rule-based reasoning traces and citation of message-level evidence. We comprehensively evaluate df-graph across real-world, public, and synthetic datasets, including a narrative dataset adapted from Crime and Punishment. Our evaluation compares four approaches: (1) a direct generation approach using only a language model without retrieval; (2) a BERT embedding-based selective retrieval approach that identifies relevant messages before generation; (3) a conventional text-based retrieval approach; and (4) our proposed graph-based retrieval approach (df-graph). Empirical results show that df-graph consistently outperforms all baseline approaches in exact match accuracy (57.23 %), semantic similarity (BERTScore F1: 0.8597), and contextual faithfulness. A user study with eight forensic experts confirms that df-graph delivers more explainable, accurate, and legally defensible outputs, making it a practical solution for AI-assisted forensic investigations.
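The graph construction and subgraph retrieval steps can be sketched with networkx. The keyword-overlap seeding below is a deliberate simplification and the messages are invented; df-graph retrieves subgraphs using semantic and structural cues and generates answers with forensic-specific prompts and message-level citations.

```python
# Hedged sketch of Graph-RAG retrieval over a message graph.
import networkx as nx

messages = [  # (sender, recipient, timestamp, text) -- invented examples
    ("alice", "bob", "2025-01-03T21:04", "bring the package tonight"),
    ("bob", "carol", "2025-01-03T21:10", "alice says tonight"),
    ("carol", "dave", "2025-01-04T08:00", "unrelated brunch plans"),
]

G = nx.MultiDiGraph()
for s, r, ts, text in messages:
    G.add_edge(s, r, time=ts, text=text)   # one edge per message

def retrieve(query: str, radius: int = 1):
    """Return the messages in the ego subgraphs around query-matching nodes."""
    terms = set(query.lower().split())
    seeds = {n for s, r, d in G.edges(data=True)
             for n in (s, r) if terms & set(d["text"].lower().split())}
    sub = nx.compose_all([nx.ego_graph(G, n, radius=radius) for n in seeds])
    return [(s, r, d["time"], d["text"]) for s, r, d in sub.edges(data=True)]

# Query-relevant evidence, citable message by message in the prompt:
for edge in retrieve("package tonight"):
    print(edge)
```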
LangurTrace: Forensic analysis of local LLM applications
Pub Date: 2025-10-01 | DOI: 10.1016/j.fsidi.2025.301987
Sungjo Jeong, Sangjin Lee, Jungheum Park
A wide variety of applications have been developed to simplify the use of Large Language Models (LLMs), raising the importance of systematically analyzing their forensic artifacts. This study proposes a structured framework for LLM application environments, categorizing applications into backend runtime, client interface, and integrated platform components. Through experimental analysis of representative applications, we identify and classify artifacts such as chat records, uploaded files, generated files, and model setup histories. These artifacts provide valuable insight into user behavior and intent. For instance, LLM-generated files can serve as direct evidence in criminal investigations, particularly in cases involving the creation or distribution of illicit media, such as CSAM. The structured environment model further enables investigators to anticipate artifacts even in applications not directly analyzed. This study lays a foundational methodology for LLM application forensics, offering practical guidance for forensic investigations. To support practical adoption and reproducibility, we also release LangurTrace, an open-source tool that automates the collection and analysis of these artifacts.
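The collection idea can be sketched as a walk over candidate artifact locations that buckets what it finds. The paths and categories below are hypothetical illustrations (an Ollama-style layout on Linux), not LangurTrace's actual per-application knowledge base.

```python
# Hedged sketch: enumerate files under assumed local-LLM artifact locations.
from pathlib import Path

CANDIDATES = {  # hypothetical example locations, not an authoritative list
    "model setup":  Path.home() / ".ollama" / "models",
    "chat records": Path.home() / ".config" / "some-llm-client" / "chats",
}

def collect():
    """Map artifact category -> list of file paths found on this system."""
    report = {}
    for category, root in CANDIDATES.items():
        if root.exists():
            report[category] = [str(p) for p in root.rglob("*") if p.is_file()]
    return report

for category, files in collect().items():
    print(category, "->", len(files), "files")
```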