SIFT: Sifting file types—application of explainable artificial intelligence in cyber forensics

IF 3.9 | Zone 4 (Computer Science) | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS | Journal: Cybersecurity | Pub Date: 2024-09-11 | DOI: 10.1186/s42400-024-00241-9

Shahid Alam, Alper Kamil Demir


Citations: 0

Abstract

Artificial Intelligence (AI) is being applied to improve the efficiency of software systems used in various domains, especially in the health and forensic sciences. Explainable AI (XAI) is a field of AI that interprets and explains the methods used in AI. One XAI technique for providing such interpretations is to compute the relevance of the input features to the output of an AI model. File fragment classification is one of the vital issues of file carving in Cyber Forensics (CF) and becomes challenging when the filesystem metadata is missing. Other major challenges it faces include the proliferation of file formats, file embeddings, and automation. We leverage the interpretations provided by XAI to optimize the classification of file fragments and propose a novel sifting approach, named SIFT (Sifting File Types). SIFT employs TF-IDF to assign a weight to each byte (feature), and these weights are used to select features from a file fragment. Threshold-based feature relevance values from LIME and SHAP (two XAI techniques) are then computed for the selected features to optimize file fragment classification. To improve multinomial classification, a Multilayer Perceptron model is developed and optimized with five hidden layers, each with \(i \times n\) neurons, where \(i\) is the layer number and \(n\) is the total number of classes in the dataset. When tested with 47,482 samples of 20 file types (classes), SIFT achieves a detection rate of 82.1% and outperforms the other state-of-the-art techniques by at least 10%. To the best of our knowledge, this is the first effort to apply XAI in CF for optimizing file fragment classification.
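The two quantitative rules in the abstract, the \(i \times n\) hidden-layer widths and the TF-IDF weighting of bytes, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the TF-IDF variant (term frequency normalized by fragment length, unsmoothed logarithmic IDF) is an assumed textbook form, and the top-k selection stands in for whatever selection rule the paper actually uses. The LIME and SHAP thresholding step is not shown.

```python
import math
from collections import Counter


def hidden_layer_sizes(n_classes, n_hidden=5):
    """Width of each hidden layer: layer i (1-based) has i * n_classes neurons."""
    return [i * n_classes for i in range(1, n_hidden + 1)]


def byte_tfidf(fragments):
    """Return one {byte_value: weight} dict per fragment.

    Assumed variant (not taken from the paper):
      tf  = occurrences of the byte in the fragment / fragment length
      idf = log(number of fragments / number of fragments containing the byte)
    """
    n_fragments = len(fragments)
    doc_freq = Counter()
    for frag in fragments:
        doc_freq.update(set(frag))  # count each byte value once per fragment

    weights = []
    for frag in fragments:
        counts = Counter(frag)
        total = len(frag)
        weights.append({
            b: (c / total) * math.log(n_fragments / doc_freq[b])
            for b, c in counts.items()
        })
    return weights


def top_k_bytes(weight, k):
    """Hypothetical selection rule: keep the k byte values with the largest weight."""
    return sorted(weight, key=weight.get, reverse=True)[:k]


if __name__ == "__main__":
    # 20 classes, as in the dataset described in the abstract.
    print(hidden_layer_sizes(20))

    # Two toy fragments; iterating over bytes yields integer byte values.
    toy = [b"\x00\x00\xff", b"\x00\x01\x01"]
    w = byte_tfidf(toy)
    print(top_k_bytes(w[0], 1))
```

With 20 classes the five hidden layers have 20, 40, 60, 80, and 100 neurons. In the toy example the byte `0x00` appears in both fragments, so its IDF is zero and it is never selected, which is the usual motivation for TF-IDF over raw byte frequency.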


