Enhancing Forensic Analysis Using a Machine Learning-based Approach

2023 6th International Conference on Advanced Communication Technologies and Networking (CommNet). Pub Date: 2023-12-11. DOI: 10.1109/CommNet60167.2023.10365260

Samira Benkerroum, Khalid Chougdali

Citations: 0

Abstract

In recent years, computers and other digital devices have contributed to the global spread of cyber threats and cybercrime. Cyberattacks leave artefacts on the storage of the target device. These artefacts require special treatment and must be investigated in order to study the attack's behavior, analyze it, and prevent it from happening again. Digital forensic investigation for recovering evidence, whether volatile or non-volatile, continues to develop, yet manual investigations remain time-intensive and laborious. The proposed solution is to automate manual forensic investigation tasks (forensic analysis) in order to reduce human effort and improve time efficiency. This paper summarizes the digital forensic investigation process and discusses existing ML solutions for automating the analysis process. Finally, it proposes a machine learning-based approach that identifies malware using the CIC-MalMem-2022 dataset. Binary classification is performed with eight algorithms: K-Nearest Neighbors, Naive Bayes, Random Forest, Support Vector Machine, Decision Tree, Logistic Regression, Gradient Boosted Tree, and Multi-Layer Perceptron. The algorithms' performances were compared using Precision, F1-score, Accuracy, Recall, and Area Under the Curve. Random Forest and Gradient Boosted Tree performed best, reaching an accuracy of 99.98% in detecting malware from memory scans. Logistic Regression performed worst on the memory data, with an accuracy of 95.75%. Overall, many of the algorithms tested achieved very satisfactory results.
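The abstract evaluates binary malware-vs-benign classifiers with five metrics. The following standard-library Python is a minimal sketch of how those metrics are computed; it is not the authors' code. It takes true labels, hard predictions, and continuous classifier scores, and assumes a hypothetical encoding of 1 = malware and 0 = benign. The function names `confusion_counts`, `roc_auc`, and `evaluate` are illustrative. AUC uses the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen malware sample scores higher than a randomly chosen benign one.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 (assumed: malware) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 (benign) or 1 (malware)")
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: share of (malware, benign) pairs where malware scores higher; ties count 1/2."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    # O(len(pos) * len(neg)) pairwise comparison: clear for a sketch, slow for large test sets.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_pred: Sequence[int], scores: Sequence[float]) -> Dict[str, float]:
    """Accuracy, Precision, Recall, F1 from hard predictions; AUC from continuous scores."""
    if not y_true:
        raise ValueError("need at least one sample")
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    # Undefined ratios (e.g. no predicted positives) are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc(y_true, scores),
    }


if __name__ == "__main__":
    # Toy labels and scores invented for illustration; they are not from CIC-MalMem-2022.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
    print({name: round(value, 4) for name, value in evaluate(y_true, y_pred, scores).items()})
```

In practice, `y_pred` and `scores` would come from whichever of the eight classifiers is being evaluated, for example its predicted label and predicted malware probability per memory dump. The formulas are written out by hand to make them visible. scikit-learn's `sklearn.metrics` module (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`) provides equivalent, faster implementations.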
