Enhancing Forensic Analysis Using a Machine Learning-based Approach
Samira Benkerroum, Khalid Chougdali
2023 6th International Conference on Advanced Communication Technologies and Networking (CommNet), pp. 1-6. Published 2023-12-11.
DOI: 10.1109/CommNet60167.2023.10365260 (https://doi.org/10.1109/CommNet60167.2023.10365260)
Citations: 0
Abstract
In recent years, computers and other digital devices have contributed to the global spread of cyber threats and cybercrime. Cyberattacks leave artefacts on the storage of the target device. These artefacts require special treatment and must be investigated in order to study the attack's behavior, analyze it, and prevent it from happening again. Despite the continued development of digital forensic investigation for recovering volatile and non-volatile evidence, manual investigations remain time-intensive and laborious. The proposed solution is to automate manual forensic investigation tasks (forensic analysis) to reduce human effort and improve time efficiency. This paper summarizes the digital forensic investigation process and discusses existing ML solutions for automating the analysis process. Finally, the paper proposes a machine learning-based approach in which binary classification is performed on the CIC-MalMem-2022 dataset to identify malware, using the following algorithms: K-Nearest Neighbors, Naive Bayes, Random Forest, Support Vector Machine, Decision Tree, Logistic Regression, Gradient Boosted Tree, and Multi-Layer Perceptron. The algorithms' performances were compared using Precision, Recall, F1-score, Accuracy, and Area Under the Curve. Random Forest and Gradient Boosted Tree performed best, achieving an accuracy of 99.98% in detecting malware from memory scans. Logistic Regression performed worst on the memory data, with an accuracy of 95.75%. Overall, many of the algorithms evaluated achieved very satisfactory results.
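The five metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code: the function names (`roc_auc`, `evaluate`), the 0.5 decision threshold, and the toy labels and scores are all assumptions chosen for illustration, with label 1 standing for "malware" and 0 for "benign". None of the numbers come from CIC-MalMem-2022.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Return accuracy, precision, recall, F1 and AUC for a binary classifier.

    `threshold` is an assumed cut-off for turning scores into labels.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc(y_true, scores),
    }


# Toy data (hypothetical, not taken from the paper): 4 malware, 4 benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
metrics = evaluate(labels, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On this toy data one malware sample is missed and one benign sample is flagged, so accuracy, precision, recall and F1 all come out at 0.75, while AUC is 0.875 because it depends on the ranking of the scores and not on the threshold.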