Enhanced detection of obfuscated malware in memory dumps: a machine learning approach for advanced cybersecurity

IF 3.7 | Zone 4 (Computer Science) | JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS | Journal: Cybersecurity | Pub Date: 2024-01-25 | DOI: 10.1186/s42400-024-00205-z

Md. Alamgir Hossain, Md. Saiful Islam



Abstract

Detecting and analyzing obfuscated malware remains a critical challenge in cybersecurity, especially when working from memory dumps. This paper presents a machine learning-based framework designed to improve detection and analysis of such elusive threats in both binary and multi-class malware classification. Our approach uses a comprehensive dataset of benign and malicious memory dumps that covers a wide range of obfuscated malware types, including Spyware, Ransomware, and Trojan Horses, together with their sub-categories. We begin with data preprocessing, including normalization of the memory dump features and encoding of categorical data. To address class imbalance, we apply the Synthetic Minority Over-sampling Technique (SMOTE), which ensures a balanced representation of the various malware types. Feature selection is carried out through Chi-Square tests, mutual information, and correlation analysis, focusing the model on the attributes most indicative of obfuscated malware. The core of the framework is an ensemble-based classifier, chosen for its robustness and effectiveness in handling complex data structures. The model is evaluated using accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC), along with other evaluation metrics that assess its efficiency. The proposed model achieves a detection accuracy exceeding 99% across all cases, surpassing the performance of all existing models in malware detection.
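The preprocessing, oversampling, and evaluation steps named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`min_max_normalize`, `smote_like`, `binary_metrics`), the parameter `k`, and the toy feature values are all assumptions made for illustration, and the oversampler is a simplified SMOTE-style interpolation rather than a full SMOTE implementation.

```python
import math
import random


def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority
    sample and one of its k nearest minority neighbours.
    Assumes the minority class has at least two rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours by Euclidean distance, excluding the base row.
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Made-up toy features (not taken from the paper's dataset).
benign = [[10, 200], [12, 220], [11, 210], [13, 205], [9, 215], [14, 225]]
malicious = [[40, 900], [42, 950], [45, 880]]

scaled = min_max_normalize(benign + malicious)
minority = scaled[len(benign):]
extra = smote_like(minority, n_new=len(benign) - len(minority))
balanced_minority = minority + extra  # now as many rows as the benign class
```

Because each synthetic row lies on a line segment between two real minority samples, oversampling adds no values outside the range already observed for that class. In practice, oversampling would normally be applied only to the training split, so that synthetic rows do not leak into the evaluation data.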


