MalFe—Malware Feature Engineering Generation Platform

Journal: Computers; IF 2.6; Q2 (Computer Science, Interdisciplinary Applications); Pub Date: 2023-10-08; DOI: 10.3390/computers12100201

Avinash Singh, Richard Adeyemi Ikuesan, Hein Venter


Citations: 0

Abstract

The growing sophistication of malware poses diverse challenges, especially for security researchers, who are expected to develop mechanisms to thwart these attacks. Although researchers have turned to machine learning to counter the surge in malware attacks and to improve detection and prevention, they are often limited by the difficulty of sourcing malware binaries. This limitation forces malware researchers to create context-specific datasets and detection mechanisms, a time-consuming and intricate process that involves a series of experiments. The lack of accessible analysis reports, and of a centralized platform for sharing and verifying findings, has resulted in many research outputs that can be neither replicated nor validated. To address this gap, a malware analysis data curation platform was developed. The platform offers malware researchers a highly customizable feature generation process that draws on analysis reports, particularly those generated in sandbox-based environments such as Cuckoo Sandbox. To evaluate the platform, existing studies were replicated as case studies, which showed that the platform offers an effective approach that can aid malware detection research. In addition, a real-world scenario involving over 3000 ransomware and benign samples was explored for ransomware detection based on PE entropy. With the decision tree algorithm, this yielded an accuracy of 98.8% and an AUC of 0.97, with a low latency of 1.51 ms. These results underline the need for the proposed platform and demonstrate its capacity to construct a comprehensive detection mechanism. By fostering community-driven interactive databanks, the platform enables the creation of datasets and the sharing of reports, both of which can substantially reduce experimentation time and improve research repeatability.
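The abstract names PE entropy as the feature behind the ransomware case study but does not say how it is computed. As a rough illustration only, the sketch below computes Shannon entropy over a byte string, which is the usual meaning of entropy for a PE file or section. The function name and the sample byte strings are hypothetical, and nothing here is taken from the MalFe platform or the authors' pipeline.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = sum over byte values of p * log2(1 / p), with p = count / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical stand-ins for PE section contents (not real samples).
uniform_section = bytes(range(256)) * 4  # every byte value equally frequent
padded_section = b"\x00" * 1024          # one byte value repeated

print(shannon_entropy(uniform_section))  # highest possible value for bytes
print(shannon_entropy(padded_section))   # lowest possible value
```

Packed or encrypted content tends toward the upper end of this range, which is why entropy is a common input feature for classifiers such as the decision tree mentioned above.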


