An Exploratory Analysis of Feature Selection for Malware Detection with Simple Machine Learning Algorithms

Journal of Communications Software and Systems | IF 0.6 | Q4 (Computer Science, Information Systems) | Pub Date: 2023-01-01 | DOI: 10.24138/jcomss-2023-0091
Md Ashikur Rahman, Syful Islam, Yusuf Sulistyo Nugroho, Fatah Yasin Al Irsyadi, Md Javed Hossain
Citations: 0

Abstract

An Exploratory Analysis of Feature Selection for Malware Detection with Simple Machine Learning Algorithms
Computers have become increasingly vulnerable to malicious attacks as they have grown in popularity and as open system architectures have proliferated. Numerous malware detection technologies are available to protect the operating system from such attacks. These detectors flag programs based on patterns found in the properties of computer applications. As the volume of data to be analyzed grows, however, the defense system suffers: the presence of numerous irrelevant features hinders the performance of the detection mechanism. The goal of this study is to provide a feature selection approach that helps malware detection systems become more accurate by identifying pertinent and significant features. By selecting only the most important features, it is also possible to maintain an acceptable level of detection accuracy while significantly lowering the computational cost. The proposed method, which includes data cleaning and feature selection, reports the most important features (MIFs) obtained from each machine learning method. It then applies six machine learning classification techniques to the selected feature set: Support Vector Machines (SVM), Logistic Regression (LR), K-Nearest Neighbor (K-NN), Decision Tree (DT), Naive Bayes (NB), and Random Forest (RF). The proposed model was tested on two malware datasets to determine its effectiveness. In terms of accuracy, precision, recall, and F1 score, the experimental findings show that the RF and DT classifiers outperform the other techniques.
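The general idea in the abstract (rank the features, keep only the most important ones, then classify and score) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the ranking criterion (absolute Pearson correlation with the label), the classifier (1-nearest-neighbor, a simple stand-in for K-NN), the function names, and the toy data, which is invented and unrelated to the two datasets used in the study.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def top_k_features(X, y, k):
    """Return the column indices of the k features most correlated with the label."""
    n_features = len(X[0])
    score = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: score[j], reverse=True)
    return sorted(ranked[:k])

def project(X, cols):
    """Keep only the selected columns of every row."""
    return [[row[j] for j in cols] for row in X]

def predict_1nn(X_train, y_train, x):
    """Label of the training row closest to x (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[dists.index(min(dists))]

def scores(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = malware)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Invented toy data: 4 features per sample, label 1 = malware, 0 = benign.
# Feature 1 is constant and feature 3 is noise, so they carry little signal.
X_train = [[5, 1, 9, 3], [6, 1, 8, 7], [5, 1, 9, 2],
           [1, 1, 2, 6], [0, 1, 1, 3], [1, 1, 2, 8]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[6, 1, 9, 8], [0, 1, 1, 2], [4, 1, 7, 5], [2, 1, 3, 4]]
y_test = [1, 0, 1, 0]

def evaluate(cols):
    """Train and test 1-NN using only the given feature columns."""
    train, test = project(X_train, cols), project(X_test, cols)
    preds = [predict_1nn(train, y_train, x) for x in test]
    return scores(y_test, preds)

if __name__ == "__main__":
    selected = top_k_features(X_train, y_train, k=2)
    print("selected features:", selected)
    print("all features:     ", evaluate(list(range(4))))
    print("selected features:", evaluate(selected))
```

Comparing the two `evaluate` calls mirrors the trade-off the abstract describes: fewer features mean fewer distance terms per prediction, and the question is whether the scores stay acceptable. A real study would use held-out splits of actual malware datasets and the six classifiers named above.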
Source journal
Journal of Communications Software and Systems (Engineering: Electrical and Electronic Engineering)
CiteScore: 2.00
Self-citation rate: 14.30%
Articles published: 28
Review time: 8 weeks