Relevant and Non-Redundant Feature Subset Selection Applied to the Detection of Malware in a Network

Franklin Parrales-Bravo, Joel Torres-Urresto, Dayannara Avila-Maldonado, Julio Barzola-Monteses
Published in: 2021 IEEE Fifth Ecuador Technical Chapters Meeting (ETCM)
Publication date: 2021-10-12
DOI: 10.1109/ETCM53643.2021.9590777 (https://doi.org/10.1109/ETCM53643.2021.9590777)
Citation count: 2

Abstract

Removing redundant features is one of the goals of feature subset selection (FSS) techniques. According to some studies, using only a filter or only a wrapper FSS approach does not guarantee that the selected features are non-redundant. This research therefore presents a methodology for training intrusion detection models that combines filter and wrapper FSS techniques to guarantee the selection of non-redundant attributes in the data pre-processing phase. To test the effectiveness of the proposed technique, the accuracy of models trained with the features it selects was evaluated on a malware detection dataset. The classification algorithms used to train the malware detection models were: i) Random Forest, ii) C4.5, iii) AdaBoost, and iv) Gradient Boosting. Based on the accuracy metric, the best malware detection model was the one trained with the Random Forest algorithm. This model achieved an average accuracy of 99.42% when using the proposed feature selection technique, 0.10% higher than the model trained with the same algorithm but without the proposed methodology. We therefore conclude that models trained with the proposed methodology provide results similar to those of models that do not use it, with the added advantage of removing all redundant features from the dataset.
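
The abstract does not name the specific filter or wrapper used, so the following is only a minimal sketch of the general idea of chaining a filter step with a wrapper step. Everything in it is an assumption made for illustration: the Pearson-correlation thresholds, the nearest-centroid classifier standing in for the classifiers listed above, the synthetic data, and all function names (`filter_step`, `wrapper_step`, and so on) are hypothetical and are not taken from the paper.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def column(X, j):
    return [row[j] for row in X]

def filter_step(X, y, min_relevance=0.2, max_redundancy=0.9):
    """Filter (assumed): keep features correlated with the label, and skip any
    feature that is highly correlated with an already kept, more relevant one."""
    relevance = {j: abs(pearson(column(X, j), y)) for j in range(len(X[0]))}
    kept = []
    for j in sorted(relevance, key=relevance.get, reverse=True):
        if relevance[j] < min_relevance:
            continue  # not relevant enough
        if all(abs(pearson(column(X, j), column(X, k))) < max_redundancy for k in kept):
            kept.append(j)  # relevant and not redundant with kept features
    return kept

def centroid_accuracy(X_tr, y_tr, X_va, y_va, feats):
    """Validation accuracy of a nearest-centroid classifier restricted to `feats`."""
    centroids = {}
    for label in set(y_tr):
        rows = [r for r, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in feats]

    def predict(row):
        return min(centroids, key=lambda c: sum(
            (row[j] - m) ** 2 for j, m in zip(feats, centroids[c])))

    return sum(predict(r) == t for r, t in zip(X_va, y_va)) / len(y_va)

def wrapper_step(X_tr, y_tr, X_va, y_va, candidates):
    """Wrapper (assumed): greedy forward selection that adds the feature giving
    the largest validation-accuracy gain and stops when nothing improves."""
    selected, best = [], 0.0
    while True:
        scores = {j: centroid_accuracy(X_tr, y_tr, X_va, y_va, selected + [j])
                  for j in candidates if j not in selected}
        if not scores:
            break
        j = max(scores, key=scores.get)
        if scores[j] <= best:
            break
        selected.append(j)
        best = scores[j]
    return selected, best

# Synthetic data (not the paper's dataset): f0 is informative, f1 is a near copy
# of f0 (redundant), f2 is pure noise, f3 is a weaker informative feature.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    label = rng.randint(0, 1)
    f0 = 2.0 * label + rng.gauss(0, 0.5)
    f1 = f0 + rng.gauss(0, 0.01)
    f2 = rng.gauss(0, 1)
    f3 = 1.0 * label + rng.gauss(0, 0.5)
    X.append([f0, f1, f2, f3])
    y.append(label)

X_tr, y_tr, X_va, y_va = X[:300], y[:300], X[300:], y[300:]
kept = filter_step(X_tr, y_tr)                       # filter on training data only
selected, best = wrapper_step(X_tr, y_tr, X_va, y_va, kept)
print("after filter:", kept)
print("after wrapper:", selected, "validation accuracy:", best)
```

The filter runs first because it is cheap and shrinks the candidate set, so the more expensive wrapper, which retrains a model for every candidate, only searches among features that are already relevant and mutually non-redundant. In practice the stand-in classifier would be replaced by one of the algorithms named in the abstract.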