Malware Detection Method Based on File and Registry Operations Using Machine Learning

Ömer Aslan, Erdal Akin
Sakarya University Journal of Computer and Information Sciences. Published 2022-05-25. DOI: https://doi.org/10.35377/saucis...1049798

Abstract

Malware (malicious software) is any software that performs malicious activities on computer-based systems without the user's consent. The number, severity, and complexity of malware have been increasing in recent years. Detection has become challenging because new malware variants use obfuscation techniques to hide from malware detection systems. In this paper, a new behavior-based malware detection method built on file and registry operations is proposed. When malware features are generated, only operations performed on specific file and registry locations are considered. These file-registry operations are divided into five groups: autostart file locations, temporary file locations, specific system file locations, autostart registry locations, and DLL-related registry locations. Malware features are generated from the file-registry operations and the locations where they are performed. These features appear with high frequency in malware samples but rarely in benign samples. The proposed method is tested on malware and benign samples in a virtual environment, and a dataset is created. Well-known machine learning algorithms, including C4.5 (J48), RF (Random Forest), SLR (Simple Logistic Regression), AdaBoost (Adaptive Boosting), SMO (Sequential Minimal Optimization), and KNN (K-Nearest Neighbors), are used for classification. In the best case, we obtained a 98.8% true positive rate, a 0% false positive rate, 100% precision, and 99.05% accuracy, which is quite high compared with leading methods in the literature.
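The location-based feature generation described in the abstract can be illustrated with a minimal Python sketch. The abstract names the five location groups but not the exact paths, event names, or feature encoding, so everything concrete here is an assumption made for illustration: the substring patterns in `LOCATION_GROUPS`, the operation names in the sample trace, and the `operation:group` count encoding are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical location patterns (lowercase substrings). The abstract lists
# the five groups but not the concrete paths, so these are illustrative only.
LOCATION_GROUPS = {
    "autostart_file": (r"\start menu\programs\startup",),
    "temp_file": (r"\appdata\local\temp", r"\windows\temp"),
    "system_file": (r"\windows\system32",),
    "autostart_registry": (r"\currentversion\run",),
    "dll_registry": (r"appinit_dlls", r"\knowndlls"),
}


def location_group(target):
    """Return the first group whose pattern occurs in the target, else None."""
    lowered = target.lower()
    for group, patterns in LOCATION_GROUPS.items():
        if any(pattern in lowered for pattern in patterns):
            return group
    return None


def extract_features(trace):
    """Count (operation, group) pairs; operations on other locations are ignored."""
    counts = Counter()
    for operation, target in trace:
        group = location_group(target)
        if group is not None:
            counts[f"{operation}:{group}"] += 1
    return dict(counts)


# Hypothetical behavior trace: (operation name, file path or registry key).
trace = [
    ("WriteFile", r"C:\Users\u\AppData\Local\Temp\a.exe"),
    ("RegSetValue", r"HKLM\Software\Microsoft\Windows\CurrentVersion\Run\a"),
    ("ReadFile", r"C:\Users\u\Documents\notes.txt"),
    ("WriteFile", r"C:\Windows\Temp\b.tmp"),
]
features = extract_features(trace)
print(features)
```

In this sketch the read from the Documents folder contributes nothing, mirroring the statement that only operations on specific locations are considered. The resulting counts could then be turned into a fixed-length vector and passed to any of the classifiers listed above.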