基于操作码频率的机器学习恶意软件检测

Pavitra Mohandas, Sudesh Kumar Santhosh Kumar, Sandeep Pai Kulyadi, M. J. Shankar Raman, Vasan V S, B. Venkataswami
{"title":"基于操作码频率的机器学习恶意软件检测","authors":"Pavitra Mohandas, Sudesh Kumar Santhosh Kumar, Sandeep Pai Kulyadi, M. J. Shankar Raman, Vasan V S, B. Venkataswami","doi":"10.1109/IAICT52856.2021.9532521","DOIUrl":null,"url":null,"abstract":"One of the many methods for identifying malware is to disassemble the malware files and obtain the opcodes from them. Since malware have predominantly been found to contain specific opcode sequences in them, the presence of the same sequences in any incoming file or network content can be taken up as a possible malware identification scheme. Malware detection systems help us to understand more about ways on how malware attack a system and how it can be prevented. The proposed method analyses malware executable files with the help of opcode information by converting the incoming executable files to assembly language thereby extracting opcode information (opcode count) from the same. The opcode count is then converted into opcode frequency which is stored in a CSV file format. The CSV file is passed to various machine learning algorithms like Decision Tree Classifier, Random Forest Classifier and Naive Bayes Classifier. Random Forest Classifier produced the highest accuracy and hence the same model was used to predict whether an incoming file contains a potential malware or not.","PeriodicalId":416542,"journal":{"name":"2021 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Detection of Malware using Machine Learning based on Operation Code Frequency\",\"authors\":\"Pavitra Mohandas, Sudesh Kumar Santhosh Kumar, Sandeep Pai Kulyadi, M. J. Shankar Raman, Vasan V S, B. Venkataswami\",\"doi\":\"10.1109/IAICT52856.2021.9532521\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"One of the many methods for identifying malware is to disassemble the malware files and obtain the opcodes from them. Since malware have predominantly been found to contain specific opcode sequences in them, the presence of the same sequences in any incoming file or network content can be taken up as a possible malware identification scheme. Malware detection systems help us to understand more about ways on how malware attack a system and how it can be prevented. The proposed method analyses malware executable files with the help of opcode information by converting the incoming executable files to assembly language thereby extracting opcode information (opcode count) from the same. The opcode count is then converted into opcode frequency which is stored in a CSV file format. The CSV file is passed to various machine learning algorithms like Decision Tree Classifier, Random Forest Classifier and Naive Bayes Classifier. Random Forest Classifier produced the highest accuracy and hence the same model was used to predict whether an incoming file contains a potential malware or not.\",\"PeriodicalId\":416542,\"journal\":{\"name\":\"2021 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IAICT52856.2021.9532521\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IAICT52856.2021.9532521","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

识别恶意软件的众多方法之一是反汇编恶意软件文件并从中获取操作码。由于恶意软件主要被发现包含特定的操作码序列,因此在任何传入文件或网络内容中存在相同的序列都可以被视为可能的恶意软件识别方案。恶意软件检测系统帮助我们更多地了解恶意软件如何攻击系统以及如何阻止它。该方法通过将传入的可执行文件转换为汇编语言,从而从中提取操作码信息(操作码计数),从而利用操作码信息分析恶意软件可执行文件。然后将操作码计数转换为操作码频率,操作码频率以CSV文件格式存储。CSV文件被传递给各种机器学习算法,如决策树分类器,随机森林分类器和朴素贝叶斯分类器。随机森林分类器产生了最高的准确性,因此使用相同的模型来预测传入的文件是否包含潜在的恶意软件。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Detection of Malware using Machine Learning based on Operation Code Frequency
One of the many methods for identifying malware is to disassemble the malware files and obtain the opcodes from them. Since malware have predominantly been found to contain specific opcode sequences in them, the presence of the same sequences in any incoming file or network content can be taken up as a possible malware identification scheme. Malware detection systems help us to understand more about ways on how malware attack a system and how it can be prevented. The proposed method analyses malware executable files with the help of opcode information by converting the incoming executable files to assembly language thereby extracting opcode information (opcode count) from the same. The opcode count is then converted into opcode frequency which is stored in a CSV file format. The CSV file is passed to various machine learning algorithms like Decision Tree Classifier, Random Forest Classifier and Naive Bayes Classifier. Random Forest Classifier produced the highest accuracy and hence the same model was used to predict whether an incoming file contains a potential malware or not.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Wi-Fi CSI Based Human Sign Language Recognition using LSTM Network Effect of Antenna Power Roll-Off on Performance and Coverage of 4G Cellular Network from High Altitude Platforms Virtual Reality Experience in Tourism: A Factor Analysis Assessment Design of Integrated Control System Based On IoT With Context Aware Method In Hydroponic Plants Stability Control for Bipedal Robot in Standing and Walking using Fuzzy Logic Controller
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1