Analysing corpus of office documents for macro-based attacks using Machine Learning

V Ravi, S.P. Gururaj, H.K. Vedamurthy, M.B. Nirmala
Global Transitions Proceedings, Vol. 3, No. 1, pp. 20-24, June 2022 (Journal Article)
DOI: 10.1016/j.gltp.2022.04.004
https://www.sciencedirect.com/science/article/pii/S2666285X22000401
Cited by: 4

Abstract

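Macro-based malware attacks are on the rise. These attacks use malicious code written in Visual Basic for Applications (VBA) to target computers and carry out various exploits. Macro malware can be obfuscated with a range of tools and so easily evades antivirus software. Several machine-learning methods have been proposed to detect it, but they rely on inadequate datasets of benign and malicious macro code, are not reproducible, and are evaluated on unbalanced datasets. This paper proposes applying a word-embedding technique, Word2Vec, to code analysis: macro code written in Visual Basic is analysed and processed so that the attack vector can be understood and detected before the document is opened. The proposed embedding technique, called Obfuscated-Word2vec, detects obfuscated keywords and obfuscated function names in the macro code and classifies them as obfuscated or benign function calls. These are then used as feature vectors to train models that extract the most relevant features from the macro code and help the classifiers more accurately identify samples as downloaders, droppers, shellcode, PowerShell exploits, and so on. Experimental results show that the proposed method is reproducible and, with a Random Forest classifier, detects previously unseen macro malware with 82.65% accuracy.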
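The abstract gives no implementation details, so the following is only a rough, hypothetical sketch of the first stage it describes: tokenising VBA macro code, flagging identifiers that look obfuscated, and turning the result into a fixed-length feature vector. It is NOT the authors' Obfuscated-Word2vec, which learns embeddings; here a hand-written vowel-ratio heuristic stands in for the learned model. The names `tokenize_vba`, `looks_obfuscated`, `macro_features` and `SUSPICIOUS`, the keyword list, and the thresholds are all illustrative assumptions.

```python
import re

# Simplified VBA lexer: matches string literals, ' comments, and identifiers.
# Rem comments and " _" line continuations are deliberately not handled.
TOKEN_RE = re.compile(r'"(?:[^"\n]|"")*"|\'[^\n]*|[A-Za-z][A-Za-z0-9_]*')

# Illustrative (not exhaustive) names often seen in malicious macros:
# auto-execution triggers plus process/object creation and string tricks.
SUSPICIOUS = {"autoopen", "document_open", "workbook_open", "shell",
              "createobject", "chr", "strreverse", "callbyname"}

VOWELS = set("aeiouAEIOU")


def tokenize_vba(source: str) -> list[str]:
    """Return identifier/keyword tokens, skipping string literals and comments."""
    return [m.group(0) for m in TOKEN_RE.finditer(source)
            if m.group(0)[0] not in "\"'"]


def looks_obfuscated(token: str, min_len: int = 8,
                     max_vowel_ratio: float = 0.2) -> bool:
    """Hypothetical heuristic: a long name with almost no vowels looks random."""
    if len(token) < min_len:
        return False
    vowel_ratio = sum(ch in VOWELS for ch in token) / len(token)
    return vowel_ratio < max_vowel_ratio


def macro_features(source: str) -> list[float]:
    """Fixed-length vector: [token count, obfuscated fraction, suspicious count]."""
    tokens = tokenize_vba(source)
    if not tokens:
        return [0.0, 0.0, 0.0]
    flagged = sum(looks_obfuscated(t) for t in tokens)
    suspicious = sum(t.lower() in SUSPICIOUS for t in tokens)
    return [float(len(tokens)), flagged / len(tokens), float(suspicious)]


if __name__ == "__main__":
    sample = (
        "Sub AutoOpen()\n"
        "    Dim xqzKwtP9 As String\n"
        "    xqzKwtP9 = \"pow\" & \"ershell\"  ' build the command in pieces\n"
        "    Shell xqzKwtP9\n"
        "End Sub\n"
    )
    print(tokenize_vba(sample))
    print(macro_features(sample))
```

In the approach the abstract describes, the obfuscated/benign decision would come from learned Word2Vec-style embeddings rather than a fixed threshold, and a vector of this kind would then be passed to a classifier such as a Random Forest (for example scikit-learn's `RandomForestClassifier`).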