V Ravi, S.P. Gururaj, H.K. Vedamurthy, M.B. Nirmala
Analysing corpus of office documents for macro-based attacks using Machine Learning
Global Transitions Proceedings, Volume 3, Issue 1, Pages 20-24. Published 2022-06-01. DOI: 10.1016/j.gltp.2022.04.004
https://www.sciencedirect.com/science/article/pii/S2666285X22000401
Citations: 4
Abstract
Macro-based malware is increasingly common in recent cyber-attacks. It relies on malicious code written in Visual Basic, which can be used to target computers and carry out a variety of exploits. Macro malware can be obfuscated with various tools and can easily evade antivirus software. Several machine learning methods have been proposed to detect macro malware, but they rely on inadequate datasets of benign and malicious macro code, are not reproducible, and are evaluated on unbalanced datasets. This paper proposes using a word embedding technique, Word2Vec, to analyze and process macro code written in Visual Basic so that the attack vector can be understood and detected before a document is opened. The proposed technique, called Obfuscated-Word2vec, detects obfuscated keywords and obfuscated function names in macro code and classifies them as obfuscated or benign function calls. These are later used as feature vectors to train models, to extract the most relevant features from the macro code, and to help classifiers identify the malware type more accurately, for example as a downloader, dropper, shellcode, or PowerShell exploit. Experimental results show that the proposed method is reproducible and can detect completely new macro malware by analyzing the macro code with a Random Forest classifier, achieving 82.65 percent accuracy.
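The static analysis step described above (inspecting macro source before a document is opened) can be illustrated with a minimal Python sketch. It does not reproduce Obfuscated-Word2vec: instead of learned embeddings it uses two simple stand-in signals, a count of calls from a watch-list and the character-level Shannon entropy of long identifiers. The names `SUSPICIOUS_CALLS`, `tokenize`, `shannon_entropy`, and `extract_features`, the contents of the watch-list, and the 8-character length cutoff are all assumptions made for illustration and do not come from the paper.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical watch-list of VBA names often seen in droppers and downloaders.
# Chosen for illustration only; it is not taken from the paper.
SUSPICIOUS_CALLS = {
    "autoopen", "document_open", "shell", "createobject",
    "urldownloadtofile", "environ",
}

# Matches identifier-like runs (keywords, function names, variable names).
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(macro_source: str) -> List[str]:
    """Split VBA source into lowercase identifier/keyword tokens."""
    return [t.lower() for t in TOKEN_RE.findall(macro_source)]


def shannon_entropy(token: str) -> float:
    """Character-level Shannon entropy in bits.

    Random-looking (possibly obfuscated) names tend to score higher
    than ordinary words of the same length.
    """
    counts = Counter(token)
    n = len(token)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def extract_features(macro_source: str) -> Dict[str, float]:
    """Build a small numeric feature dictionary for one macro."""
    tokens = tokenize(macro_source)
    # Assumed cutoff: only identifiers of 8+ characters are scored,
    # since entropy is not very informative on short tokens.
    long_tokens = [t for t in tokens if len(t) >= 8]
    return {
        "n_tokens": float(len(tokens)),
        "n_suspicious_calls": float(sum(t in SUSPICIOUS_CALLS for t in tokens)),
        "max_entropy": max((shannon_entropy(t) for t in long_tokens), default=0.0),
    }


if __name__ == "__main__":
    sample = 'Sub AutoOpen()\n  Shell "cmd"\nEnd Sub'
    print(extract_features(sample))
```

A feature dictionary like this could be turned into a vector and passed to a classifier such as a Random Forest. The paper's approach differs in that the features are derived from Word2Vec embeddings of the macro tokens rather than from hand-written heuristics.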