Interpretable Detection of Malicious Behavior in Windows Portable Executables Using Multi-Head 2D Transformers
Authors: Sohail Khan, Mohammad Nauman
Journal: Big Data Mining and Analytics (impact factor 7.7; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2024-06-01
Publication type: Journal Article
DOI: 10.26599/bdma.2023.9020025 (https://doi.org/10.26599/bdma.2023.9020025)
Citations: 0
Abstract
Windows malware is an increasingly pressing problem as the volume of malware continues to grow and more sensitive information is stored on systems. One of the major challenges in tackling this problem is the complexity of malware analysis, which requires expertise from human analysts. Recent developments in machine learning have led to deep models for malware detection. However, these models often lack transparency, which makes it difficult to understand the reasoning behind their decisions; this is known as the black-box problem. To address this limitation, this paper presents a novel model for malware detection that uses vision transformers to analyze the opcode sequences of more than 350,000 Windows portable executable malware samples from real-world datasets. The model achieves an accuracy of 0.9864, surpassing previous results while also providing insight into the reasoning behind each classification. It can pinpoint the specific instructions that lead to malicious behavior in malware samples, aiding human experts in their analysis and driving further advances in the field. We report our findings and show how causality can be established between malicious code and the classification produced by a deep learning model, thus opening up the black-box problem for deeper analysis.
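The abstract does not say how opcode sequences are presented to the vision transformer. As a rough illustration only, the sketch below assumes one common arrangement: opcodes are mapped to integer ids, laid out row by row on a square grid, and cut into non-overlapping patches that would serve as transformer tokens. The vocabulary, the padding and unknown ids, the grid side, and the patch size are all hypothetical choices, not the authors' settings.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical id reserved for padding
UNK_ID = 1  # hypothetical id for opcodes missing from the vocabulary


def opcodes_to_grid(opcodes: List[str], vocab: Dict[str, int], side: int) -> List[List[int]]:
    """Map opcode mnemonics to ids, pad or truncate to side*side, lay out row by row."""
    ids = [vocab.get(op, UNK_ID) for op in opcodes][: side * side]
    ids += [PAD_ID] * (side * side - len(ids))
    return [ids[r * side:(r + 1) * side] for r in range(side)]


def grid_to_patches(grid: List[List[int]], patch: int) -> List[List[int]]:
    """Split a square grid into non-overlapping patch x patch blocks, each flattened."""
    side = len(grid)
    if side % patch != 0:
        raise ValueError("grid side must be divisible by patch size")
    patches = []
    for top in range(0, side, patch):
        for left in range(0, side, patch):
            patches.append([grid[top + dr][left + dc]
                            for dr in range(patch) for dc in range(patch)])
    return patches


def patch_to_positions(patch_index: int, side: int, patch: int) -> List[int]:
    """Return the original sequence positions covered by one patch."""
    per_row = side // patch
    top = (patch_index // per_row) * patch
    left = (patch_index % per_row) * patch
    return [(top + dr) * side + (left + dc)
            for dr in range(patch) for dc in range(patch)]


# Toy vocabulary and opcode sequence, invented for illustration.
vocab = {"push": 2, "mov": 3, "call": 4, "xor": 5, "ret": 6}
opcodes = ["push", "mov", "call", "xor", "mov", "call", "ret"]
grid = opcodes_to_grid(opcodes, vocab, side=4)
patches = grid_to_patches(grid, patch=2)
```

Under this assumed layout, `patch_to_positions` gives the link back from a patch to instruction offsets: if an attention map highlighted a patch, the returned positions would identify which instructions in the original sequence it covers.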
About the journal:
Big Data Mining and Analytics, published by Tsinghua University Press, presents research on big data and its applications. The journal covers the exploration and analysis of large volumes of data from diverse sources to uncover hidden patterns, correlations, insights, and knowledge.
Featuring the latest developments, research issues, and solutions, the journal addresses data mining techniques, data analytics, and their practical applications.
Big Data Mining and Analytics is indexed and abstracted in ESCI, EI, Scopus, DBLP Computer Science, Google Scholar, INSPEC, CSCD, DOAJ, CNKI, and other services.
It is intended for researchers, professionals, and anyone interested in the field of big data analytics.