利用多头二维变换器可解释地检测 Windows 可移植可执行文件中的恶意行为

IF 7.7 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Big Data Mining and Analytics Pub Date : 2024-06-01 DOI:10.26599/bdma.2023.9020025
Sohail Khan, Mohammad Nauman
{"title":"利用多头二维变换器可解释地检测 Windows 可移植可执行文件中的恶意行为","authors":"Sohail Khan, Mohammad Nauman","doi":"10.26599/bdma.2023.9020025","DOIUrl":null,"url":null,"abstract":": Windows malware is becoming an increasingly pressing problem as the amount of malware continues to grow and more sensitive information is stored on systems. One of the major challenges in tackling this problem is the complexity of malware analysis, which requires expertise from human analysts. Recent developments in machine learning have led to the creation of deep models for malware detection. However, these models often lack transparency, making it difficult to understand the reasoning behind the model’s decisions, otherwise known as the black-box problem. To address these limitations, this paper presents a novel model for malware detection, utilizing vision transformers to analyze the opcode sequences of more than 350,000 Windows portable executable malware samples from real-world datasets. The model achieved a high accuracy of 0.9864, not only surpassing previous results but also providing valuable insights into the reasoning behind the classification. Our model is able to pinpoint specific instructions that lead to malicious behavior in malware samples, aiding human experts in their analysis and driving further advancements in the field. We report our findings and show how causality can be established between malicious code and actual classification by a deep learning model thus opening up this black-box problem for deeper analysis.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":null,"pages":null},"PeriodicalIF":7.7000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Interpretable Detection of Malicious Behavior in Windows Portable Executables Using Multi-Head 2D Transformers\",\"authors\":\"Sohail Khan, Mohammad Nauman\",\"doi\":\"10.26599/bdma.2023.9020025\",\"DOIUrl\":null,\"url\":null,\"abstract\":\": Windows malware is becoming an increasingly pressing problem as the amount of malware continues to grow and more sensitive information is stored on systems. One of the major challenges in tackling this problem is the complexity of malware analysis, which requires expertise from human analysts. Recent developments in machine learning have led to the creation of deep models for malware detection. However, these models often lack transparency, making it difficult to understand the reasoning behind the model’s decisions, otherwise known as the black-box problem. To address these limitations, this paper presents a novel model for malware detection, utilizing vision transformers to analyze the opcode sequences of more than 350,000 Windows portable executable malware samples from real-world datasets. The model achieved a high accuracy of 0.9864, not only surpassing previous results but also providing valuable insights into the reasoning behind the classification. Our model is able to pinpoint specific instructions that lead to malicious behavior in malware samples, aiding human experts in their analysis and driving further advancements in the field. We report our findings and show how causality can be established between malicious code and actual classification by a deep learning model thus opening up this black-box problem for deeper analysis.\",\"PeriodicalId\":52355,\"journal\":{\"name\":\"Big Data Mining and Analytics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":7.7000,\"publicationDate\":\"2024-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Big Data Mining and Analytics\",\"FirstCategoryId\":\"1093\",\"ListUrlMain\":\"https://doi.org/10.26599/bdma.2023.9020025\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Big Data Mining and Analytics","FirstCategoryId":"1093","ListUrlMain":"https://doi.org/10.26599/bdma.2023.9020025","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

:随着恶意软件数量的不断增加以及系统中存储的敏感信息越来越多,Windows 恶意软件正成为一个日益紧迫的问题。解决这一问题的主要挑战之一是恶意软件分析的复杂性,这需要人类分析师的专业知识。机器学习的最新发展促使人们创建了用于恶意软件检测的深度模型。然而,这些模型往往缺乏透明度,因此很难理解模型决策背后的推理,也就是所谓的黑箱问题。为了解决这些局限性,本文提出了一种新型恶意软件检测模型,利用视觉转换器分析了来自真实世界数据集的 350,000 多个 Windows 可移植可执行恶意软件样本的操作码序列。该模型的准确率高达 0.9864,不仅超越了之前的结果,还为分类背后的推理提供了宝贵的见解。我们的模型能够精确定位导致恶意软件样本中恶意行为的特定指令,从而帮助人类专家进行分析,并推动该领域的进一步发展。我们报告了我们的发现,并展示了如何通过深度学习模型在恶意代码和实际分类之间建立因果关系,从而为更深入的分析打开这个黑箱问题。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Interpretable Detection of Malicious Behavior in Windows Portable Executables Using Multi-Head 2D Transformers
: Windows malware is becoming an increasingly pressing problem as the amount of malware continues to grow and more sensitive information is stored on systems. One of the major challenges in tackling this problem is the complexity of malware analysis, which requires expertise from human analysts. Recent developments in machine learning have led to the creation of deep models for malware detection. However, these models often lack transparency, making it difficult to understand the reasoning behind the model’s decisions, otherwise known as the black-box problem. To address these limitations, this paper presents a novel model for malware detection, utilizing vision transformers to analyze the opcode sequences of more than 350,000 Windows portable executable malware samples from real-world datasets. The model achieved a high accuracy of 0.9864, not only surpassing previous results but also providing valuable insights into the reasoning behind the classification. Our model is able to pinpoint specific instructions that lead to malicious behavior in malware samples, aiding human experts in their analysis and driving further advancements in the field. We report our findings and show how causality can be established between malicious code and actual classification by a deep learning model thus opening up this black-box problem for deeper analysis.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Big Data Mining and Analytics
Big Data Mining and Analytics Computer Science-Computer Science Applications
CiteScore
20.90
自引率
2.20%
发文量
84
期刊介绍: Big Data Mining and Analytics, a publication by Tsinghua University Press, presents groundbreaking research in the field of big data research and its applications. This comprehensive book delves into the exploration and analysis of vast amounts of data from diverse sources to uncover hidden patterns, correlations, insights, and knowledge. Featuring the latest developments, research issues, and solutions, this book offers valuable insights into the world of big data. It provides a deep understanding of data mining techniques, data analytics, and their practical applications. Big Data Mining and Analytics has gained significant recognition and is indexed and abstracted in esteemed platforms such as ESCI, EI, Scopus, DBLP Computer Science, Google Scholar, INSPEC, CSCD, DOAJ, CNKI, and more. With its wealth of information and its ability to transform the way we perceive and utilize data, this book is a must-read for researchers, professionals, and anyone interested in the field of big data analytics.
期刊最新文献
AI-Based Advanced Approaches and Dry Eye Disease Detection Based on Multi-Source Evidence: Cases, Applications, Issues, and Future Directions A Novel Recommendation Algorithm Integrates Resource Allocation and Resource Transfer in Weighted Bipartite Network AI/ML Enabled Automation System for Software Defined Disaggregated Open Radio Access Networks: Transforming Telecommunication Business Extending OpenStack Monasca for Predictive Elasticity Control Unstructured Big Data Threat Intelligence Parallel Mining Algorithm
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1