VulANalyzeR: Explainable Binary Vulnerability Detection with Multi-task Learning and Attentional Graph Convolution

IF 3 4区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS ACM Transactions on Privacy and Security Pub Date : 2023-03-03 DOI:10.1145/3585386

Litao Li, Steven H. H. Ding, Yuan Tian, B. Fung, P. Charland, Weihan Ou, Leo Song, Congwei Chen

{"title":"VulANalyzeR: Explainable Binary Vulnerability Detection with Multi-task Learning and Attentional Graph Convolution","authors":"Litao Li, Steven H. H. Ding, Yuan Tian, B. Fung, P. Charland, Weihan Ou, Leo Song, Congwei Chen","doi":"10.1145/3585386","DOIUrl":null,"url":null,"abstract":"Software vulnerabilities have been posing tremendous reliability threats to the general public as well as critical infrastructures, and there have been many studies aiming to detect and mitigate software defects at the binary level. Most of the standard practices leverage both static and dynamic analysis, which have several drawbacks like heavy manual workload and high complexity. Existing deep learning-based solutions not only suffer to capture the complex relationships among different variables from raw binary code but also lack the explainability required for humans to verify, evaluate, and patch the detected bugs. We propose VulANalyzeR, a deep learning-based model, for automated binary vulnerability detection, Common Weakness Enumeration-type classification, and root cause analysis to enhance safety and security. VulANalyzeR features sequential and topological learning through recurrent units and graph convolution to simulate how a program is executed. The attention mechanism is integrated throughout the model, which shows how different instructions and the corresponding states contribute to the final classification. It also classifies the specific vulnerability type through multi-task learning as this not only provides further explanation but also allows faster patching for zero-day vulnerabilities. We show that VulANalyzeR achieves better performance for vulnerability detection over the state-of-the-art baselines. Additionally, a Common Vulnerability Exposure dataset is used to evaluate real complex vulnerabilities. We conduct case studies to show that VulANalyzeR is able to accurately identify the instructions and basic blocks that cause the vulnerability even without given any prior knowledge related to the locations during the training phase.","PeriodicalId":56050,"journal":{"name":"ACM Transactions on Privacy and Security","volume":"26 1","pages":"1 - 25"},"PeriodicalIF":3.0000,"publicationDate":"2023-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Privacy and Security","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3585386","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 1

Abstract

Software vulnerabilities have been posing tremendous reliability threats to the general public as well as critical infrastructures, and there have been many studies aiming to detect and mitigate software defects at the binary level. Most of the standard practices leverage both static and dynamic analysis, which have several drawbacks like heavy manual workload and high complexity. Existing deep learning-based solutions not only suffer to capture the complex relationships among different variables from raw binary code but also lack the explainability required for humans to verify, evaluate, and patch the detected bugs. We propose VulANalyzeR, a deep learning-based model, for automated binary vulnerability detection, Common Weakness Enumeration-type classification, and root cause analysis to enhance safety and security. VulANalyzeR features sequential and topological learning through recurrent units and graph convolution to simulate how a program is executed. The attention mechanism is integrated throughout the model, which shows how different instructions and the corresponding states contribute to the final classification. It also classifies the specific vulnerability type through multi-task learning as this not only provides further explanation but also allows faster patching for zero-day vulnerabilities. We show that VulANalyzeR achieves better performance for vulnerability detection over the state-of-the-art baselines. Additionally, a Common Vulnerability Exposure dataset is used to evaluate real complex vulnerabilities. We conduct case studies to show that VulANalyzeR is able to accurately identify the instructions and basic blocks that cause the vulnerability even without given any prior knowledge related to the locations during the training phase.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于多任务学习和注意图卷积的可解释二进制漏洞检测

软件漏洞一直对公众和关键基础设施构成巨大的可靠性威胁，许多研究旨在检测和减轻二进制级别的软件缺陷。大多数标准实践同时利用静态和动态分析，这有几个缺点，如手动工作量大和复杂性高。现有的基于深度学习的解决方案不仅难以从原始二进制代码中捕捉不同变量之间的复杂关系，而且缺乏人类验证、评估和修补检测到的错误所需的可解释性。我们提出了基于深度学习的VulANalyzeR模型，用于自动二进制漏洞检测、常见弱点枚举类型分类和根本原因分析，以增强安全性。VulANalyzeR的特点是通过递归单元和图卷积进行顺序和拓扑学习，以模拟程序的执行方式。注意力机制集成在整个模型中，显示了不同的指令和相应的状态如何对最终分类做出贡献。它还通过多任务学习对特定的漏洞类型进行了分类，因为这不仅提供了进一步的解释，而且可以更快地修补零日漏洞。我们表明，与最先进的基线相比，VulANalyzeR在漏洞检测方面实现了更好的性能。此外，通用漏洞暴露数据集用于评估真实的复杂漏洞。我们进行的案例研究表明，VulANalyzeR能够准确识别导致漏洞的指令和基本块，即使在训练阶段没有任何与位置相关的先验知识。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

ACM Transactions on Privacy and Security Computer Science-General Computer Science

CiteScore

5.20

自引率

0.00%

发文量

期刊介绍： ACM Transactions on Privacy and Security (TOPS) (formerly known as TISSEC) publishes high-quality research results in the fields of information and system security and privacy. Studies addressing all aspects of these fields are welcomed, ranging from technologies, to systems and applications, to the crafting of policies.

期刊最新文献

ZPredict: ML-Based IPID Side-channel Measurements ZTA-IoT: A Novel Architecture for Zero-Trust in IoT Systems and an Ensuing Usage Control Model Security Analysis of the Consumer Remote SIM Provisioning Protocol X-squatter: AI Multilingual Generation of Cross-Language Sound-squatting Toward Robust ASR System against Audio Adversarial Examples using Agitated Logit