VulANalyzeR: Explainable Binary Vulnerability Detection with Multi-task Learning and Attentional Graph Convolution

ACM Transactions on Privacy and Security | Pub Date: 2023-04-14 | DOI: https://dl.acm.org/doi/10.1145/3585386 | IF 3 | Zone 4 (Computer Science) | JCR Q2 (Computer Science, Information Systems)

Litao Li, Steven H. H. Ding, Yuan Tian, Benjamin C. M. Fung, Philippe Charland, Weihan Ou, Leo Song, Congwei Chen


Citations: 0

Abstract

Software vulnerabilities pose serious reliability threats to the general public as well as to critical infrastructure, and many studies aim to detect and mitigate software defects at the binary level. Most standard practices rely on both static and dynamic analysis, which have several drawbacks, such as heavy manual workload and high complexity. Existing deep learning-based solutions not only struggle to capture the complex relationships among different variables in raw binary code but also lack the explainability that humans need to verify, evaluate, and patch the detected bugs.

We propose VulANalyzeR, a deep learning-based model for automated binary vulnerability detection, Common Weakness Enumeration (CWE) type classification, and root cause analysis to enhance safety and security. VulANalyzeR features sequential and topological learning through recurrent units and graph convolution to simulate how a program is executed. An attention mechanism is integrated throughout the model and shows how different instructions and the corresponding states contribute to the final classification. The model also classifies the specific vulnerability type through multi-task learning, which not only provides further explanation but also allows faster patching of zero-day vulnerabilities. We show that VulANalyzeR achieves better vulnerability detection performance than the state-of-the-art baselines. Additionally, a Common Vulnerabilities and Exposures (CVE) dataset is used to evaluate real, complex vulnerabilities. We conduct case studies to show that VulANalyzeR can accurately identify the instructions and basic blocks that cause a vulnerability even without being given any prior knowledge of their locations during the training phase.
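The combination of ideas named in the abstract (attention over a program graph, plus two prediction tasks sharing one representation) can be illustrated with a toy sketch. This is not the authors' architecture: the function names, the fixed `query` vector, the hand-picked head weights, and the three-block graph are all hypothetical, and a real model would learn these parameters and use recurrent units to encode each block's instructions. The sketch only shows how attention weights computed during aggregation can be read back as per-block importance scores.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attentional_graph_step(features, edges, query):
    """One attention-weighted message-passing step over a control flow graph.

    features: dict block_id -> vector (stand-in for a recurrent encoder's
              summary of the block's instructions; an assumption here)
    edges:    list of (src, dst) control flow edges
    query:    hypothetical attention vector (learned in a real model)
    """
    updated = {}
    for node in features:
        # Each block attends over itself and its predecessor blocks.
        sources = [node] + [s for s, d in edges if d == node]
        weights = softmax([dot(features[s], query) for s in sources])
        dim = len(features[node])
        updated[node] = [
            sum(w * features[s][i] for w, s in zip(weights, sources))
            for i in range(dim)
        ]
    return updated

def attention_pool(features, query):
    """Pool block vectors into one graph vector. The pooling weights double
    as per-block importance scores that can be inspected afterwards."""
    nodes = sorted(features)
    weights = softmax([dot(features[n], query) for n in nodes])
    dim = len(features[nodes[0]])
    pooled = [
        sum(w * features[n][i] for w, n in zip(weights, nodes))
        for i in range(dim)
    ]
    return pooled, dict(zip(nodes, weights))

def multitask_heads(graph_vec, detect_w, type_w):
    """Two heads on one shared graph vector: detection and type."""
    p_vulnerable = 1.0 / (1.0 + math.exp(-dot(graph_vec, detect_w)))
    type_probs = softmax([dot(graph_vec, w) for w in type_w])
    return p_vulnerable, type_probs

# Toy graph with three basic blocks; every number below is made up.
features = {0: [0.1, 0.0], 1: [0.9, 0.8], 2: [0.2, 0.1]}
edges = [(0, 1), (1, 2), (0, 2)]
query = [1.0, 1.0]

hidden = attentional_graph_step(features, edges, query)
graph_vec, block_attention = attention_pool(hidden, query)
p_vulnerable, type_probs = multitask_heads(
    graph_vec, detect_w=[1.5, 1.5], type_w=[[1.0, 0.0], [0.0, 1.0]]
)
most_attended = max(block_attention, key=block_attention.get)
```

Here `block_attention` maps each basic block to its share of the pooled representation, and `most_attended` is the block with the largest share. In this toy setting that is the kind of signal one would inspect when asking which block drove a prediction; whether such scores localize real vulnerabilities is an empirical question that the paper addresses through its case studies.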




Source journal

ACM Transactions on Privacy and Security (Computer Science: General Computer Science)

CiteScore: 5.20

Self-citation rate: 0.00%

Articles published:

Journal description: ACM Transactions on Privacy and Security (TOPS) (formerly known as TISSEC) publishes high-quality research results in the fields of information and system security and privacy. Studies addressing all aspects of these fields are welcomed, ranging from technologies, to systems and applications, to the crafting of policies.

Latest articles from this journal

ZPredict: ML-Based IPID Side-channel Measurements
ZTA-IoT: A Novel Architecture for Zero-Trust in IoT Systems and an Ensuing Usage Control Model
Security Analysis of the Consumer Remote SIM Provisioning Protocol
X-squatter: AI Multilingual Generation of Cross-Language Sound-squatting
Toward Robust ASR System against Audio Adversarial Examples using Agitated Logit