DeepWukong

ACM Transactions on Software Engineering and Methodology (TOSEM) Pub Date: 2021-04-23 DOI: 10.1145/3436877

Xiao Cheng, Haoyu Wang, Jiayi Hua, Guoai Xu, Yulei Sui


Citations: 85

Abstract

Static bug detection has shown its effectiveness in detecting well-defined memory errors, e.g., memory leaks, buffer overflows, and null dereference. However, modern software systems have a wide variety of vulnerabilities. These vulnerabilities are extremely complicated with sophisticated programming logic, and these bugs are often caused by different bad programming practices, challenging existing bug detection solutions. It is hard and labor-intensive to develop precise and efficient static analysis solutions for different types of vulnerabilities, particularly for those that may not have a clear specification as the traditional well-defined vulnerabilities. This article presents DeepWukong, a new deep-learning-based embedding approach to static detection of software vulnerabilities for C/C++ programs. Our approach makes a new attempt by leveraging advanced recent graph neural networks to embed code fragments in a compact and low-dimensional representation, producing a new code representation that preserves high-level programming logic (in the form of control- and data-flows) together with the natural language information of a program. Our evaluation studies the top 10 most common C/C++ vulnerabilities during the past 3 years. We have conducted our experiments using 105,428 real-world programs by comparing our approach with four well-known traditional static vulnerability detectors and three state-of-the-art deep-learning-based approaches. The experimental results demonstrate the effectiveness of our research and have shed light on the promising direction of combining program analysis with deep learning techniques to address the general static code analysis challenges.
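The embedding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network, the features, or the graph construction, so everything below is an assumption made for illustration. The names `CodeGraph`, `token_features`, `propagate`, and `embed` are hypothetical, the hashed bag-of-tokens features stand in for a learned token embedding, and plain mean aggregation stands in for a trained graph neural network layer. The sketch only shows how statement tokens (natural language information) and control/data dependence edges (programming logic) can both contribute to one fixed-size vector per code fragment.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CodeGraph:
    """Hypothetical container: one node per statement, edges for dependences."""
    statements: List[List[str]]                      # tokens of each statement
    edges: List[Tuple[int, int, str]] = field(default_factory=list)  # (src, dst, kind)


def token_features(tokens: List[str], dim: int = 8) -> List[float]:
    """Toy bag-of-tokens vector; a real system would use learned embeddings."""
    vec = [0.0] * dim
    for tok in tokens:
        bucket = sum(ord(c) for c in tok) % dim      # stable, deterministic hash
        vec[bucket] += 1.0
    return vec


def propagate(graph: CodeGraph, feats: List[List[float]]) -> List[List[float]]:
    """One round of mean aggregation over each node and its neighbours.

    Edges are treated as undirected and the edge kind is ignored here;
    both are simplifying assumptions of this sketch.
    """
    neighbours = {i: [i] for i in range(len(feats))}
    for src, dst, _kind in graph.edges:
        neighbours[src].append(dst)
        neighbours[dst].append(src)
    dim = len(feats[0])
    return [
        [sum(feats[j][d] for j in neighbours[i]) / len(neighbours[i]) for d in range(dim)]
        for i in range(len(feats))
    ]


def embed(graph: CodeGraph, dim: int = 8, rounds: int = 2) -> List[float]:
    """Fixed-size fragment embedding: propagate, then mean-pool over nodes."""
    feats = [token_features(s, dim) for s in graph.statements]
    for _ in range(rounds):
        feats = propagate(graph, feats)
    n = len(feats)
    return [sum(f[d] for f in feats) / n for d in range(dim)]


# A three-statement C fragment with one data and one control dependence.
fragment = CodeGraph(
    statements=[
        ["char", "buf", "[", "8", "]"],
        ["if", "(", "n", "<", "8", ")"],
        ["memcpy", "(", "buf", ",", "src", ",", "n", ")"],
    ],
    edges=[(0, 2, "data"), (1, 2, "control")],
)
vector = embed(fragment)
```

In this sketch the same statements yield a different vector once the dependence edges are removed, which is the property the abstract attributes to the representation: the embedding reflects program structure as well as token content. A classifier trained on such vectors would be a separate step that is not shown here.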



