DLAP: A Deep Learning Augmented Large Language Model Prompting framework for software vulnerability detection

Journal of Systems and Software (IF 3.8; Zone 2, Computer Science; JCR Q1, COMPUTER SCIENCE, SOFTWARE ENGINEERING) Pub Date: 2025-01-01 Epub Date: 2024-10-18 DOI: 10.1016/j.jss.2024.112234

Yanjing Yang, Xin Zhou, Runfeng Mao, Jinwei Xu, Lanxin Yang, Yu Zhang, Haifeng Shen, He Zhang

Journal of Systems and Software, Volume 219, Article 112234. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0164121224002784

Citations: 0

Abstract


DLAP: A Deep Learning Augmented Large Language Model Prompting framework for software vulnerability detection

Software vulnerability detection is generally supported by automated static analysis tools, which have recently been reinforced by deep learning (DL) models. However, although DL-based approaches outperform rule-based ones in research, applying them to software vulnerability detection in practice remains a challenge. This is due to the complex structure of source code, the black-box nature of DL, and the extensive domain knowledge required to understand and validate black-box results when addressing post-detection tasks. Conventional DL models are trained on specific projects and hence excel at identifying vulnerabilities in those projects but not in others. Their poor detection performance on other projects in turn affects downstream tasks such as vulnerability location and repair. More importantly, these models do not provide explanations that help developers comprehend detection results. In contrast, Large Language Models (LLMs) with prompting techniques achieve stable performance across projects and provide explanations for their results. However, with existing prompting techniques, the detection performance of LLMs is relatively low, so they cannot be used for real-world vulnerability detection. This paper contributes DLAP, a Deep Learning Augmented LLMs Prompting framework that combines the best of both DL models and LLMs to achieve exceptional vulnerability detection performance. Experimental evaluation results confirm that DLAP outperforms state-of-the-art prompting frameworks, including role-based prompts, auxiliary information prompts, chain-of-thought prompts, and in-context learning prompts, as well as fine-tuning, on multiple metrics.


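Source journal

Journal of Systems and Software (Engineering Technology; Computer Science: Theory and Methods)

CiteScore: 8.60

Self-citation rate: 5.70%

Articles published: 193

Review time: 16 weeks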

About the journal: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.