Few-VulD：用于软件漏洞检测的 Few-shot 学习框架

IF 4.8 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Computers & Security Pub Date : 2024-07-14 DOI:10.1016/j.cose.2024.103992

{"title":"Few-VulD：用于软件漏洞检测的 Few-shot 学习框架","authors":"","doi":"10.1016/j.cose.2024.103992","DOIUrl":null,"url":null,"abstract":"<div><p>The rapid development of artificial intelligence (AI) has led to the introduction of numerous software vulnerability detection methods based on deep learning algorithms. However, a significant challenge is their dependency on large volumes of code samples for effective training. This requirement poses a considerable hurdle, particularly when adapting to diverse software application scenarios and various vulnerability types, where gathering sufficient and relevant training data for different classification tasks is often arduous. To address the challenge, this paper introduces Few-VulD, a novel framework for software vulnerability detection based on few-shot learning. This framework is designed to be efficiently trained with a minimal number of samples from a variety of existing classification tasks. Its key advantage lies in its ability to rapidly adapt to new vulnerability detection tasks, such as identifying new types of vulnerabilities, with only a small set of learning samples. This capability is particularly beneficial in scenarios where available vulnerability samples are limited. We compare Few-VulD with five state-of-the-art methods on the SySeVR and Big-Vul datasets. On the SySeVR dataset, Few-VulD outperforms all other methods, achieving a recall rate of 87.9% and showing an improvement of 11.7% to 57.8%. On the Big-Vul dataset, Few-VulD outperforms three of the methods, including one that utilizes a pretrained large language model (LLM), with recall improvements ranging from 8.5% to 40.1%. The other two methods employ pretrained LLMs from Microsoft CodeXGLUE (Lu et al., 2021). Few-VulD reaches 78.7% and 95.5% of their recall rates without the need for extensive data pretraining. The performance proves the effectiveness of Few-VulD in vulnerability detection tasks with limited samples.</p></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":null,"pages":null},"PeriodicalIF":4.8000,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Few-VulD: A Few-shot learning framework for software vulnerability detection\",\"authors\":\"\",\"doi\":\"10.1016/j.cose.2024.103992\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>The rapid development of artificial intelligence (AI) has led to the introduction of numerous software vulnerability detection methods based on deep learning algorithms. However, a significant challenge is their dependency on large volumes of code samples for effective training. This requirement poses a considerable hurdle, particularly when adapting to diverse software application scenarios and various vulnerability types, where gathering sufficient and relevant training data for different classification tasks is often arduous. To address the challenge, this paper introduces Few-VulD, a novel framework for software vulnerability detection based on few-shot learning. This framework is designed to be efficiently trained with a minimal number of samples from a variety of existing classification tasks. Its key advantage lies in its ability to rapidly adapt to new vulnerability detection tasks, such as identifying new types of vulnerabilities, with only a small set of learning samples. This capability is particularly beneficial in scenarios where available vulnerability samples are limited. We compare Few-VulD with five state-of-the-art methods on the SySeVR and Big-Vul datasets. On the SySeVR dataset, Few-VulD outperforms all other methods, achieving a recall rate of 87.9% and showing an improvement of 11.7% to 57.8%. On the Big-Vul dataset, Few-VulD outperforms three of the methods, including one that utilizes a pretrained large language model (LLM), with recall improvements ranging from 8.5% to 40.1%. The other two methods employ pretrained LLMs from Microsoft CodeXGLUE (Lu et al., 2021). Few-VulD reaches 78.7% and 95.5% of their recall rates without the need for extensive data pretraining. The performance proves the effectiveness of Few-VulD in vulnerability detection tasks with limited samples.</p></div>\",\"PeriodicalId\":51004,\"journal\":{\"name\":\"Computers & Security\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2024-07-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167404824002979\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Security","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167404824002979","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

随着人工智能（AI）的快速发展，基于深度学习算法的软件漏洞检测方法层出不穷。然而，这些方法面临的一个重大挑战是需要依赖大量代码样本进行有效训练。这一要求构成了相当大的障碍，尤其是在适应多样化的软件应用场景和各种漏洞类型时，为不同的分类任务收集足够的相关训练数据往往十分困难。为了应对这一挑战，本文介绍了 Few-VulD，一种基于少量学习的新型软件漏洞检测框架。该框架旨在使用来自各种现有分类任务的极少量样本进行高效训练。它的主要优势在于能够快速适应新的漏洞检测任务，例如只需少量的学习样本就能识别新类型的漏洞。在可用漏洞样本有限的情况下，这种能力尤为有利。我们在 SySeVR 和 Big-Vul 数据集上比较了 Few-VulD 和五种最先进的方法。在 SySeVR 数据集上，Few-VulD 的表现优于所有其他方法，召回率达到 87.9%，并提高了 11.7% 至 57.8%。在 Big-Vul 数据集上，Few-VulD 的表现优于其中三种方法，包括一种利用预训练大语言模型（LLM）的方法，召回率提高了 8.5% 至 40.1%。另外两种方法采用了来自微软 CodeXGLUE（Lu 等人，2021 年）的预训练 LLM。Few-VulD 的召回率分别达到 78.7% 和 95.5%，无需大量数据预训练。这些性能证明了 Few-VulD 在样本有限的漏洞检测任务中的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Few-VulD: A Few-shot learning framework for software vulnerability detection

The rapid development of artificial intelligence (AI) has led to the introduction of numerous software vulnerability detection methods based on deep learning algorithms. However, a significant challenge is their dependency on large volumes of code samples for effective training. This requirement poses a considerable hurdle, particularly when adapting to diverse software application scenarios and various vulnerability types, where gathering sufficient and relevant training data for different classification tasks is often arduous. To address the challenge, this paper introduces Few-VulD, a novel framework for software vulnerability detection based on few-shot learning. This framework is designed to be efficiently trained with a minimal number of samples from a variety of existing classification tasks. Its key advantage lies in its ability to rapidly adapt to new vulnerability detection tasks, such as identifying new types of vulnerabilities, with only a small set of learning samples. This capability is particularly beneficial in scenarios where available vulnerability samples are limited. We compare Few-VulD with five state-of-the-art methods on the SySeVR and Big-Vul datasets. On the SySeVR dataset, Few-VulD outperforms all other methods, achieving a recall rate of 87.9% and showing an improvement of 11.7% to 57.8%. On the Big-Vul dataset, Few-VulD outperforms three of the methods, including one that utilizes a pretrained large language model (LLM), with recall improvements ranging from 8.5% to 40.1%. The other two methods employ pretrained LLMs from Microsoft CodeXGLUE (Lu et al., 2021). Few-VulD reaches 78.7% and 95.5% of their recall rates without the need for extensive data pretraining. The performance proves the effectiveness of Few-VulD in vulnerability detection tasks with limited samples.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computers & Security 工程技术-计算机：信息系统

CiteScore

12.40

自引率

7.10%

发文量

365

审稿时长

10.7 months

期刊介绍： Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors - industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise it is your first step to fully secure systems.