Few-VulD: A Few-shot learning framework for software vulnerability detection

Computers & Security; IF 4.8; Zone 2 (Computer Science); JCR Q1 (Computer Science, Information Systems); Pub Date: 2024-07-14; DOI: 10.1016/j.cose.2024.103992
Full text: https://www.sciencedirect.com/science/article/pii/S0167404824002979
Citations: 0

Abstract

The rapid development of artificial intelligence (AI) has led to the introduction of numerous software vulnerability detection methods based on deep learning algorithms. However, a significant challenge is their dependency on large volumes of code samples for effective training. This requirement poses a considerable hurdle, particularly when adapting to diverse software application scenarios and various vulnerability types, where gathering sufficient and relevant training data for different classification tasks is often arduous. To address this challenge, this paper introduces Few-VulD, a novel framework for software vulnerability detection based on few-shot learning. The framework is designed to be trained efficiently with a minimal number of samples from a variety of existing classification tasks. Its key advantage lies in its ability to rapidly adapt to new vulnerability detection tasks, such as identifying new types of vulnerabilities, with only a small set of learning samples. This capability is particularly beneficial in scenarios where available vulnerability samples are limited. We compare Few-VulD with five state-of-the-art methods on the SySeVR and Big-Vul datasets. On the SySeVR dataset, Few-VulD outperforms all other methods, achieving a recall rate of 87.9% and showing an improvement of 11.7% to 57.8%. On the Big-Vul dataset, Few-VulD outperforms three of the methods, including one that utilizes a pretrained large language model (LLM), with recall improvements ranging from 8.5% to 40.1%. The other two methods employ pretrained LLMs from Microsoft CodeXGLUE (Lu et al., 2021). Few-VulD reaches 78.7% and 95.5% of their recall rates without the need for extensive data pretraining. These results demonstrate the effectiveness of Few-VulD in vulnerability detection tasks with limited samples.
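The few-shot setting described in the abstract can be illustrated with a minimal sketch: each task provides only a handful of labeled samples per class for adaptation (a support set) and a separate set for evaluation (a query set), and detection quality is reported as recall. The function names `sample_episode` and `recall`, the `(snippet, label)` data layout, and the binary labels (1 = vulnerable, 0 = not vulnerable) are assumptions made for illustration; this is not the authors' implementation.

```python
import random
from collections import defaultdict


def sample_episode(samples, k_shot, n_query, rng):
    """Split one task's labeled samples into a support set and a query set.

    samples: list of (code_snippet, label) pairs (hypothetical layout).
    k_shot:  number of support samples drawn per class.
    n_query: number of query samples drawn per class.
    rng:     a random.Random instance, so sampling is reproducible.
    """
    by_label = defaultdict(list)
    for snippet, label in samples:
        by_label[label].append(snippet)

    support, query = [], []
    for label, snippets in sorted(by_label.items()):
        if len(snippets) < k_shot + n_query:
            raise ValueError(f"class {label} has too few samples")
        # Draw without replacement so support and query never overlap.
        picked = rng.sample(snippets, k_shot + n_query)
        support += [(s, label) for s in picked[:k_shot]]
        query += [(s, label) for s in picked[k_shot:]]
    return support, query


def recall(y_true, y_pred):
    """Recall = TP / (TP + FN), with label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

Under this reading, a statement such as "reaches 95.5% of a baseline's recall" would correspond to the ratio of the two recall values, although the abstract does not report the absolute Big-Vul figures needed to reproduce it.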

Source journal
Computers & Security (Engineering & Technology; Computer Science: Information Systems)
CiteScore: 12.40
Self-citation rate: 7.10%
Articles published: 365
Review time: 10.7 months
Journal description: Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors: industry, commerce and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.