Enhancing vulnerability detection efficiency: An exploration of light-weight LLMs with hybrid code features

Journal of Information Security and Applications | IF 3.8 | CAS Tier 2 (Computer Science) | JCR Q2, Computer Science, Information Systems | Pub Date: 2024-11-30 | DOI: 10.1016/j.jisa.2024.103925
Jianing Liu, Guanjun Lin, Huan Mei, Fan Yang, Yonghang Tai
{"title":"Enhancing vulnerability detection efficiency: An exploration of light-weight LLMs with hybrid code features","authors":"Jianing Liu ,&nbsp;Guanjun Lin ,&nbsp;Huan Mei ,&nbsp;Fan Yang ,&nbsp;Yonghang Tai","doi":"10.1016/j.jisa.2024.103925","DOIUrl":null,"url":null,"abstract":"<div><div>Vulnerability detection is a critical research topic. However, the performance of existing neural network-based approaches requires further improvement. The emergence of large language models (LLMs) has demonstrated their superior performance in natural language processing (NLP) compared to conventional neural architectures, motivating researchers to apply LLMs for vulnerability detection. This paper focuses on evaluating the performance of various Transformer-based LLMs for source-code-level vulnerability detection. We propose a framework named VulACLLM (AST &amp; CFG-based LLMs Vulnerability Detection), which leverages combined feature sets derived from abstract Syntax Tree (AST) and Control Flow Graph (CFG). The recall rate of VulACLLM in the field of vulnerability detection reached 0.73, while the F1-score achieved 0.725. Experimental results show that the proposed feature sets significantly enhance detection performance. To further improve the efficiency of LLM-based detection, we examine the performance of LLMs compressed using two techniques: Knowledge Distillation (KD) and Low-Rank Adaptation (LoRA). To assess the performance of these compressed models, we introduce efficiency metrics that quantify both performance loss and efficiency gains achieved through compression. Our findings reveal that, compared to KD, LLMs compressed with LoRA achieve higher recall, achieving a maximum recall rate of 0.82, while substantially reducing training time, taking only 20 min to complete one epoch, and disk size, requiring only 4.89 MB of memory. The experimental results demonstrate that LoRA compression effectively mitigates deployment challenges associated with large model sizes and high video memory consumption, enabling the deployment of LoRA-compressed LLMs on consumer-level GPUs without compromising vulnerability detection performance.</div></div>","PeriodicalId":48638,"journal":{"name":"Journal of Information Security and Applications","volume":"88 ","pages":"Article 103925"},"PeriodicalIF":3.8000,"publicationDate":"2024-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Information Security and Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2214212624002278","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Vulnerability detection is a critical research topic, but the performance of existing neural network-based approaches still leaves room for improvement. Large language models (LLMs) have demonstrated superior performance in natural language processing (NLP) compared with conventional neural architectures, motivating researchers to apply them to vulnerability detection. This paper evaluates various Transformer-based LLMs for source-code-level vulnerability detection. We propose a framework named VulACLLM (AST & CFG-based LLM Vulnerability Detection), which leverages combined feature sets derived from the Abstract Syntax Tree (AST) and the Control Flow Graph (CFG). VulACLLM reaches a recall of 0.73 and an F1-score of 0.725, and the experimental results show that the proposed feature sets significantly enhance detection performance. To further improve the efficiency of LLM-based detection, we examine LLMs compressed with two techniques: Knowledge Distillation (KD) and Low-Rank Adaptation (LoRA). To assess these compressed models, we introduce efficiency metrics that quantify both the performance lost and the efficiency gained through compression. Our findings reveal that, compared with KD, LoRA-compressed LLMs achieve higher recall (up to 0.82) while substantially reducing training time (about 20 min per epoch) and on-disk size (only 4.89 MB). The results demonstrate that LoRA compression effectively mitigates the deployment challenges posed by large model sizes and high GPU memory consumption, enabling LoRA-compressed LLMs to run on consumer-level GPUs without compromising vulnerability detection performance.
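The abstract's two technical ingredients can be illustrated with short sketches. First, the hybrid feature idea: the paper derives features from the AST (and CFG) of the target source code, but the abstract does not specify the parser or the serialization format. The sketch below is therefore only an assumed illustration, using Python's standard `ast` module as a stand-in parser to show how a syntax tree can be flattened into a node-type token sequence that an LLM tokenizer could consume alongside the raw code.

```python
# Hypothetical sketch (not the paper's pipeline): flatten an AST into
# node-type tokens that can be concatenated with the raw source as a
# hybrid input sequence for an LLM. Python's stdlib `ast` stands in for
# whatever C/C++ parser the paper actually uses.
import ast

def ast_token_sequence(source: str) -> list[str]:
    """Emit one token per AST node type, in ast.walk's breadth-first order."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]

src = "def f(x):\n    return x * 2"
print(ast_token_sequence(src))
# ['Module', 'FunctionDef', 'arguments', 'Return', 'arg', 'BinOp',
#  'Name', 'Mult', 'Constant', 'Load']
```

Second, the LoRA result (a 4.89 MB artifact, roughly 20 minutes per epoch) is consistent with standard parameter-efficient fine-tuning, where only small low-rank adapter matrices are trained and saved. Below is a minimal sketch using the Hugging Face `peft` library; the base checkpoint, target modules, and hyperparameters are illustrative assumptions, not the paper's reported configuration.

```python
# Hypothetical sketch of LoRA fine-tuning for binary vulnerability
# classification. Only the low-rank adapters (plus the classification
# head) are trained, so the saved artifact is a few MB, not a full model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "microsoft/codebert-base"  # assumed base model; the paper compares several LLMs
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

config = LoraConfig(
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling applied to the update
    target_modules=["query", "value"],  # attention projections to adapt
    lora_dropout=0.1,
    task_type="SEQ_CLS",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()      # typically well under 1% of all weights

# Classify one (hypothetical) C function; in the paper's framework the
# hybrid AST/CFG features would accompany the source text.
code = "void copy(char *dst, char *src) { strcpy(dst, src); }"
batch = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    pred = model(**batch).logits.argmax(-1).item()
print("vulnerable" if pred == 1 else "benign")
```

Saving only the adapter (`model.save_pretrained(...)` on the PEFT model) rather than a full checkpoint is what yields the megabyte-scale disk footprint reported above, and it is why such models can be deployed on consumer-level GPUs.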
Source journal
Journal of Information Security and Applications (Computer Science: Computer Networks and Communications)
CiteScore: 10.90
Self-citation rate: 5.40%
Articles published per year: 206
Review time: 56 days
Journal description: Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications relevant to information security and applications. JISA provides a common linkage between a vibrant scientific and research community and industry professionals by offering a clear view of modern problems and challenges in information security, as well as identifying promising scientific and "best-practice" solutions. JISA issues offer a balance between original research work and innovative industrial approaches by internationally renowned information security experts and researchers.
Latest articles in this journal
- Hierarchical Threshold Multi-Key Fully Homomorphic Encryption
- Color image encryption algorithm based on hybrid chaos and layered strategies
- Efficient and verifiable keyword search over public-key ciphertexts based on blockchain
- Editorial Board
- Deepfakes in digital media forensics: Generation, AI-based detection and challenges