LLM-Enhanced Software Patch Localization

Jinhong Yu, Yi Chen, Di Tang, Xiaozhong Liu, XiaoFeng Wang, Chen Wu, Haixu Tang
Published 2024-09-10 in arXiv - CS - Cryptography and Security (DOI: arxiv-2409.06816). Citations: 0.

Abstract

Open source software (OSS) is integral to modern product development, and any vulnerability within it potentially compromises numerous products. While developers strive to apply security patches, pinpointing these patches among extensive OSS updates remains a challenge. Security patch localization (SPL) recommendation methods are the leading approaches to address this. However, existing SPL models often falter when a commit lacks a clear association with its corresponding CVE, and they do not consider the scenario in which a vulnerability has multiple patches proposed over time before it is fully resolved. To address these challenges, we introduce LLM-SPL, a recommendation-based SPL approach that leverages the capabilities of a Large Language Model (LLM) to locate the security patch commit for a given CVE. More specifically, we propose a joint learning framework in which the LLM's outputs serve as additional features to aid our recommendation model in prioritizing security patches. Our evaluation on a dataset of 1,915 CVEs associated with 2,461 patches demonstrates that LLM-SPL excels in ranking patch commits, surpassing the state-of-the-art method in terms of Recall while significantly reducing manual effort. Notably, for vulnerabilities requiring multiple patches, LLM-SPL improves Recall by 22.83% and NDCG by 19.41%, and reduces manual effort by over 25% when checking up to the top 10 rankings. The dataset and source code are available at https://anonymous.4open.science/r/LLM-SPL-91F8.
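The abstract evaluates ranked patch-commit lists with Recall and NDCG at a top-k cutoff. As a minimal sketch of how those two metrics behave for a CVE with multiple patch commits (the commit IDs and ranking below are hypothetical, not taken from the paper's dataset):

```python
import math

def recall_at_k(ranked_commits, true_patches, k=10):
    """Fraction of ground-truth patch commits that appear in the top-k ranking."""
    hits = sum(1 for c in ranked_commits[:k] if c in true_patches)
    return hits / len(true_patches)

def ndcg_at_k(ranked_commits, true_patches, k=10):
    """NDCG@k with binary relevance (a commit either is or is not a patch).
    Earlier hits are discounted less, rewarding rankings that surface
    all patches of a multi-patch vulnerability near the top."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, c in enumerate(ranked_commits[:k]) if c in true_patches)
    ideal_hits = min(len(true_patches), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical CVE resolved by two patch commits, c3 and c7.
true_patches = {"c3", "c7"}
ranking = ["c3", "c1", "c7", "c2", "c5"]
print(recall_at_k(ranking, true_patches, k=3))  # 1.0: both patches in the top 3
print(ndcg_at_k(ranking, true_patches, k=3))    # < 1.0: c7 is ranked third, not second
```

A ranking that placed c7 second instead of third would score NDCG@3 = 1.0, which is why NDCG, unlike Recall, distinguishes between rankings that find the same patches at different depths.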