LLM-Enhanced Software Patch Localization
Jinhong Yu, Yi Chen, Di Tang, Xiaozhong Liu, XiaoFeng Wang, Chen Wu, Haixu Tang
arXiv:2409.06816 [cs.CR], September 10, 2024
{"title":"LLM 增强型软件补丁本地化","authors":"Jinhong Yu, Yi Chen, Di Tang, Xiaozhong Liu, XiaoFeng Wang, Chen Wu, Haixu Tang","doi":"arxiv-2409.06816","DOIUrl":null,"url":null,"abstract":"Open source software (OSS) is integral to modern product development, and any\nvulnerability within it potentially compromises numerous products. While\ndevelopers strive to apply security patches, pinpointing these patches among\nextensive OSS updates remains a challenge. Security patch localization (SPL)\nrecommendation methods are leading approaches to address this. However,\nexisting SPL models often falter when a commit lacks a clear association with\nits corresponding CVE, and do not consider a scenario that a vulnerability has\nmultiple patches proposed over time before it has been fully resolved. To\naddress these challenges, we introduce LLM-SPL, a recommendation-based SPL\napproach that leverages the capabilities of the Large Language Model (LLM) to\nlocate the security patch commit for a given CVE. More specifically, we propose\na joint learning framework, in which the outputs of LLM serves as additional\nfeatures to aid our recommendation model in prioritizing security patches. Our\nevaluation on a dataset of 1,915 CVEs associated with 2,461 patches\ndemonstrates that LLM-SPL excels in ranking patch commits, surpassing the\nstate-of-the-art method in terms of Recall, while significantly reducing manual\neffort. Notably, for vulnerabilities requiring multiple patches, LLM-SPL\nsignificantly improves Recall by 22.83\\%, NDCG by 19.41\\%, and reduces manual\neffort by over 25\\% when checking up to the top 10 rankings. The dataset and\nsource code are available at\n\\url{https://anonymous.4open.science/r/LLM-SPL-91F8}.","PeriodicalId":501332,"journal":{"name":"arXiv - CS - Cryptography and Security","volume":"57 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LLM-Enhanced Software Patch Localization\",\"authors\":\"Jinhong Yu, Yi Chen, Di Tang, Xiaozhong Liu, XiaoFeng Wang, Chen Wu, Haixu Tang\",\"doi\":\"arxiv-2409.06816\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Open source software (OSS) is integral to modern product development, and any\\nvulnerability within it potentially compromises numerous products. While\\ndevelopers strive to apply security patches, pinpointing these patches among\\nextensive OSS updates remains a challenge. Security patch localization (SPL)\\nrecommendation methods are leading approaches to address this. However,\\nexisting SPL models often falter when a commit lacks a clear association with\\nits corresponding CVE, and do not consider a scenario that a vulnerability has\\nmultiple patches proposed over time before it has been fully resolved. To\\naddress these challenges, we introduce LLM-SPL, a recommendation-based SPL\\napproach that leverages the capabilities of the Large Language Model (LLM) to\\nlocate the security patch commit for a given CVE. More specifically, we propose\\na joint learning framework, in which the outputs of LLM serves as additional\\nfeatures to aid our recommendation model in prioritizing security patches. Our\\nevaluation on a dataset of 1,915 CVEs associated with 2,461 patches\\ndemonstrates that LLM-SPL excels in ranking patch commits, surpassing the\\nstate-of-the-art method in terms of Recall, while significantly reducing manual\\neffort. 
Notably, for vulnerabilities requiring multiple patches, LLM-SPL\\nsignificantly improves Recall by 22.83\\\\%, NDCG by 19.41\\\\%, and reduces manual\\neffort by over 25\\\\% when checking up to the top 10 rankings. The dataset and\\nsource code are available at\\n\\\\url{https://anonymous.4open.science/r/LLM-SPL-91F8}.\",\"PeriodicalId\":501332,\"journal\":{\"name\":\"arXiv - CS - Cryptography and Security\",\"volume\":\"57 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Cryptography and Security\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.06816\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Cryptography and Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.06816","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Open source software (OSS) is integral to modern product development, and any
vulnerability within it potentially compromises numerous products. While
developers strive to apply security patches, pinpointing those patches among
extensive OSS updates remains a challenge. Security patch localization (SPL)
recommendation methods are the leading approaches to this problem. However,
existing SPL models often falter when a commit lacks a clear association with
its corresponding CVE, and they do not consider the scenario in which a
vulnerability receives multiple patches over time before it is fully resolved.
To address these challenges, we introduce LLM-SPL, a recommendation-based SPL
approach that leverages the capabilities of a Large Language Model (LLM) to
locate the security patch commits for a given CVE. More specifically, we
propose a joint learning framework in which the outputs of the LLM serve as
additional features that help our recommendation model prioritize security
patches.
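
The abstract gives no implementation details, but the core idea, treating the
LLM's judgment as one extra feature consumed by a ranking model, can be
sketched roughly as follows. Everything here (feature names, fixed weights,
and the keyword-overlap stub standing in for a real LLM call) is a
hypothetical illustration, not the paper's actual design:

    # Hypothetical sketch: an LLM's relevance judgment is combined with
    # conventional commit features to score candidate patch commits for a CVE.
    from dataclasses import dataclass

    @dataclass
    class Commit:
        commit_id: str
        message: str
        lexical_sim: float    # e.g. similarity of CVE text and commit message
        time_gap_days: float  # days between CVE disclosure and the commit

    def llm_relevance(cve_description: str, commit: Commit) -> float:
        # Stub standing in for a real LLM call ("does this commit patch the
        # CVE?"); here a crude keyword overlap in [0, 1], for illustration only.
        cve_words = set(cve_description.lower().split())
        msg_words = set(commit.message.lower().split())
        return len(cve_words & msg_words) / max(len(cve_words), 1)

    def score(cve_description: str, commit: Commit, w=(0.5, 0.3, 0.2)) -> float:
        # In a joint learning setup these weights would be learned, not fixed.
        recency = 1.0 / (1.0 + commit.time_gap_days)
        return (w[0] * llm_relevance(cve_description, commit)
                + w[1] * commit.lexical_sim
                + w[2] * recency)

    def rank_commits(cve_description: str, commits: list[Commit]) -> list[Commit]:
        # Highest score first; the top of this list is what an analyst inspects.
        return sorted(commits, key=lambda c: score(cve_description, c), reverse=True)
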
Our evaluation on a dataset of 1,915 CVEs associated with 2,461 patches
demonstrates that LLM-SPL excels at ranking patch commits, surpassing the
state-of-the-art method in Recall while significantly reducing manual effort.
Notably, for vulnerabilities requiring multiple patches, LLM-SPL improves
Recall by 22.83%, NDCG by 19.41%, and reduces manual effort by over 25% when
checking up to the top 10 ranked commits. The dataset and source code are
available at https://anonymous.4open.science/r/LLM-SPL-91F8.
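
For reference, the Recall and NDCG figures above are standard top-k ranking
metrics. A minimal sketch of their usual binary-relevance definitions for a
CVE that may have several true patch commits (not necessarily the paper's
exact evaluation code):

    import math

    def recall_at_k(ranked_ids, true_patch_ids, k=10):
        # Fraction of the CVE's true patch commits that appear in the top k.
        hits = sum(1 for cid in ranked_ids[:k] if cid in true_patch_ids)
        return hits / len(true_patch_ids)

    def ndcg_at_k(ranked_ids, true_patch_ids, k=10):
        # Discounted cumulative gain with binary relevance, normalized by
        # the ideal ordering (all true patches ranked first).
        dcg = sum(1.0 / math.log2(i + 2)
                  for i, cid in enumerate(ranked_ids[:k])
                  if cid in true_patch_ids)
        ideal = sum(1.0 / math.log2(i + 2)
                    for i in range(min(len(true_patch_ids), k)))
        return dcg / ideal if ideal > 0 else 0.0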