LLMSecCode: Evaluating Large Language Models for Secure Coding

Anton Rydén, Erik Näslund, Elad Michael Schiller, Magnus Almgren
{"title":"LLMSecCode:评估用于安全编码的大型语言模型","authors":"Anton Rydén, Erik Näslund, Elad Michael Schiller, Magnus Almgren","doi":"arxiv-2408.16100","DOIUrl":null,"url":null,"abstract":"The rapid deployment of Large Language Models (LLMs) requires careful\nconsideration of their effect on cybersecurity. Our work aims to improve the\nselection process of LLMs that are suitable for facilitating Secure Coding\n(SC). This raises challenging research questions, such as (RQ1) Which\nfunctionality can streamline the LLM evaluation? (RQ2) What should the\nevaluation measure? (RQ3) How to attest that the evaluation process is\nimpartial? To address these questions, we introduce LLMSecCode, an open-source\nevaluation framework designed to assess LLM SC capabilities objectively. We validate the LLMSecCode implementation through experiments. When varying\nparameters and prompts, we find a 10% and 9% difference in performance,\nrespectively. We also compare some results to reliable external actors, where\nour results show a 5% difference. We strive to ensure the ease of use of our open-source framework and\nencourage further development by external actors. With LLMSecCode, we hope to\nencourage the standardization and benchmarking of LLMs' capabilities in\nsecurity-oriented code and tasks.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"58 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LLMSecCode: Evaluating Large Language Models for Secure Coding\",\"authors\":\"Anton Rydén, Erik Näslund, Elad Michael Schiller, Magnus Almgren\",\"doi\":\"arxiv-2408.16100\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The rapid deployment of Large Language Models (LLMs) requires careful\\nconsideration of their effect on cybersecurity. Our work aims to improve the\\nselection process of LLMs that are suitable for facilitating Secure Coding\\n(SC). This raises challenging research questions, such as (RQ1) Which\\nfunctionality can streamline the LLM evaluation? (RQ2) What should the\\nevaluation measure? (RQ3) How to attest that the evaluation process is\\nimpartial? To address these questions, we introduce LLMSecCode, an open-source\\nevaluation framework designed to assess LLM SC capabilities objectively. We validate the LLMSecCode implementation through experiments. When varying\\nparameters and prompts, we find a 10% and 9% difference in performance,\\nrespectively. We also compare some results to reliable external actors, where\\nour results show a 5% difference. We strive to ensure the ease of use of our open-source framework and\\nencourage further development by external actors. 
With LLMSecCode, we hope to\\nencourage the standardization and benchmarking of LLMs' capabilities in\\nsecurity-oriented code and tasks.\",\"PeriodicalId\":501422,\"journal\":{\"name\":\"arXiv - CS - Distributed, Parallel, and Cluster Computing\",\"volume\":\"58 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Distributed, Parallel, and Cluster Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.16100\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Distributed, Parallel, and Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.16100","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The rapid deployment of Large Language Models (LLMs) requires careful consideration of their effect on cybersecurity. Our work aims to improve the selection process of LLMs that are suitable for facilitating Secure Coding (SC). This raises challenging research questions, such as (RQ1) Which functionality can streamline the LLM evaluation? (RQ2) What should the evaluation measure? (RQ3) How can we attest that the evaluation process is impartial? To address these questions, we introduce LLMSecCode, an open-source evaluation framework designed to assess LLM SC capabilities objectively. We validate the LLMSecCode implementation through experiments. When varying parameters and prompts, we find a 10% and 9% difference in performance, respectively. We also compare selected results with those of reliable external actors, where our results differ by 5%. We strive to ensure the ease of use of our open-source framework and encourage further development by external actors. With LLMSecCode, we hope to encourage the standardization and benchmarking of LLMs' capabilities in security-oriented code and tasks.
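The abstract reports that results shift noticeably with sampling parameters and prompt wording (10% and 9%, respectively). The sketch below illustrates, in generic Python, what such a two-axis sweep might look like. It is a minimal sketch under stated assumptions, not LLMSecCode's actual API: query_model, TASKS, the prompt templates, and the keyword-based security checks are all hypothetical stand-ins (a real harness would use a benchmark suite and, e.g., a static-analyzer verdict as the checker).

from typing import Callable, Dict, List

# Hypothetical benchmark: each task pairs a coding instruction with a checker
# that returns True when the generated code is judged secure. The keyword
# checks here are illustrative placeholders only.
TASKS: List[Dict] = [
    {"instruction": "Implement password hashing for user sign-up.",
     "check": lambda code: "bcrypt" in code or "argon2" in code},
    {"instruction": "Build an SQL query from user-supplied search terms.",
     "check": lambda code: "execute(" in code and "%s" in code},
]

PROMPT_TEMPLATES = [
    "Write secure, production-quality code. Task: {instruction}",
    "Task: {instruction}",  # baseline prompt without an explicit security cue
]


def evaluate(query_model: Callable[[str, float], str],
             temperature: float, template: str) -> float:
    """Return the fraction of tasks whose output passes its security check."""
    passed = 0
    for task in TASKS:
        prompt = template.format(instruction=task["instruction"])
        code = query_model(prompt, temperature)
        if task["check"](code):
            passed += 1
    return passed / len(TASKS)


if __name__ == "__main__":
    # Stub model for demonstration; a real run would call an actual LLM.
    def stub_model(prompt: str, temperature: float) -> str:
        return "import bcrypt  # placeholder completion"

    # Sweep the two axes the paper varies: sampling parameters and prompts.
    for temp in (0.2, 0.8):
        for template in PROMPT_TEMPLATES:
            score = evaluate(stub_model, temp, template)
            print(f"temp={temp} prompt={template[:20]!r} pass-rate={score:.0%}")

Comparing pass rates across the grid of (temperature, template) pairs is one simple way to quantify the kind of parameter and prompt sensitivity the paper reports.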