LLMSecCode: Evaluating Large Language Models for Secure Coding
Anton Rydén, Erik Näslund, Elad Michael Schiller, Magnus Almgren
arXiv - CS - Distributed, Parallel, and Cluster Computing, published 2024-08-28
DOI: arxiv-2408.16100 (https://doi.org/arxiv-2408.16100)
Abstract
The rapid deployment of Large Language Models (LLMs) requires careful consideration of their effect on cybersecurity. Our work aims to improve the selection process of LLMs suitable for facilitating Secure Coding (SC). This raises challenging research questions, such as (RQ1) Which functionality can streamline the LLM evaluation? (RQ2) What should the evaluation measure? (RQ3) How can we attest that the evaluation process is impartial? To address these questions, we introduce LLMSecCode, an open-source evaluation framework designed to assess LLM SC capabilities objectively.

We validate the LLMSecCode implementation through experiments. When varying parameters and prompts, we find performance differences of 10% and 9%, respectively. We also compare selected results against reliable external actors, where our results show a 5% difference.

We strive to make our open-source framework easy to use and encourage further development by external actors. With LLMSecCode, we hope to encourage the standardization and benchmarking of LLMs' capabilities in security-oriented code and tasks.
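
The abstract reports how performance shifts when prompts and sampling parameters vary. As a rough illustration of that style of experiment, and not the actual LLMSecCode API, the sketch below sweeps a small grid of prompt templates and temperatures over a toy secure-coding task; the task list, the `query_model` stub, and the marker-based security check are all hypothetical placeholders.

```python
"""
Illustrative sketch only: a grid over prompt templates and sampling
parameters for a secure-coding benchmark. The task list, the model stub,
and the security check are placeholders, not part of LLMSecCode.
"""
from itertools import product

# Hypothetical benchmark task: a description plus a naive "is it secure?" check.
TASKS = [
    {
        "description": "Write a Python function that hashes a password.",
        "insecure_markers": ["md5(", "sha1("],  # flag weak hash usage
    },
]

PROMPT_TEMPLATES = [
    "Complete the following task:\n{task}",
    "You are a security-aware developer. Complete the task safely:\n{task}",
]

TEMPERATURES = [0.2, 0.8]


def query_model(prompt: str, temperature: float) -> str:
    """Stand-in for a real LLM call; replace with an actual model client."""
    return (
        "def hash_password(pw):\n"
        "    import hashlib\n"
        "    return hashlib.sha256(pw.encode()).hexdigest()\n"
    )


def looks_secure(completion: str, insecure_markers: list[str]) -> bool:
    """Toy check: reject completions containing known-insecure patterns."""
    return not any(marker in completion for marker in insecure_markers)


def run_grid() -> None:
    """Evaluate every (template, temperature) pair and report pass rates."""
    for template, temperature in product(PROMPT_TEMPLATES, TEMPERATURES):
        passed = 0
        for task in TASKS:
            prompt = template.format(task=task["description"])
            completion = query_model(prompt, temperature)
            passed += looks_secure(completion, task["insecure_markers"])
        rate = passed / len(TASKS)
        print(f"temp={temperature} template={template[:35]!r} pass_rate={rate:.0%}")


if __name__ == "__main__":
    run_grid()
```

Comparing pass rates across rows of such a grid is one simple way to quantify the prompt- and parameter-sensitivity figures the abstract refers to; the paper itself should be consulted for the framework's actual benchmarks and scoring.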