LLMSecCode: Evaluating Large Language Models for Secure Coding
Anton Rydén, Erik Näslund, Elad Michael Schiller, Magnus Almgren
arXiv - CS - Distributed, Parallel, and Cluster Computing, published 2024-08-28
DOI: arxiv-2408.16100 (https://doi.org/arxiv-2408.16100)
Abstract
The rapid deployment of Large Language Models (LLMs) requires careful consideration of their effect on cybersecurity. Our work aims to improve the selection process of LLMs suitable for facilitating Secure Coding (SC). This raises challenging research questions, such as (RQ1) Which functionality can streamline the LLM evaluation? (RQ2) What should the evaluation measure? (RQ3) How can we attest that the evaluation process is impartial? To address these questions, we introduce LLMSecCode, an open-source evaluation framework designed to assess LLM SC capabilities objectively.

We validate the LLMSecCode implementation through experiments. When varying parameters and prompts, we find performance differences of 10% and 9%, respectively. We also compare selected results against reliable external actors, where our results show a 5% difference.

We strive to make our open-source framework easy to use and encourage further development by external actors. With LLMSecCode, we hope to encourage the standardization and benchmarking of LLMs' capabilities in security-oriented code and tasks.
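
The abstract reports how performance shifts when prompts and sampling parameters vary. As a rough illustration of that style of experiment, and not the actual LLMSecCode API, the sketch below sweeps a small grid of prompt templates and temperatures over a toy secure-coding task; the task list, the `query_model` stub, and the marker-based security check are all hypothetical placeholders.

```python
"""
Illustrative sketch only: a grid over prompt templates and sampling
parameters for a secure-coding benchmark. The task list, the model stub,
and the security check are placeholders, not part of LLMSecCode.
"""
from itertools import product

# Hypothetical benchmark task: a description plus a naive "is it secure?" check.
TASKS = [
    {
        "description": "Write a Python function that hashes a password.",
        "insecure_markers": ["md5(", "sha1("],  # flag weak hash usage
    },
]

PROMPT_TEMPLATES = [
    "Complete the following task:\n{task}",
    "You are a security-aware developer. Complete the task safely:\n{task}",
]

TEMPERATURES = [0.2, 0.8]


def query_model(prompt: str, temperature: float) -> str:
    """Stand-in for a real LLM call; replace with an actual model client."""
    return (
        "def hash_password(pw):\n"
        "    import hashlib\n"
        "    return hashlib.sha256(pw.encode()).hexdigest()\n"
    )


def looks_secure(completion: str, insecure_markers: list[str]) -> bool:
    """Toy check: reject completions containing known-insecure patterns."""
    return not any(marker in completion for marker in insecure_markers)


def run_grid() -> None:
    """Evaluate every (template, temperature) pair and report pass rates."""
    for template, temperature in product(PROMPT_TEMPLATES, TEMPERATURES):
        passed = 0
        for task in TASKS:
            prompt = template.format(task=task["description"])
            completion = query_model(prompt, temperature)
            passed += looks_secure(completion, task["insecure_markers"])
        rate = passed / len(TASKS)
        print(f"temp={temperature} template={template[:35]!r} pass_rate={rate:.0%}")


if __name__ == "__main__":
    run_grid()
```

Comparing pass rates across rows of such a grid is one simple way to quantify the prompt- and parameter-sensitivity figures the abstract refers to; the paper itself should be consulted for the framework's actual benchmarks and scoring.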