Multi-stage guided code generation for Large Language Models

IF 8 2区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Engineering Applications of Artificial Intelligence Pub Date : 2024-10-30 DOI:10.1016/j.engappai.2024.109491

Yewei Han , Chen Lyu

{"title":"Multi-stage guided code generation for Large Language Models","authors":"Yewei Han , Chen Lyu","doi":"10.1016/j.engappai.2024.109491","DOIUrl":null,"url":null,"abstract":"<div><div>Currently, although Large Language Models (LLMs) have shown significant performance in the field of code generation, their effectiveness in handling complex programming tasks remains limited. This is primarily due to the substantial distance between the problem description and the correct code, making it difficult to ensure accuracy when directly generating code. Human programmers, when faced with a complex programming problem, usually use multiple stages to solve it in order to reduce the difficulty of development. First, they analyze the problem and think about a solution plan, then they design a code architecture based on that plan, and finally they finish writing the detailed code. Based on this, we propose a multi-stage guided code generation strategy that aims to gradually shorten the transformation distance between the problem description and the correct code, thus improving the accuracy of code generation. Specifically, the approach consists of three stages: planning, design and implementation. In the planning phase, the Large Language Model (LLM) generates a solution plan based on the problem description; in the design phase, the code architecture is further designed based on the solution plan; and in the implementation phase, the previous solution plan and code architecture are utilized to guide the LLM in generating the final code. Additionally, we found that existing competition-level code generation benchmarks may overlap with the training data of the Chat Generative Pre-trained Transformer (ChatGPT), posing a risk of data leakage. To validate the above findings and circumvent this risk, we created a competition-level code generation dataset named CodeC, which contains data never used for training ChatGPT. Experimental results show that our method outperforms the most advanced baselines. On the CodeC dataset, our approach achieves a 34.7% relative improvement on the Pass@1 metric compared to the direct generation method of ChatGPT. We have published the relevant dataset at <span><span>https://github.com/hcode666/MSG</span><svg><path></path></svg></span> for further academic research and validation.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109491"},"PeriodicalIF":8.0000,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Applications of Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S095219762401649X","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Currently, although Large Language Models (LLMs) have shown significant performance in the field of code generation, their effectiveness in handling complex programming tasks remains limited. This is primarily due to the substantial distance between the problem description and the correct code, making it difficult to ensure accuracy when directly generating code. Human programmers, when faced with a complex programming problem, usually use multiple stages to solve it in order to reduce the difficulty of development. First, they analyze the problem and think about a solution plan, then they design a code architecture based on that plan, and finally they finish writing the detailed code. Based on this, we propose a multi-stage guided code generation strategy that aims to gradually shorten the transformation distance between the problem description and the correct code, thus improving the accuracy of code generation. Specifically, the approach consists of three stages: planning, design and implementation. In the planning phase, the Large Language Model (LLM) generates a solution plan based on the problem description; in the design phase, the code architecture is further designed based on the solution plan; and in the implementation phase, the previous solution plan and code architecture are utilized to guide the LLM in generating the final code. Additionally, we found that existing competition-level code generation benchmarks may overlap with the training data of the Chat Generative Pre-trained Transformer (ChatGPT), posing a risk of data leakage. To validate the above findings and circumvent this risk, we created a competition-level code generation dataset named CodeC, which contains data never used for training ChatGPT. Experimental results show that our method outperforms the most advanced baselines. On the CodeC dataset, our approach achieves a 34.7% relative improvement on the Pass@1 metric compared to the direct generation method of ChatGPT. We have published the relevant dataset at https://github.com/hcode666/MSG for further academic research and validation.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

大型语言模型的多阶段引导代码生成

目前，尽管大型语言模型（LLM）在代码生成领域表现出了显著的性能，但其在处理复杂编程任务方面的有效性仍然有限。这主要是由于问题描述与正确代码之间存在很大距离，因此在直接生成代码时很难确保准确性。人类程序员在面对复杂的编程问题时，通常会通过多个阶段来解决，以降低开发难度。首先，他们分析问题并思考解决计划，然后根据该计划设计代码架构，最后完成详细代码的编写。在此基础上，我们提出了一种多阶段引导代码生成策略，旨在逐步缩短问题描述与正确代码之间的转换距离，从而提高代码生成的准确性。具体来说，该方法包括三个阶段：规划、设计和实施。在规划阶段，大语言模型（LLM）根据问题描述生成解决方案计划；在设计阶段，根据解决方案计划进一步设计代码架构；在实施阶段，利用之前的解决方案计划和代码架构指导 LLM 生成最终代码。此外，我们还发现现有的竞赛级代码生成基准可能与聊天生成预训练转换器（ChatGPT）的训练数据重叠，存在数据泄露的风险。为了验证上述发现并规避这一风险，我们创建了一个竞赛级代码生成数据集，名为 CodeC，其中包含了从未用于训练 ChatGPT 的数据。实验结果表明，我们的方法优于最先进的基线方法。在 CodeC 数据集上，与 ChatGPT 的直接生成方法相比，我们的方法在 Pass@1 指标上实现了 34.7% 的相对改进。我们已在 https://github.com/hcode666/MSG 上发布了相关数据集，供进一步的学术研究和验证。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Engineering Applications of Artificial Intelligence 工程技术-工程：电子与电气

CiteScore

9.60

自引率

10.00%

发文量

505

审稿时长

68 days

期刊介绍： Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.