Evaluating the Performance of Code Generation Models for Solving Parsons Problems With Small Prompt Variations

B. Reeves, Sami Sarsa, J. Prather, Paul Denny, Brett A. Becker, Arto Hellas, Bailey Kimmel, Garrett B. Powell, Juho Leinonen
{"title":"Evaluating the Performance of Code Generation Models for Solving Parsons Problems With Small Prompt Variations","authors":"B. Reeves, Sami Sarsa, J. Prather, Paul Denny, Brett A. Becker, Arto Hellas, Bailey Kimmel, Garrett B. Powell, Juho Leinonen","doi":"10.1145/3587102.3588805","DOIUrl":null,"url":null,"abstract":"The recent emergence of code generation tools powered by large language models has attracted wide attention. Models such as OpenAI Codex can take natural language problem descriptions as input and generate highly accurate source code solutions, with potentially significant implications for computing education. Given the many complexities that students face when learning to write code, they may quickly become reliant on such tools without properly understanding the underlying concepts. One popular approach for scaffolding the code writing process is to use Parsons problems, which present solution lines of code in a scrambled order. These remove the complexities of low-level syntax, and allow students to focus on algorithmic and design-level problem solving. It is unclear how well code generation models can be applied to solve Parsons problems, given the mechanics of these models and prior evidence that they underperform when problems include specific restrictions. In this paper, we explore the performance of the Codex model for solving Parsons problems over various prompt variations. Using a corpus of Parsons problems we sourced from the computing education literature, we find that Codex successfully reorders the problem blocks about half of the time, a much lower rate of success when compared to prior work on more free-form programming tasks. Regarding prompts, we find that small variations in prompting have a noticeable effect on model performance, although the effect is not as pronounced as between different problems.","PeriodicalId":410890,"journal":{"name":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1","volume":"3 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3587102.3588805","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

The recent emergence of code generation tools powered by large language models has attracted wide attention. Models such as OpenAI Codex can take natural language problem descriptions as input and generate highly accurate source code solutions, with potentially significant implications for computing education. Given the many complexities that students face when learning to write code, they may quickly become reliant on such tools without properly understanding the underlying concepts. One popular approach for scaffolding the code writing process is to use Parsons problems, which present the lines of a solution in a scrambled order. These problems remove the complexities of low-level syntax and allow students to focus on algorithmic and design-level problem solving. It is unclear how well code generation models can solve Parsons problems, given the mechanics of these models and prior evidence that they underperform when problems include specific restrictions. In this paper, we explore the performance of the Codex model on Parsons problems across a range of small prompt variations. Using a corpus of Parsons problems sourced from the computing education literature, we find that Codex successfully reorders the problem blocks about half of the time, a much lower success rate than prior work has reported on more free-form programming tasks. Regarding prompts, we find that small variations in prompting have a noticeable effect on model performance, although the effect is less pronounced than the variation between different problems.
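To make the task concrete, the following sketch illustrates how a Parsons problem might be posed to a code generation model and how a response could be graded. This is an illustrative reconstruction, not code from the paper: the example task, the prompt wording, and the function names (make_parsons_prompt, is_correct_reordering) are all hypothetical.

```python
# Illustrative sketch (not from the paper): building a Parsons-style prompt
# from a known solution and grading a model's reordering by exact match.
import random

# Correct solution lines for a small example task (hypothetical).
SOLUTION = [
    "def count_evens(numbers):",
    "    count = 0",
    "    for n in numbers:",
    "        if n % 2 == 0:",
    "            count += 1",
    "    return count",
]

def make_parsons_prompt(description, lines, seed=0):
    """Scramble the solution lines and wrap them in a natural language prompt.

    Changing `seed` or the wording of `description` produces the kind of
    small prompt variation the paper studies.
    """
    scrambled = list(lines)
    random.Random(seed).shuffle(scrambled)
    return (
        f"# {description}\n"
        "# Rearrange the lines below into a correct program.\n"
        "# Use every line exactly once and change only their order.\n"
        + "\n".join(scrambled)
    )

def is_correct_reordering(model_output, lines):
    """Count a response as successful only if it reproduces the solution
    line for line, ignoring blank lines in the output."""
    produced = [ln for ln in model_output.splitlines() if ln.strip()]
    return produced == list(lines)

prompt = make_parsons_prompt("Count the even numbers in a list.", SOLUTION)
print(prompt)  # This prompt would be sent to a model such as Codex, and the
               # completion passed to is_correct_reordering() for grading.
```

Grading by exact line match mirrors the all-or-nothing sense of "successfully reorders the problem blocks" in the abstract; the paper's actual prompt formats and grading procedure may differ.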