Evaluating the Performance of Code Generation Models for Solving Parsons Problems With Small Prompt Variations

B. Reeves, Sami Sarsa, J. Prather, Paul Denny, Brett A. Becker, Arto Hellas, Bailey Kimmel, Garrett B. Powell, Juho Leinonen
{"title":"Evaluating the Performance of Code Generation Models for Solving Parsons Problems With Small Prompt Variations","authors":"B. Reeves, Sami Sarsa, J. Prather, Paul Denny, Brett A. Becker, Arto Hellas, Bailey Kimmel, Garrett B. Powell, Juho Leinonen","doi":"10.1145/3587102.3588805","DOIUrl":null,"url":null,"abstract":"The recent emergence of code generation tools powered by large language models has attracted wide attention. Models such as OpenAI Codex can take natural language problem descriptions as input and generate highly accurate source code solutions, with potentially significant implications for computing education. Given the many complexities that students face when learning to write code, they may quickly become reliant on such tools without properly understanding the underlying concepts. One popular approach for scaffolding the code writing process is to use Parsons problems, which present solution lines of code in a scrambled order. These remove the complexities of low-level syntax, and allow students to focus on algorithmic and design-level problem solving. It is unclear how well code generation models can be applied to solve Parsons problems, given the mechanics of these models and prior evidence that they underperform when problems include specific restrictions. In this paper, we explore the performance of the Codex model for solving Parsons problems over various prompt variations. Using a corpus of Parsons problems we sourced from the computing education literature, we find that Codex successfully reorders the problem blocks about half of the time, a much lower rate of success when compared to prior work on more free-form programming tasks. Regarding prompts, we find that small variations in prompting have a noticeable effect on model performance, although the effect is not as pronounced as between different problems.","PeriodicalId":410890,"journal":{"name":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1","volume":"3 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3587102.3588805","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

The recent emergence of code generation tools powered by large language models has attracted wide attention. Models such as OpenAI Codex can take natural language problem descriptions as input and generate highly accurate source code solutions, with potentially significant implications for computing education. Given the many complexities that students face when learning to write code, they may quickly become reliant on such tools without properly understanding the underlying concepts. One popular approach for scaffolding the code writing process is to use Parsons problems, which present the lines of a solution in a scrambled order. These problems remove the complexities of low-level syntax and allow students to focus on algorithmic and design-level problem solving. It is unclear how well code generation models can solve Parsons problems, given the mechanics of these models and prior evidence that they underperform when problems include specific restrictions. In this paper, we explore the performance of the Codex model on Parsons problems across a range of small prompt variations. Using a corpus of Parsons problems sourced from the computing education literature, we find that Codex successfully reorders the problem blocks about half of the time, a much lower success rate than prior work has reported on more free-form programming tasks. Regarding prompts, we find that small variations in prompting have a noticeable effect on model performance, although the effect is less pronounced than the variation between different problems.
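To make the task concrete, the following sketch illustrates how a Parsons problem might be posed to a code generation model and how a response could be graded. This is an illustrative reconstruction, not code from the paper: the example task, the prompt wording, and the function names (make_parsons_prompt, is_correct_reordering) are all hypothetical.

```python
# Illustrative sketch (not from the paper): building a Parsons-style prompt
# from a known solution and grading a model's reordering by exact match.
import random

# Correct solution lines for a small example task (hypothetical).
SOLUTION = [
    "def count_evens(numbers):",
    "    count = 0",
    "    for n in numbers:",
    "        if n % 2 == 0:",
    "            count += 1",
    "    return count",
]

def make_parsons_prompt(description, lines, seed=0):
    """Scramble the solution lines and wrap them in a natural language prompt.

    Changing `seed` or the wording of `description` produces the kind of
    small prompt variation the paper studies.
    """
    scrambled = list(lines)
    random.Random(seed).shuffle(scrambled)
    return (
        f"# {description}\n"
        "# Rearrange the lines below into a correct program.\n"
        "# Use every line exactly once and change only their order.\n"
        + "\n".join(scrambled)
    )

def is_correct_reordering(model_output, lines):
    """Count a response as successful only if it reproduces the solution
    line for line, ignoring blank lines in the output."""
    produced = [ln for ln in model_output.splitlines() if ln.strip()]
    return produced == list(lines)

prompt = make_parsons_prompt("Count the even numbers in a list.", SOLUTION)
print(prompt)  # This prompt would be sent to a model such as Codex, and the
               # completion passed to is_correct_reordering() for grading.
```

Grading by exact line match mirrors the all-or-nothing sense of "successfully reorders the problem blocks" in the abstract; the paper's actual prompt formats and grading procedure may differ.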