Investigating the Potential of GPT-3 in Providing Feedback for Programming Assessments

Rishabh Balse, Bharath Valaboju, Shreya Singhal, J. Warriem, Prajish Prasad
DOI: 10.1145/3587102.3588852
Published in: Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1
Publication date: 2023-06-29
Citations: 6

Abstract

Recent advances in artificial intelligence have led to the development of large language models (LLMs), which are able to generate text, images, and source code based on prompts provided by humans. In this paper, we explore the capabilities of an LLM, OpenAI's GPT-3 model, to provide feedback on student-written code. Specifically, we examine the feasibility of using GPT-3 to check, critique, and suggest changes to code written by learners in an online programming exam of an undergraduate Python programming course. We collected 1211 student code submissions from 7 questions asked in a programming exam, and provided the GPT-3 model with separate prompts to check, critique, and provide suggestions on these submissions. We found high variability in the accuracy of the model's feedback on student submissions. Across questions, accuracy ranged from 57% to 79% for checking the correctness of the code, from 41% to 77% for critiquing code, and from 32% to 93% for suggesting appropriate changes to the code. We also found instances where the model generated incorrect and inconsistent feedback. These findings suggest that models like GPT-3 currently cannot be 'directly' used to provide feedback to students for programming assessments.
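The study's setup of issuing three separate prompts per submission (check, critique, suggest) can be sketched as follows. The abstract does not give the authors' exact prompt wording, model parameters, or question texts, so the templates, function name, and example inputs below are purely illustrative assumptions:

```python
# Illustrative sketch (not the authors' actual prompts): one template
# per feedback task, instantiated per student submission.
PROMPT_TEMPLATES = {
    "check": (
        "Question: {question}\n"
        "Student code:\n{code}\n"
        "Is this code a correct solution? Answer 'Correct' or 'Incorrect'."
    ),
    "critique": (
        "Question: {question}\n"
        "Student code:\n{code}\n"
        "List the errors or weaknesses in this code."
    ),
    "suggest": (
        "Question: {question}\n"
        "Student code:\n{code}\n"
        "Suggest specific changes that would make this code correct."
    ),
}

def build_prompts(question: str, code: str) -> dict:
    """Build the three separate feedback prompts for one submission."""
    return {task: template.format(question=question, code=code)
            for task, template in PROMPT_TEMPLATES.items()}
```

Each prompt would then be sent to the GPT-3 completions endpoint as its own request, so the model's check, critique, and suggestion for a submission are generated independently of one another.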