Investigating the Potential of GPT-3 in Providing Feedback for Programming Assessments

Rishabh Balse, Bharath Valaboju, Shreya Singhal, J. Warriem, Prajish Prasad
DOI: 10.1145/3587102.3588852
Published in: Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1
Publication date: 2023-06-29
Citations: 6

Abstract

Recent advances in artificial intelligence have led to the development of large language models (LLMs), which are able to generate text, images, and source code based on prompts provided by humans. In this paper, we explore the capabilities of an LLM, OpenAI's GPT-3 model, to provide feedback on student-written code. Specifically, we examine the feasibility of using GPT-3 to check, critique, and suggest changes to code written by learners in an online programming exam of an undergraduate Python programming course. We collected 1211 student code submissions from 7 questions asked in a programming exam, and provided the GPT-3 model with separate prompts to check, critique, and provide suggestions on these submissions. We found high variability in the accuracy of the model's feedback on student submissions. Across questions, accuracy ranged from 57% to 79% for checking the correctness of the code, from 41% to 77% for critiquing code, and from 32% to 93% for suggesting appropriate changes to the code. We also found instances where the model generated incorrect and inconsistent feedback. These findings suggest that models like GPT-3 currently cannot be 'directly' used to provide feedback to students for programming assessments.
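The study's setup of issuing three separate prompts per submission (check, critique, suggest) can be sketched as follows. The abstract does not give the authors' exact prompt wording, model parameters, or question texts, so the templates, function name, and example inputs below are purely illustrative assumptions:

```python
# Illustrative sketch (not the authors' actual prompts): one template
# per feedback task, instantiated per student submission.
PROMPT_TEMPLATES = {
    "check": (
        "Question: {question}\n"
        "Student code:\n{code}\n"
        "Is this code a correct solution? Answer 'Correct' or 'Incorrect'."
    ),
    "critique": (
        "Question: {question}\n"
        "Student code:\n{code}\n"
        "List the errors or weaknesses in this code."
    ),
    "suggest": (
        "Question: {question}\n"
        "Student code:\n{code}\n"
        "Suggest specific changes that would make this code correct."
    ),
}

def build_prompts(question: str, code: str) -> dict:
    """Build the three separate feedback prompts for one submission."""
    return {task: template.format(question=question, code=code)
            for task, template in PROMPT_TEMPLATES.items()}
```

Each prompt would then be sent to the GPT-3 completions endpoint as its own request, so the model's check, critique, and suggestion for a submission are generated independently of one another.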