Rishabh Balse, Bharath Valaboju, Shreya Singhal, J. Warriem, Prajish Prasad
{"title":"Investigating the Potential of GPT-3 in Providing Feedback for Programming Assessments","authors":"Rishabh Balse, Bharath Valaboju, Shreya Singhal, J. Warriem, Prajish Prasad","doi":"10.1145/3587102.3588852","DOIUrl":null,"url":null,"abstract":"Recent advances in artificial intelligence have led to the development of large language models (LLMs), which are able to generate text, images, and source code based on prompts provided by humans. In this paper, we explore the capabilities of an LLM - OpenAI's GPT-3 model to provide feedback for student written code. Specifically, we examine the feasibility of GPT-3 to check, critique and suggest changes to code written by learners in an online programming exam of an undergraduate Python programming course. We collected 1211 student code submissions from 7 questions asked in a programming exam, and provided the GPT-3 model with separate prompts to check, critique and provide suggestions on these submissions. We found that there was a high variability in the accuracy of the model's feedback for student submissions. Across questions, the range for accurately checking the correctness of the code was between 57% to 79%, between 41% to 77% for accurately critiquing code, and between 32% and 93% for suggesting appropriate changes to the code. We also found instances where the model generated incorrect and inconsistent feedback. These findings suggest that models like GPT-3 currently cannot be 'directly' used to provide feedback to students for programming assessments.","PeriodicalId":410890,"journal":{"name":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3587102.3588852","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
Recent advances in artificial intelligence have led to the development of large language models (LLMs), which are able to generate text, images, and source code based on prompts provided by humans. In this paper, we explore the capabilities of an LLM - OpenAI's GPT-3 model to provide feedback for student written code. Specifically, we examine the feasibility of GPT-3 to check, critique and suggest changes to code written by learners in an online programming exam of an undergraduate Python programming course. We collected 1211 student code submissions from 7 questions asked in a programming exam, and provided the GPT-3 model with separate prompts to check, critique and provide suggestions on these submissions. We found that there was a high variability in the accuracy of the model's feedback for student submissions. Across questions, the range for accurately checking the correctness of the code was between 57% to 79%, between 41% to 77% for accurately critiquing code, and between 32% and 93% for suggesting appropriate changes to the code. We also found instances where the model generated incorrect and inconsistent feedback. These findings suggest that models like GPT-3 currently cannot be 'directly' used to provide feedback to students for programming assessments.