Evaluating the Quality of LLM-Generated Explanations for Logical Errors in CS1 Student Programs

Rishabh Balse, Viraj Kumar, Prajish Prasad, J. Warriem
{"title":"评估 CS1 学生课程中由 LLM 生成的逻辑错误解释的质量","authors":"Rishabh Balse, Viraj Kumar, Prajish Prasad, J. Warriem","doi":"10.1145/3627217.3627233","DOIUrl":null,"url":null,"abstract":"When students in CS1 (Introductory Programming) write erroneous code, course staff can use automated tools to provide various types of helpful feedback. In this paper, we focus on syntactically correct student code containing logical errors. Tools that explain logical errors typically require course staff to invest greater effort than tools that detect such errors. To reduce this effort, prior work has investigated the use of Large Language Models (LLMs) such as GPT-3 to generate explanations. Unfortunately, these explanations can be incomplete or incorrect, and therefore unhelpful if presented to students directly. Nevertheless, LLM-generated explanations may be of adequate quality for Teaching Assistants (TAs) to efficiently craft helpful explanations on their basis. We evaluate the quality of explanations generated by an LLM (GPT-3.5-turbo) in two ways, for 30 buggy student solutions across 6 code-writing problems. First, in a study with 5 undergraduate TAs, we compare TA perception of LLM-generated and peer-generated explanation quality. TAs were unaware which explanations were LLM-generated, but they found them to be comparable in quality to peer-generated explanations. Second, we performed a detailed manual analysis of LLM-generated explanations for all 30 buggy solutions. We found at least one incorrect statement in 15/30 explanations (50%). However, in 28/30 cases (93%), the LLM-generated explanation correctly identified at least one logical error. Our results suggest that for large CS1 courses, TAs with adequate training to detect erroneous statements may be able to extract value from such explanations.","PeriodicalId":508655,"journal":{"name":"Proceedings of the 16th Annual ACM India Compute Conference","volume":"33 4","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluating the Quality of LLM-Generated Explanations for Logical Errors in CS1 Student Programs\",\"authors\":\"Rishabh Balse, Viraj Kumar, Prajish Prasad, J. Warriem\",\"doi\":\"10.1145/3627217.3627233\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"When students in CS1 (Introductory Programming) write erroneous code, course staff can use automated tools to provide various types of helpful feedback. In this paper, we focus on syntactically correct student code containing logical errors. Tools that explain logical errors typically require course staff to invest greater effort than tools that detect such errors. To reduce this effort, prior work has investigated the use of Large Language Models (LLMs) such as GPT-3 to generate explanations. Unfortunately, these explanations can be incomplete or incorrect, and therefore unhelpful if presented to students directly. Nevertheless, LLM-generated explanations may be of adequate quality for Teaching Assistants (TAs) to efficiently craft helpful explanations on their basis. We evaluate the quality of explanations generated by an LLM (GPT-3.5-turbo) in two ways, for 30 buggy student solutions across 6 code-writing problems. First, in a study with 5 undergraduate TAs, we compare TA perception of LLM-generated and peer-generated explanation quality. 
TAs were unaware which explanations were LLM-generated, but they found them to be comparable in quality to peer-generated explanations. Second, we performed a detailed manual analysis of LLM-generated explanations for all 30 buggy solutions. We found at least one incorrect statement in 15/30 explanations (50%). However, in 28/30 cases (93%), the LLM-generated explanation correctly identified at least one logical error. Our results suggest that for large CS1 courses, TAs with adequate training to detect erroneous statements may be able to extract value from such explanations.\",\"PeriodicalId\":508655,\"journal\":{\"name\":\"Proceedings of the 16th Annual ACM India Compute Conference\",\"volume\":\"33 4\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-12-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 16th Annual ACM India Compute Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3627217.3627233\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th Annual ACM India Compute Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3627217.3627233","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

When students in CS1 (Introductory Programming) write erroneous code, course staff can use automated tools to provide various types of helpful feedback. In this paper, we focus on syntactically correct student code containing logical errors. Tools that explain logical errors typically require course staff to invest greater effort than tools that detect such errors. To reduce this effort, prior work has investigated the use of Large Language Models (LLMs) such as GPT-3 to generate explanations. Unfortunately, these explanations can be incomplete or incorrect, and therefore unhelpful if presented to students directly. Nevertheless, LLM-generated explanations may be of adequate quality for Teaching Assistants (TAs) to efficiently craft helpful explanations on their basis. We evaluate the quality of explanations generated by an LLM (GPT-3.5-turbo) in two ways, for 30 buggy student solutions across 6 code-writing problems. First, in a study with 5 undergraduate TAs, we compare TA perception of LLM-generated and peer-generated explanation quality. TAs were unaware which explanations were LLM-generated, but they found them to be comparable in quality to peer-generated explanations. Second, we performed a detailed manual analysis of LLM-generated explanations for all 30 buggy solutions. We found at least one incorrect statement in 15/30 explanations (50%). However, in 28/30 cases (93%), the LLM-generated explanation correctly identified at least one logical error. Our results suggest that for large CS1 courses, TAs with adequate training to detect erroneous statements may be able to extract value from such explanations.
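
The abstract does not reproduce the paper's prompts, but the setup it describes (asking GPT-3.5-turbo to explain logical errors in syntactically correct student code) can be illustrated with a minimal sketch. The sketch below assumes the OpenAI Python client; the problem statement, prompt wording, and the sum_to_n off-by-one bug are invented for illustration and are not taken from the study.

    # Minimal sketch: prompting GPT-3.5-turbo to explain a logical error.
    # Assumes the OpenAI Python client and OPENAI_API_KEY in the environment;
    # the prompt and the buggy example are hypothetical, not the paper's.
    from openai import OpenAI

    client = OpenAI()

    problem = "Write a function sum_to_n(n) that returns 1 + 2 + ... + n."

    # Syntactically correct, but logically wrong: range(1, n) excludes n.
    buggy_code = """def sum_to_n(n):
        total = 0
        for i in range(1, n):
            total += i
        return total
    """

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You explain logical errors in beginner Python code."},
            {"role": "user",
             "content": f"Problem: {problem}\n\nStudent code:\n{buggy_code}\n"
                        "Explain any logical errors in this solution."},
        ],
    )
    print(response.choices[0].message.content)

As the abstract cautions, the text returned by such a call may itself contain incorrect statements, which is why the authors position these explanations as raw material for trained TAs rather than as feedback shown directly to students.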