Assessing the proficiency of large language models in automatic feedback generation: An evaluation study

Wei Dai, Yi-Shan Tsai, Jionghao Lin, Ahmad Aldino, Hua Jin, Tongguang Li, Dragan Gašević, Guanliang Chen
Journal: Computers and Education: Artificial Intelligence, Volume 7, Article 100299
DOI: 10.1016/j.caeai.2024.100299
Published: 11 September 2024
URL: https://www.sciencedirect.com/science/article/pii/S2666920X24001024
Citations: 0

Abstract

Assessment feedback is important to student learning. Learning analytics (LA) powered by artificial intelligence exhibits profound potential in helping instructors with the laborious provision of feedback. Inspired by recent advancements in Generative Pre-trained Transformer (GPT) models, we conducted a study to examine the extent to which GPT models hold the potential to advance the existing knowledge of LA-supported feedback systems towards improving the efficiency of feedback provision. Our study therefore explored the ability of two versions of GPT models, GPT-3.5 (ChatGPT) and GPT-4, to generate assessment feedback on open-ended student writing tasks, which are common in higher education, from a data science-related course. We compared the feedback generated by the GPT models with the feedback provided by human instructors in terms of readability, effectiveness (content containing effective feedback components), and reliability (correct assessment of student performance). Results showed that (1) both GPT-3.5 and GPT-4 were able to generate more readable feedback with greater consistency than human instructors, (2) GPT-4 outperformed GPT-3.5 and human instructors in providing feedback containing information about effective feedback dimensions, including feeding-up, feeding-forward, the process level, and the self-regulation level, and (3) GPT-4 demonstrated higher reliability of feedback compared to GPT-3.5. Based on our findings, we discuss the potential opportunities and challenges of utilising GPT models in assessment feedback generation.
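The readability comparison described above can be illustrated with a standard readability index. The paper does not specify which measure was used, so the following is a minimal sketch assuming a Flesch Reading Ease-style score; the function names and the two sample feedback texts are illustrative, and the syllable counter is a deliberately naive approximation.

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count runs of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate more readable text.
    Score = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# Feedback written in short sentences with simple words scores higher
# (i.e., is more readable) than dense, polysyllabic feedback.
simple_fb = "Good work. Your plot is clear. Add labels to both axes."
dense_fb = ("The visualisation demonstrates considerable analytical "
            "sophistication, although incorporating comprehensive axis "
            "annotations would substantially strengthen interpretability.")
print(flesch_reading_ease(simple_fb) > flesch_reading_ease(dense_fb))
```

Applying such a score to each piece of feedback, then comparing the means and variances across the GPT-generated and instructor-provided sets, is one way the readability and consistency comparison in the abstract could be operationalised.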
