Achieving Human Level Partial Credit Grading of Written Responses to Physics Conceptual Question using GPT-3.5 with Only Prompt Engineering

Zhongzhou Chen, Tong Wan
{"title":"Achieving Human Level Partial Credit Grading of Written Responses to Physics Conceptual Question using GPT-3.5 with Only Prompt Engineering","authors":"Zhongzhou Chen, Tong Wan","doi":"arxiv-2407.15251","DOIUrl":null,"url":null,"abstract":"Large language modules (LLMs) have great potential for auto-grading student\nwritten responses to physics problems due to their capacity to process and\ngenerate natural language. In this explorative study, we use a prompt\nengineering technique, which we name \"scaffolded chain of thought (COT)\", to\ninstruct GPT-3.5 to grade student written responses to a physics conceptual\nquestion. Compared to common COT prompting, scaffolded COT prompts GPT-3.5 to\nexplicitly compare student responses to a detailed, well-explained rubric\nbefore generating the grading outcome. We show that when compared to human\nraters, the grading accuracy of GPT-3.5 using scaffolded COT is 20% - 30%\nhigher than conventional COT. The level of agreement between AI and human\nraters can reach 70% - 80%, comparable to the level between two human raters.\nThis shows promise that an LLM-based AI grader can achieve human-level grading\naccuracy on a physics conceptual problem using prompt engineering techniques\nalone.","PeriodicalId":501565,"journal":{"name":"arXiv - PHYS - Physics Education","volume":"245 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - PHYS - Physics Education","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.15251","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Large language models (LLMs) have great potential for auto-grading student written responses to physics problems due to their capacity to process and generate natural language. In this exploratory study, we use a prompt engineering technique, which we name "scaffolded chain of thought (COT)", to instruct GPT-3.5 to grade student written responses to a physics conceptual question. Compared to common COT prompting, scaffolded COT prompts GPT-3.5 to explicitly compare student responses to a detailed, well-explained rubric before generating the grading outcome. We show that when compared to human raters, the grading accuracy of GPT-3.5 using scaffolded COT is 20%-30% higher than with conventional COT. The level of agreement between AI and human raters can reach 70%-80%, comparable to the level between two human raters. This shows promise that an LLM-based AI grader can achieve human-level grading accuracy on a physics conceptual problem using prompt engineering techniques alone.
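The abstract does not reproduce the study's actual prompts or rubric, but the scaffolded COT idea can be illustrated with a minimal sketch using the OpenAI Python SDK: the model is asked to compare the student response against each rubric item explicitly before it outputs a grade. The question, rubric items, and student response below are hypothetical placeholders, not the paper's materials.

```python
# A minimal sketch of "scaffolded chain of thought" grading, assuming the
# OpenAI Python SDK (>=1.0) and an OPENAI_API_KEY in the environment.
# The question, rubric, and student response are hypothetical examples;
# the paper's actual prompts are not given in the abstract.
from openai import OpenAI

client = OpenAI()

QUESTION = "A ball is thrown straight up. What is its acceleration at the highest point?"

# A detailed, well-explained rubric, one item per partial-credit criterion.
RUBRIC = """\
Item 1 (1 pt): States that the acceleration at the top is nonzero.
  Explanation: gravity still acts on the ball even when its velocity is momentarily zero.
Item 2 (1 pt): Gives the magnitude as g (about 9.8 m/s^2), directed downward.
  Explanation: near Earth's surface, free-fall acceleration is constant.
"""

STUDENT_RESPONSE = "The acceleration is zero because the ball stops moving at the top."

# Scaffolded COT: instead of grading in one step (conventional COT), the
# prompt forces an explicit item-by-item comparison before the score.
prompt = f"""You are grading a student's written response to a physics
conceptual question for partial credit.

Question: {QUESTION}

Rubric:
{RUBRIC}

Student response: {STUDENT_RESPONSE}

For each rubric item, first quote the part of the student response relevant
to that item (or write "no relevant statement"), then explain whether the
item is satisfied. Only after working through every item, output the total
on a final line as "Score: X/2"."""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0,  # reduce run-to-run variation in grading
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

The key design choice, per the abstract, is that the rubric comparison is scaffolded into the prompt itself, so the model's reasoning is anchored to the rubric rather than left free-form as in conventional COT prompting.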