A Comparison of Machine-Graded (ChatGPT) and Human-Graded Essay Scores in Veterinary Admissions

Impact Factor: 1.1 · CAS Tier 3 (Agricultural and Forestry Sciences) · JCR Q3 (Education, Scientific Disciplines)
Journal of Veterinary Medical Education · Published: 2024-05-22 · DOI: 10.3138/jvme-2023-0162
Raphael Vanderstichel, Henrik Stryhn
{"title":"A Comparison of Machine-Graded (ChatGPT) and Human-Graded Essay Scores in Veterinary Admissions","authors":"Raphael Vanderstichel, Henrik Stryhn","doi":"10.3138/jvme-2023-0162","DOIUrl":null,"url":null,"abstract":"Admissions committees have historically emphasized cognitive measures, but a paradigm shift toward holistic reviews now places greater importance on non-cognitive skills. These holistic reviews may include personal statements, experiences, references, interviews, multiple mini-interviews, and situational judgment tests, often requiring substantial faculty resources. Leveraging advances in artificial intelligence, particularly in natural language processing, this study was conducted to assess the agreement of essay scores graded by both humans and machines (OpenAI's ChatGPT). Correlations were calculated among these scores and cognitive and non-cognitive measures in the admissions process. Human-derived scores from 778 applicants in 2021 and 552 in 2022 had item-specific inter-rater reliabilities ranging from 0.07 to 0.41, while machine-derived inter-replicate reliabilities ranged from 0.41 to 0.61. Pairwise correlations between human- and machine-derived essay scores and other admissions criteria revealed moderate correlations between the two scoring methods (0.41) and fair correlations between the essays and the multiple mini-interview (0.20 and 0.22 for human and machine scores, respectively). Despite having very low correlations, machine-graded scores exhibited slightly stronger correlations with cognitive measures (0.10 to 0.15) compared to human-graded scores (0.01 to 0.02). Importantly, machine scores demonstrated higher precision, approximately two to three times greater than human scores in both years. This study emphasizes the importance of careful item design, rubric development, and prompt formulation when using machine-based essay grading. It also underscores the importance of employing replicates and robust statistical analyses to ensure equitable applicant ranking when integrating machine grading into the admissions process.","PeriodicalId":17575,"journal":{"name":"Journal of veterinary medical education","volume":null,"pages":null},"PeriodicalIF":1.1000,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of veterinary medical education","FirstCategoryId":"97","ListUrlMain":"https://doi.org/10.3138/jvme-2023-0162","RegionNum":3,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
Citations: 0

Abstract

Admissions committees have historically emphasized cognitive measures, but a paradigm shift toward holistic reviews now places greater importance on non-cognitive skills. These holistic reviews may include personal statements, experiences, references, interviews, multiple mini-interviews, and situational judgment tests, and they often require substantial faculty resources. Leveraging advances in artificial intelligence, particularly in natural language processing, this study assessed the agreement between essay scores graded by humans and by a machine (OpenAI's ChatGPT). Correlations were also calculated between these scores and the cognitive and non-cognitive measures used in the admissions process. Human-derived scores from 778 applicants in 2021 and 552 in 2022 had item-specific inter-rater reliabilities ranging from 0.07 to 0.41, while machine-derived inter-replicate reliabilities ranged from 0.41 to 0.61. Pairwise correlations between human- and machine-derived essay scores and other admissions criteria revealed a moderate correlation between the two scoring methods (0.41) and fair correlations between the essays and the multiple mini-interview (0.20 and 0.22 for human and machine scores, respectively). Although correlations with cognitive measures were very low for both methods, machine-graded scores (0.10 to 0.15) correlated slightly more strongly than human-graded scores (0.01 to 0.02). Importantly, machine scores demonstrated higher precision, approximately two to three times greater than that of human scores in both years. This study emphasizes the importance of careful item design, rubric development, and prompt formulation when using machine-based essay grading. It also underscores the need to employ replicates and robust statistical analyses to ensure equitable applicant ranking when integrating machine grading into the admissions process.
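To make the abstract's methodological points concrete — rubric-driven prompts, replicate machine grading, and correlation-based agreement checks — the following Python sketch shows one way such a pipeline could look. It is a minimal illustration under stated assumptions: the rubric wording, the 1-to-7 scale, the model name, and the sample score arrays are hypothetical and are not the authors' actual materials or data.

import re
import numpy as np
from scipy.stats import pearsonr
from openai import OpenAI  # official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical rubric; the study stresses that rubric and prompt wording matter.
RUBRIC = ("Score the applicant essay from 1 (poor) to 7 (excellent) for "
          "communication, motivation, and insight. Reply with one integer.")

def grade_essay(essay: str, n_replicates: int = 3) -> float:
    """Grade one essay several times and average the replicates.
    Replication is what drives the higher precision reported for machine scores."""
    scores = []
    for _ in range(n_replicates):
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumed model choice, not the study's
            messages=[{"role": "system", "content": RUBRIC},
                      {"role": "user", "content": essay}],
        )
        found = re.search(r"\d+", resp.choices[0].message.content or "")
        if found:
            scores.append(int(found.group()))
    return float(np.mean(scores))

# Agreement between human and machine scores, analogous to the paper's
# pairwise correlations (toy arrays, not study data).
human = np.array([4.0, 5.5, 3.0, 6.0, 4.5])
machine = np.array([4.3, 5.0, 3.6, 5.8, 4.9])
r, p = pearsonr(human, machine)
print(f"human-machine correlation: r = {r:.2f} (p = {p:.3f})")

In a full analysis, the replicate scores would be kept as a matrix so that inter-replicate reliability (for example, an intraclass correlation across replicate columns) could be estimated directly, mirroring the 0.41 to 0.61 range reported in the abstract.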
Source journal metrics:
CiteScore: 2.20 · Self-citation rate: 30.00% · Articles per year: 113 · Review time: >36 weeks
About the journal: The Journal of Veterinary Medical Education (JVME) is the peer-reviewed scholarly journal of the Association of American Veterinary Medical Colleges (AAVMC). As an internationally distributed journal, JVME provides a forum for the exchange of ideas, research, and discoveries about veterinary medical education. This exchange benefits veterinary faculty, students, and the veterinary profession as a whole by preparing veterinarians to better perform their professional activities and to meet the needs of society. The journal's areas of focus include best practices and educational methods in veterinary education; recruitment, training, and mentoring of students at all levels of education, including undergraduate, graduate, veterinary technology, and continuing education; clinical instruction and assessment; institutional policy; and other challenges and issues faced by veterinary educators domestically and internationally. Veterinary faculty of all countries are encouraged to participate as contributors, reviewers, and institutional representatives.
Latest articles from this journal:
Qualitative Analysis of Intern Applications and its Relationship to Performance
Case-Based Learning: An Analysis of Student Groupwork and Instructional Design that Promotes Collaborative Discussion
The Effect of Repeated Review of Course Content on Medium- and Long-Term Retention in an Elective Veterinary Cardiology Course
Companion Animal Cadaver Donation for Teaching Purposes at Veterinary Medicine Colleges: A Discrete Choice Experiment
Changing Perceptions of Veterinary Undergraduates to Module Re-Structuring as They Progress Through the Curriculum