A Comparative Analysis of the Rating of College Students’ Essays by ChatGPT versus Human Raters

Potchong M. Jackaria, Bonjovi Hassan Hajan, Al-Rashiff H. Mastul
{"title":"A Comparative Analysis of the Rating of College Students’ Essays by ChatGPT versus Human Raters","authors":"Potchong M. Jackaria, Bonjovi Hassan Hajan, Al-Rashiff H. Mastul","doi":"10.26803/ijlter.23.2.23","DOIUrl":null,"url":null,"abstract":"The use of generative artificial intelligence (AI) in education has engendered mixed reactions due to its ability to generate human-like responses to questions. For education to benefit from this modern technology, there is a need to determine how such capability can be used to improve teaching and learning. Hence, using a comparative−descriptive research design, this study aimed to perform a comparative analysis between Chat Generative Pre-Trained Transformer (ChatGPT) version 3.5 and human raters in scoring students’ essays. Twenty essays were used of college students in a professional education course at the Mindanao State University – Tawi-Tawi College of Technology and Oceanography, a public university in southern Philippines. The essays were rated independently by three human raters using a scoring rubric from Carrol and West (1989) as adapted by Tuyen et al. (2019). For the AI ratings, the essays were encoded and inputted into ChatGPT 3.5 using prompts and the rubric. The responses were then screenshotted and recorded along with the human ratings for statistical analysis. Using the intraclass correlation coefficient (ICC), results show that among the human raters, the consistency was good, indicating the reliability of the rubric, while a moderate consistency was found in the ChatGPT 3.5 ratings. Comparison of the human and ChatGPT 3.5 ratings show poor consistency, implying the that the ratings of human raters and ChatGPT 3.5 were not linearly related. The finding implies that teachers should be cautious when using ChatGPT in rating students’ written works, suggesting further that using ChatGPT 3.5, in its current version, still needs human assistance to ensure the accuracy of its generated information. 
Rating of other types of student works using ChatGPT 3.5 or other generative AI tools may be investigated in future research.","PeriodicalId":37101,"journal":{"name":"International Journal of Learning, Teaching and Educational Research","volume":"73 4","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Learning, Teaching and Educational Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.26803/ijlter.23.2.23","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Social Sciences","Score":null,"Total":0}
引用次数: 0

Abstract

The use of generative artificial intelligence (AI) in education has engendered mixed reactions due to its ability to generate human-like responses to questions. For education to benefit from this modern technology, there is a need to determine how such capability can be used to improve teaching and learning. Hence, using a comparative-descriptive research design, this study aimed to perform a comparative analysis between Chat Generative Pre-Trained Transformer (ChatGPT) version 3.5 and human raters in scoring students’ essays. Twenty essays by college students in a professional education course at the Mindanao State University – Tawi-Tawi College of Technology and Oceanography, a public university in the southern Philippines, were used. The essays were rated independently by three human raters using a scoring rubric from Carrol and West (1989) as adapted by Tuyen et al. (2019). For the AI ratings, the essays were encoded and inputted into ChatGPT 3.5 along with prompts and the rubric. The responses were then screenshotted and recorded alongside the human ratings for statistical analysis. Using the intraclass correlation coefficient (ICC), results show good consistency among the human raters, indicating the reliability of the rubric, while moderate consistency was found in the ChatGPT 3.5 ratings. Comparison of the human and ChatGPT 3.5 ratings shows poor consistency, implying that the ratings of human raters and ChatGPT 3.5 were not linearly related. The finding implies that teachers should be cautious when using ChatGPT to rate students’ written works, suggesting further that ChatGPT 3.5, in its current version, still needs human assistance to ensure the accuracy of its generated information. Rating of other types of student works using ChatGPT 3.5 or other generative AI tools may be investigated in future research.
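The study does not publish its analysis code, and the abstract does not state which ICC form was used. As a minimal sketch, assuming a two-way consistency ICC with a single-rater unit (ICC(3,1)) and a hypothetical 10-point rubric, the reliability statistic described above could be computed from a subjects-by-raters score matrix as follows; the `scores` data below are illustrative, not the study's:

```python
import numpy as np

def icc_consistency(ratings: np.ndarray) -> float:
    """ICC(3,1): two-way model, consistency definition, single rater.

    ratings: (n_subjects, k_raters) matrix of scores.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per-essay means
    col_means = ratings.mean(axis=0)   # per-rater means

    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()   # between-essay variance
    ss_cols = n * ((col_means - grand) ** 2).sum()   # systematic rater severity
    ss_error = ss_total - ss_rows - ss_cols          # residual (inconsistency)

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Hypothetical example: 5 essays, 3 raters, scores on a 10-point rubric.
scores = np.array([
    [7, 8, 7],
    [5, 5, 6],
    [9, 9, 8],
    [4, 5, 4],
    [8, 8, 9],
], dtype=float)
print(round(icc_consistency(scores), 3))
```

Because the consistency definition removes the rater-mean (column) effect, a rater who is uniformly harsher than the others does not lower the coefficient; only rank-order disagreement does, which matches the abstract's interpretation that poor human-versus-ChatGPT consistency means the two sets of ratings were not linearly related.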