Below average ChatGPT performance in medical microbiology exam compared to university students

Frontiers in Education | IF 1.9 | Q2 (Education & Educational Research) | Pub Date: 2023-12-21 | DOI: 10.3389/feduc.2023.1333415
Malik Sallam, Khaled Al-Salahat
{"title":"Below average ChatGPT performance in medical microbiology exam compared to university students","authors":"Malik Sallam, Khaled Al-Salahat","doi":"10.3389/feduc.2023.1333415","DOIUrl":null,"url":null,"abstract":"The transformative potential of artificial intelligence (AI) in higher education is evident, with conversational models like ChatGPT poised to reshape teaching and assessment methods. The rapid evolution of AI models requires a continuous evaluation. AI-based models can offer personalized learning experiences but raises accuracy concerns. MCQs are widely used for competency assessment. The aim of this study was to evaluate ChatGPT performance in medical microbiology MCQs compared to the students’ performance.The study employed an 80-MCQ dataset from a 2021 medical microbiology exam at the University of Jordan Doctor of Dental Surgery (DDS) Medical Microbiology 2 course. The exam contained 40 midterm and 40 final MCQs, authored by a single instructor without copyright issues. The MCQs were categorized based on the revised Bloom’s Taxonomy into four categories: Remember, Understand, Analyze, or Evaluate. Metrics, including facility index and discriminative efficiency, were derived from 153 midterm and 154 final exam DDS student performances. ChatGPT 3.5 was used to answer questions, and responses were assessed for correctness and clarity by two independent raters.ChatGPT 3.5 correctly answered 64 out of 80 medical microbiology MCQs (80%) but scored below the student average (80.5/100 vs. 86.21/100). Incorrect ChatGPT responses were more common in MCQs with longer choices (p = 0.025). ChatGPT 3.5 performance varied across cognitive domains: Remember (88.5% correct), Understand (82.4% correct), Analyze (75% correct), Evaluate (72% correct), with no statistically significant differences (p = 0.492). Correct ChatGPT responses received statistically significant higher average clarity and correctness scores compared to incorrect responses.The study findings emphasized the need for ongoing refinement and evaluation of ChatGPT performance. ChatGPT 3.5 showed the potential to correctly and clearly answer medical microbiology MCQs; nevertheless, its performance was below-bar compared to the students. Variability in ChatGPT performance in different cognitive domains should be considered in future studies. The study insights could contribute to the ongoing evaluation of the AI-based models’ role in educational assessment and to augment the traditional methods in higher education.","PeriodicalId":52290,"journal":{"name":"Frontiers in Education","volume":"51 8","pages":""},"PeriodicalIF":1.9000,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Education","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/feduc.2023.1333415","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 1

Abstract

The transformative potential of artificial intelligence (AI) in higher education is evident, with conversational models such as ChatGPT poised to reshape teaching and assessment methods. The rapid evolution of AI models requires continuous evaluation. AI-based models can offer personalized learning experiences but raise accuracy concerns. Multiple-choice questions (MCQs) are widely used for competency assessment. The aim of this study was to evaluate ChatGPT performance on medical microbiology MCQs compared to the performance of university students.

The study employed an 80-MCQ dataset from a 2021 medical microbiology exam in the University of Jordan Doctor of Dental Surgery (DDS) Medical Microbiology 2 course. The exam comprised 40 midterm and 40 final MCQs, authored by a single instructor and free of copyright restrictions. The MCQs were categorized based on the revised Bloom's taxonomy into four categories: Remember, Understand, Analyze, or Evaluate. Item metrics, including the facility index and discriminative efficiency, were derived from the performances of 153 DDS students on the midterm exam and 154 on the final exam. ChatGPT 3.5 was used to answer the questions, and its responses were assessed for correctness and clarity by two independent raters.

ChatGPT 3.5 correctly answered 64 of the 80 medical microbiology MCQs (80%) but scored below the student average (80.5/100 vs. 86.21/100). Incorrect ChatGPT responses were more common on MCQs with longer answer choices (p = 0.025). ChatGPT 3.5 performance varied across cognitive domains: Remember (88.5% correct), Understand (82.4% correct), Analyze (75% correct), and Evaluate (72% correct), with no statistically significant differences (p = 0.492). Correct ChatGPT responses received significantly higher average clarity and correctness scores than incorrect responses.

The findings emphasize the need for ongoing refinement and evaluation of ChatGPT performance. ChatGPT 3.5 showed the potential to answer medical microbiology MCQs correctly and clearly; nevertheless, its performance fell below that of the students. Variability in ChatGPT performance across cognitive domains should be considered in future studies. These insights could contribute to the ongoing evaluation of the role of AI-based models in educational assessment and help augment traditional methods in higher education.
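The abstract cites two classical item statistics, the facility index and the discriminative efficiency, without defining them. Below is a minimal Python sketch of the closely related classical-test-theory quantities; the function names, the 27% upper-lower split, and the synthetic data are illustrative assumptions, and the paper itself reports Moodle-style exam statistics whose exact formulas may differ slightly.

```python
import numpy as np

def facility_index(item_scores, max_score=1.0):
    # Facility index: mean mark on the item as a fraction of the maximum
    # mark. Higher values indicate an easier item.
    return float(np.mean(item_scores)) / max_score

def discrimination_index(item_scores, total_scores, group_frac=0.27):
    # Classical upper-lower discrimination index: difference in the item's
    # mean score between the top and bottom 27% of examinees, ranked by
    # total exam score. Values near zero suggest a poorly discriminating item.
    item_scores = np.asarray(item_scores, dtype=float)
    order = np.argsort(total_scores)
    k = max(1, int(len(order) * group_frac))
    lower, upper = order[:k], order[-k:]
    return item_scores[upper].mean() - item_scores[lower].mean()

# Synthetic illustration: one MCQ answered by 153 students (1 = correct),
# with total scores drawn around the reported class average of 86.21.
rng = np.random.default_rng(42)
totals = rng.normal(86.21, 8.0, size=153)
item = (totals + rng.normal(0, 10.0, size=153) > 85).astype(int)

print(f"facility index: {facility_index(item):.2f}")
print(f"discrimination index: {discrimination_index(item, totals):.2f}")
```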
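The abstract states that ChatGPT 3.5 answered the questions, presumably through the chat interface. For readers wanting to reproduce a comparable setup programmatically, here is a hedged sketch against the OpenAI chat completions API with gpt-3.5-turbo; the prompt wording and the sample item are assumptions, not the study's protocol.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_mcq(stem: str, choices: list[str]) -> str:
    # Pose one MCQ and return the model's raw answer text for later rating.
    letters = "ABCD"
    options = "\n".join(f"{letters[i]}. {c}" for i, c in enumerate(choices))
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,  # more deterministic answers for grading
        messages=[{
            "role": "user",
            "content": (
                "Answer this medical microbiology MCQ with the single best "
                f"option letter and a one-line justification.\n\n{stem}\n{options}"
            ),
        }],
    )
    return response.choices[0].message.content

# Hypothetical item for illustration only; not from the study's exam.
print(ask_mcq(
    "Which organism is most closely associated with dental caries?",
    ["Streptococcus mutans", "Staphylococcus aureus",
     "Escherichia coli", "Candida albicans"],
))
```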
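The abstract reports per-domain accuracies and a single p = 0.492 for the cross-domain comparison without naming the test. One plausible reconstruction is a chi-square test on a 4x2 correct/incorrect table. The counts below are inferred from the reported percentages and the 80-item total (23/26 = 88.5%, 14/17 = 82.4%, 9/12 = 75%, 18/25 = 72%, summing to 64/80 correct); they are not stated in the abstract, and the authors may have used a different test.

```python
from scipy.stats import chi2_contingency

# Correct/incorrect counts per cognitive domain, inferred from the
# reported percentages and the 80-item total.
table = [
    [23, 3],   # Remember
    [14, 3],   # Understand
    [9, 3],    # Analyze
    [18, 7],   # Evaluate
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
# Expect a non-significant p, consistent with the reported p = 0.492,
# though the exact value depends on which test the authors applied.
```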
Source Journal
Frontiers in Education (Social Sciences-Education)
CiteScore: 2.90
Self-citation rate: 8.70%
Articles published: 887
Review time: 14 weeks