Introducing AI as members of script concordance test expert reference panel: A comparative analysis
Authors: Moataz A Sallam, Enjy Abouzeid
Journal: Medical Teacher, pages 1-8 (Journal Article; JCR Q1, Education, Scientific Disciplines; impact factor 3.3)
Published: 2025-03-08
DOI: 10.1080/0142159X.2025.2473620 (https://doi.org/10.1080/0142159X.2025.2473620)
Citation count: 0
Abstract
Background: The Script Concordance Test (SCT) is increasingly used in professional development to assess clinical reasoning, with linear progression in SCT performance observed as clinical experience increases. One challenge in implementing SCT is the potential burnout of expert reference panel (ERP) members. To address this, we introduced ChatGPT as panel members. The aim was to enhance the efficiency of SCT creation while maintaining educational content quality and to explore the effectiveness of different models as reference panels.
Methodology: A quasi-experimental comparative design was employed, involving all undergraduate medical students and faculty members enrolled in the Ophthalmology clerkship. Two panels were compared. The traditional ERP consisted of 15 experts with diverse clinical experience: 5 senior residents, 5 lecturers, and 5 professors. The AI-generated ERP was a panel generated using ChatGPT and o1 preview, designed to mirror diverse clinical opinions based on varying experience levels.
Results: Experts consistently achieved the highest mean scores across most vignettes, with ChatGPT-4 and o1 scores generally slightly lower. Notably, the o1 mean scores were closer to those of the experts than the ChatGPT-4 scores were. Significant differences were observed between ChatGPT-4 and o1 scores in certain vignettes. The reliability values reported for the panels indicate a strong level of consistency, suggesting that both experts and AI models provided highly reliable ratings.
Conclusion: These findings suggest that while AI models cannot replace human experts, they can be effectively used to train students, enhance reasoning skills, and help narrow the gap between student and expert performance.
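For readers unfamiliar with how a reference panel feeds into SCT scoring, the following is a minimal sketch of the widely used aggregate scoring approach, in which each answer option earns credit in proportion to the number of panelists who chose it, relative to the modal answer. The abstract does not state which scoring method the authors used, so this is an illustration under that assumption only. The function names (`scoring_key`, `score_item`), the -2 to +2 response scale, and the panel responses are hypothetical and are not data from the study.

```python
from collections import Counter


def scoring_key(panel_answers):
    """Build an aggregate scoring key for one SCT item.

    Each option's credit is the number of panelists who chose it
    divided by the number who chose the modal (most popular) option.
    """
    counts = Counter(panel_answers)
    modal_votes = max(counts.values())
    return {option: votes / modal_votes for option, votes in counts.items()}


def score_item(response, key):
    """Credit for one examinee response; options no panelist chose earn 0."""
    return key.get(response, 0.0)


# Hypothetical responses from a 15-member panel on a -2..+2 scale
# (illustrative values only, not taken from the paper).
panel = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, -1]
key = scoring_key(panel)

print(key)                  # credit per option
print(score_item(0, key))   # an examinee who answered 0
print(score_item(-2, key))  # an option no panelist selected
```

Under this scheme, an AI-generated panel would simply supply the `panel` list in place of (or alongside) human experts, which is why agreement between AI and expert responses matters for the resulting key.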
About the journal:
Medical Teacher provides accounts of new teaching methods, guidance on structuring courses and assessing achievement, and serves as a forum for communication between medical teachers and those involved in general education. In particular, the journal recognizes the problems teachers have in keeping up-to-date with the developments in educational methods that lead to more effective teaching and learning at a time when the content of the curriculum—from medical procedures to policy changes in health care provision—is also changing. The journal features reports of innovation and research in medical education, case studies, survey articles, practical guidelines, reviews of current literature and book reviews. All articles are peer reviewed.