Chat GPT 4o vs residents: French language evaluation in ophthalmology

AJO International. Pub date: 2025-04-28; Epub date: 2025-01-31. DOI: 10.1016/j.ajoint.2025.100104
Leah Attal, Elad Shvartz, Nakhoul Nakhoul, Daniel Bahir
AJO International, Volume 2, Issue 1, Article 100104. https://www.sciencedirect.com/science/article/pii/S2950253525000073

Abstract

Purpose

Chatbots capable of answering multiple-choice questions (MCQs) at a level comparable to that of residents could serve as affordable educational tools, available 24/7, that provide comprehensive explanations. Because chatbots are non-judgmental, residents may feel free to ask questions without hesitation. This study therefore aims to evaluate the accuracy of ChatGPT 4o on MCQs from the national ophthalmology residency examination in French, compared with residents and other leading AI chatbots.

Methods

A set of 600 questions from the national ophthalmology examination was translated into French and submitted to ChatGPT 4o, ChatGPT 4, and Gemini Advanced. The generated responses were compared with the official correction grids (answer keys) to evaluate their accuracy. In addition, accuracy over time, across specialties, and on text-based versus image-based questions was analysed and compared with the residents' results.

Results

ChatGPT 4o achieved an accuracy rate of 67.5%, outperforming ChatGPT 4 and Gemini Advanced. However, Gemini Advanced showed greater sensitivity to the ethical considerations involved in generating medical advice. ChatGPT 4o's accuracy remained consistent over time, with particular strength in the fundamentals of ophthalmology, ocular pathologies, and refractive surgery. Its performance on image-based questions was significantly better than that of the other chatbots, though still inferior to its performance on text-based questions.

Conclusion

ChatGPT 4o demonstrates sufficient accuracy to pass the national ophthalmology examination, although its performance falls short of that of residents. These findings suggest that ChatGPT 4o is a promising educational tool for ophthalmology residency, even in a language other than English. However, further improvements are needed to enhance its performance.