Performance of Chatgpt in ophthalmology exam; human versus AI.

International Ophthalmology · Impact Factor 1.4 · JCR Q3 (Ophthalmology) · CAS Tier 4 (Medicine) · Publication date: 2024-11-06 · DOI: 10.1007/s10792-024-03353-w
Ali Safa Balci, Zeliha Yazar, Banu Turgut Ozturk, Cigdem Altan
{"title":"Performance of Chatgpt in ophthalmology exam; human versus AI.","authors":"Ali Safa Balci, Zeliha Yazar, Banu Turgut Ozturk, Cigdem Altan","doi":"10.1007/s10792-024-03353-w","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>This cross-sectional study focuses on evaluating the success rate of ChatGPT in answering questions from the 'Resident Training Development Exam' and comparing these results with the performance of the ophthalmology residents.</p><p><strong>Methods: </strong>The 75 exam questions, across nine sections and three difficulty levels, were presented to ChatGPT. The responses and explanations were recorded. The readability and complexity of the explanations were analyzed and The Flesch Reading Ease (FRE) score (0-100) was recorded using the program named Readable. Residents were categorized into four groups based on their seniority. The overall and seniority-specific success rates of the residents were compared separately with ChatGPT.</p><p><strong>Results: </strong>Out of 69 questions, ChatGPT answered 37 correctly (53.62%). The highest success was in Lens and Cataract (77.77%), and the lowest in Pediatric Ophthalmology and Strabismus (0.00%). Of 789 residents, overall accuracy was 50.37%. Seniority-specific accuracy rates were 43.49%, 51.30%, 54.91%, and 60.05% for 1st to 4th-year residents. ChatGPT ranked 292nd among residents. Difficulty-wise, 11 questions were easy, 44 moderate, and 14 difficult. ChatGPT's accuracy for each level was 63.63%, 54.54%, and 42.85%, respectively. The average FRE score of responses generated by ChatGPT was found to be 27.56 ± 12.40.</p><p><strong>Conclusion: </strong>ChatGPT correctly answered 53.6% of questions in an exam for residents. ChatGPT has a lower success rate on average than a 3rd year resident. The readability of responses provided by ChatGPT is low, and they are difficult to understand. As difficulty increases, ChatGPT's success decreases. Predictably, these results will change with more information loaded into ChatGPT.</p>","PeriodicalId":14473,"journal":{"name":"International Ophthalmology","volume":null,"pages":null},"PeriodicalIF":1.4000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Ophthalmology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s10792-024-03353-w","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Purpose: This cross-sectional study evaluates the success rate of ChatGPT in answering questions from the 'Resident Training Development Exam' and compares the results with the performance of ophthalmology residents.

Methods: The 75 exam questions, spanning nine sections and three difficulty levels, were presented to ChatGPT, and its answers and explanations were recorded. The readability and complexity of the explanations were analyzed, and the Flesch Reading Ease (FRE) score (0-100) was recorded using the Readable tool. Residents were categorized into four groups by seniority, and their overall and seniority-specific success rates were each compared with ChatGPT's.
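For context, the FRE metric is a fixed formula based on average sentence length and average syllables per word; higher scores indicate easier text. The sketch below is not part of the study (which used the Readable tool) and uses a crude vowel-group syllable heuristic purely to illustrate the standard formula:

```python
# Minimal sketch of the standard Flesch Reading Ease (FRE) formula.
# The study used the Readable tool; this heuristic syllable counter is
# only illustrative and will not reproduce its exact scores.
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels, minimum of one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

print(round(flesch_reading_ease("The lens focuses light on the retina."), 1))
```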

Results: Of the 69 questions evaluated, ChatGPT answered 37 correctly (53.62%). The highest success rate was in Lens and Cataract (77.77%) and the lowest in Pediatric Ophthalmology and Strabismus (0.00%). Among the 789 residents, overall accuracy was 50.37%, with seniority-specific accuracy of 43.49%, 51.30%, 54.91%, and 60.05% for 1st- through 4th-year residents. ChatGPT ranked 292nd among the residents. By difficulty, 11 questions were easy, 44 moderate, and 14 difficult; ChatGPT's accuracy at each level was 63.63%, 54.54%, and 42.85%, respectively. The average FRE score of ChatGPT's responses was 27.56 ± 12.40.
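As a quick arithmetic check, the reported per-difficulty accuracies are consistent with 7/11 easy, 24/44 moderate, and 6/14 difficult questions answered correctly, which sum to the 37/69 overall result. These per-difficulty correct counts are inferred from the percentages, not stated in the abstract:

```python
# Back-of-envelope consistency check on the reported accuracies.
# Correct counts per difficulty level are inferred, not stated in the abstract.
correct = {"easy": (7, 11), "moderate": (24, 44), "difficult": (6, 14)}
for level, (right, total) in correct.items():
    print(f"{level}: {right}/{total} = {100 * right / total:.2f}%")

total_right = sum(r for r, _ in correct.values())  # 37
total_q = sum(t for _, t in correct.values())      # 69
print(f"overall: {total_right}/{total_q} = {100 * total_right / total_q:.2f}%")
```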

Conclusion: ChatGPT correctly answered 53.6% of the questions in an exam designed for residents, a success rate below the average of 3rd-year residents. The readability of ChatGPT's responses is low, making them difficult to understand, and its accuracy decreases as question difficulty increases. These results can be expected to change as more information is incorporated into ChatGPT.

Source journal: International Ophthalmology
CiteScore: 3.20
Self-citation rate: 0.00%
Articles published: 451
Journal description: International Ophthalmology provides the clinician with articles on all the relevant subspecialties of ophthalmology, with a broad international scope. The emphasis is on presentation of the latest clinical research in the field. In addition, the journal includes regular sections devoted to new developments in technologies, products, and techniques.
Latest articles in this journal:
Retinal vasculature changes in patients with internal carotid artery stenosis.
Performance of Chatgpt in ophthalmology exam; human versus AI.
Unveiling macular displacement: endotamponade variations in retinal detachment repair outcomes.
A small disc size, a big challenge: effect of optic disc size on the correlation between peripapillary choroidal thickness, peripapillary retinal nerve fiber layer, and ganglion cell layer.
Clinical profile and etiological spectrum of patients presenting with corneal hydrops over a 12-year period.