Evaluating the strengths and limitations of multimodal ChatGPT-4 in detecting glaucoma using fundus images

S. A. Alryalat, Ayman Musleh, M. Kahook
Frontiers in Ophthalmology, published 2024-06-07. DOI: 10.3389/fopht.2024.1387190 (https://doi.org/10.3389/fopht.2024.1387190)

Abstract

This study evaluates the diagnostic accuracy of a multimodal large language model (LLM), ChatGPT-4, in recognizing glaucoma from color fundus photographs (CFPs), using a benchmark dataset and without prior training or fine-tuning.

The publicly accessible Retinal Fundus Glaucoma Challenge (REFUGE) dataset was used for the analyses. The input data consisted of the entire 400-image testing set. The task was to classify each fundus image as either ‘Likely Glaucomatous’ or ‘Likely Non-Glaucomatous’. We constructed a confusion matrix to visualize ChatGPT-4's predictions, focusing on the accuracy of the binary classification (glaucoma vs non-glaucoma).

ChatGPT-4 demonstrated an accuracy of 90% (95% confidence interval (CI): 87.06%-92.94%). The sensitivity was 50% (95% CI: 34.51%-65.49%), while the specificity was 94.44% (95% CI: 92.08%-96.81%). The precision was 50% (95% CI: 34.51%-65.49%), and the F1 score was 0.50.

ChatGPT-4 achieved relatively high diagnostic accuracy without prior fine-tuning on CFPs. Considering the scarcity of data in specialized medical fields, including ophthalmology, advanced AI techniques such as LLMs might require less training data than other forms of AI, with potential savings in time and financial resources. This may also pave the way for the development of innovative tools to support specialized medical care, particularly those dependent on multimodal data for diagnosis and follow-up, irrespective of resource constraints.
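To make the relationship between the reported metrics concrete, the following minimal sketch computes them from a 2x2 confusion matrix. The abstract does not give the cell counts; the values below (20 true positives, 20 false negatives, 20 false positives, 340 true negatives) are an assumption, inferred only because they sum to 400 and reproduce the reported point estimates. The normal-approximation (Wald) interval is likewise an assumption about how the confidence intervals were obtained, and the names `wald_ci` and `metrics` are illustrative.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal distribution.
Z_95 = NormalDist().inv_cdf(0.975)


def wald_ci(p, n, z=Z_95):
    """Normal-approximation (Wald) interval for a proportion p observed over n cases."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# ASSUMED counts: inferred from the reported rates, not stated in the abstract.
tp, fn, fp, tn = 20, 20, 20, 340
total = tp + fn + fp + tn

# Each metric is stored with the denominator it is computed over.
metrics = {
    "accuracy": ((tp + tn) / total, total),
    "sensitivity": (tp / (tp + fn), tp + fn),
    "specificity": (tn / (tn + fp), tn + fp),
    "precision": (tp / (tp + fp), tp + fp),
}

precision = metrics["precision"][0]
sensitivity = metrics["sensitivity"][0]
# F1 is the harmonic mean of precision and sensitivity (recall).
f1 = 2 * precision * sensitivity / (precision + sensitivity)

for name, (value, n) in metrics.items():
    low, high = wald_ci(value, n)
    print(f"{name}: {value:.2%} (95% CI {low:.2%}-{high:.2%})")
print(f"F1 score: {f1:.2f}")
```

Under these assumed counts only 40 of the 400 images are glaucomatous, so the 90% accuracy is driven largely by the non-glaucoma class. Labelling every image as non-glaucomatous would also score 90% accuracy with 0% sensitivity, which is why sensitivity and precision are reported alongside accuracy.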