Assessing the Responses of Large Language Models (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Retinopathy of Prematurity: A Study on Readability and Appropriateness.

IF: 1.0 | Medicine, Quartile 4 | JCR Q4 (Ophthalmology) | Journal of Pediatric Ophthalmology & Strabismus | Pub Date: 2024-10-28 | DOI: 10.3928/01913913-20240911-05
Serhat Ermis, Ece Özal, Murat Karapapak, Ebrar Kumantaş, Sadık Altan Özal
{"title":"Assessing the Responses of Large Language Models (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Retinopathy of Prematurity: A Study on Readability and Appropriateness.","authors":"Serhat Ermis, Ece Özal, Murat Karapapak, Ebrar Kumantaş, Sadık Altan Özal","doi":"10.3928/01913913-20240911-05","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>To assess the appropriateness and readability of responses provided by four large language models (LLMs) (ChatGPT-4, Claude 3, Gemini, and Microsoft Co-pilot) to parents' queries pertaining to retinopathy of prematurity (ROP).</p><p><strong>Methods: </strong>A total of 60 frequently asked questions were collated and categorized into six distinct sections. The responses generated by the LLMs were evaluated by three experienced ROP specialists to determine their appropriateness and comprehensiveness. Additionally, the readability of the responses was assessed using a range of metrics, including the Flesch-Kincaid Grade Level (FKGL), Gunning Fog (GF) Index, Coleman-Liau (CL) Index, Simple Measure of Gobbledygook (SMOG) Index, and Flesch Reading Ease (FRE) score.</p><p><strong>Results: </strong>ChatGPT-4 demonstrated the highest level of appropriateness (100%) and performed exceptionally well in the Likert analysis, scoring 5 points on 96% of questions. The CL Index and FRE scores identified Gemini as the most readable LLM, whereas the GF Index and SMOG Index rated Microsoft Copilot as the most readable. Nevertheless, ChatGPT-4 exhibited the most intricate text structure, with scores of 18.56 on the GF Index, 18.56 on the CL Index, 17.2 on the SMOG Index, and 9.45 on the FRE score. This suggests that the responses demand a college-level comprehension.</p><p><strong>Conclusions: </strong>ChatGPT-4 demonstrated higher performance than other LLMs in responding to questions related to ROP; however, its texts were more complex. In terms of readability, Gemini and Microsoft Copilot were found to be more successful. <b>[<i>J Pediatr Ophthalmol Strabismus</i>. 20XX;XX(X):XXX-XXX.]</b>.</p>","PeriodicalId":50095,"journal":{"name":"Journal of Pediatric Ophthalmology & Strabismus","volume":" ","pages":"1-12"},"PeriodicalIF":1.0000,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Pediatric Ophthalmology & Strabismus","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3928/01913913-20240911-05","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Purpose: To assess the appropriateness and readability of responses provided by four large language models (LLMs) (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to parents' queries pertaining to retinopathy of prematurity (ROP).

Methods: A total of 60 frequently asked questions were collated and categorized into six distinct sections. The responses generated by the LLMs were evaluated by three experienced ROP specialists to determine their appropriateness and comprehensiveness. Additionally, the readability of the responses was assessed using a range of metrics, including the Flesch-Kincaid Grade Level (FKGL), Gunning Fog (GF) Index, Coleman-Liau (CL) Index, Simple Measure of Gobbledygook (SMOG) Index, and Flesch Reading Ease (FRE) score.
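
For context, the readability indices named above are computed from surface features of the text (sentence length, syllable counts, letter counts). The sketch below, assuming the third-party Python package textstat (not part of the study's published methodology), shows how the five metrics could be obtained for a single model response.

```python
# Minimal sketch: computing the five readability metrics used in the study
# for one LLM response, assuming the third-party `textstat` package
# (pip install textstat). The sample text is illustrative only.
import textstat

response = (
    "Retinopathy of prematurity is an eye condition that can affect babies "
    "born early. Your ophthalmologist will examine the retina during a short "
    "screening exam. Most babies do not need treatment, but regular follow-up "
    "visits are important."
)

metrics = {
    "Flesch-Kincaid Grade Level (FKGL)": textstat.flesch_kincaid_grade(response),
    "Gunning Fog (GF) Index": textstat.gunning_fog(response),
    "Coleman-Liau (CL) Index": textstat.coleman_liau_index(response),
    "SMOG Index": textstat.smog_index(response),
    "Flesch Reading Ease (FRE)": textstat.flesch_reading_ease(response),
}

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that FKGL, GF, CL, and SMOG estimate the school grade level needed to understand the text (higher means harder), whereas FRE runs in the opposite direction (higher means easier).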

Results: ChatGPT-4 demonstrated the highest level of appropriateness (100%) and performed exceptionally well in the Likert analysis, scoring 5 points on 96% of questions. The CL Index and FRE scores identified Gemini as the most readable LLM, whereas the GF Index and SMOG Index rated Microsoft Copilot as the most readable. Nevertheless, ChatGPT-4 exhibited the most intricate text structure, with a GF Index of 18.56, a CL Index of 18.56, a SMOG Index of 17.2, and an FRE score of 9.45, suggesting that its responses demand college-level comprehension.
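
As a point of reference, the standard Flesch Reading Ease interpretation bands (general convention, not data from this study) place a score of 9.45 in the 0 to 30 range, i.e. very difficult text typically requiring a college-graduate reading level. A hypothetical helper illustrating the mapping:

```python
# Standard Flesch Reading Ease interpretation bands (hypothetical helper,
# not part of the study); higher scores mean easier text.
def fre_band(score: float) -> str:
    if score >= 90: return "Very easy (about 5th grade)"
    if score >= 70: return "Easy to fairly easy (6th-7th grade)"
    if score >= 60: return "Plain English (8th-9th grade)"
    if score >= 50: return "Fairly difficult (10th-12th grade)"
    if score >= 30: return "Difficult (college)"
    return "Very difficult (college graduate)"

print(fre_band(9.45))  # -> Very difficult (college graduate)
```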

Conclusions: ChatGPT-4 demonstrated higher performance than the other LLMs in responding to questions related to ROP; however, its texts were more complex. In terms of readability, Gemini and Microsoft Copilot were more successful. [J Pediatr Ophthalmol Strabismus. 20XX;XX(X):XXX-XXX.].

Source journal: Journal of Pediatric Ophthalmology & Strabismus
CiteScore: 1.80
Self-citation rate: 8.30%
Articles per year: 115
Review time: >12 weeks
Journal description: The Journal of Pediatric Ophthalmology & Strabismus is a bimonthly peer-reviewed publication for pediatric ophthalmologists. For over 50 years, the Journal has published original articles on the diagnosis, treatment, and prevention of eye disorders in the pediatric age group and on the treatment of strabismus in all age groups.