Evaluating Large Language Models on their Accuracy and Completeness: Immune Checkpoint Inhibitors and their Ocular Toxicities.

IF 2.3 | Zone 2 (Medicine) | Q2 Ophthalmology | Retina-The Journal of Retinal and Vitreous Diseases | Pub Date: 2024-09-18 | DOI: 10.1097/IAE.0000000000004271
Camellia Edalat, Nila Kirupaharan, Lauren A Dalvin, Kapil Mishra, Rayna Marshall, Hannah Xu, Jasmine H Francis, Meghan Berkenstock
Citation count: 0

Abstract

Purpose: To analyze the accuracy and thoroughness of three large language models (LLMs) in producing information for providers about immune checkpoint inhibitor (ICI) ocular toxicities.

Methods: Eight questions were created about the general definition of checkpoint inhibitors, their mechanism of action, ocular toxicities, and toxicity management. All were entered into ChatGPT 4.0, Bard, and LLaMA. Using a 6-point Likert scale for accuracy and completeness, four ophthalmologists who routinely treat ocular toxicities of immunotherapy agents rated the LLMs' answers. ANOVA with post-hoc pairwise t-tests was used to assess significant differences among the three LLMs. Fleiss kappa values were calculated to assess interrater variability.
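
The ANOVA step described in the Methods can be sketched as follows. This is a minimal illustration of a one-way ANOVA F statistic, not the authors' analysis: the abstract does not state which ANOVA variant or software was used, the function name `one_way_anova_f` is hypothetical, and the ratings below are invented placeholder values, not study data.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of numeric ratings per model.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more ratings than groups")

    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variation")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical 6-point Likert ratings for three models (placeholders only).
    model_a = [5, 4, 5, 4, 5, 5, 4, 5]
    model_b = [5, 5, 4, 4, 5, 4, 5, 5]
    model_c = [4, 4, 5, 4, 5, 4, 4, 5]
    f_stat, df_b, df_w = one_way_anova_f([model_a, model_b, model_c])
    print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

Turning F into a p-value requires the F distribution; `scipy.stats.f_oneway` returns both the statistic and the p-value if SciPy is available.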

Results: ChatGPT responses received mean ratings of 4.59 for accuracy and 4.09 for completeness; Bard responses, 4.59 and 4.19; LLaMA responses, 4.38 and 4.03. The three LLMs did not differ significantly in accuracy (p=0.47) or completeness (p=0.86). Fleiss kappa values were poor for both accuracy (-0.03) and completeness (0.01).
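
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of the standard Fleiss kappa formula. The function name `fleiss_kappa` and the ratings are hypothetical and are not study data; the abstract does not say how the authors computed the statistic or how the Likert categories were handled.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss kappa for a fixed number of raters per item.

    ratings: list of items, each a list with one category label per rater.
    categories: iterable of all possible category labels.
    """
    categories = list(categories)
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of rater pairs that agree, averaged over items.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall share of ratings in each category.
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical ratings: 8 questions x 4 raters on a 6-point scale.
    # Every rating is a 4 or a 5, yet raters split evenly on each question.
    hypothetical = [
        [5, 4, 5, 4], [4, 5, 4, 5], [5, 5, 4, 4], [4, 4, 5, 5],
        [5, 4, 4, 5], [4, 5, 5, 4], [5, 4, 5, 4], [4, 5, 4, 5],
    ]
    print(round(fleiss_kappa(hypothetical, range(1, 7)), 3))
```

In this invented data set the mean rating is 4.5 but kappa is negative, which illustrates that kappa measures whether raters choose the same category beyond chance, not how high the ratings are.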

Conclusions: All three LLMs provided highly accurate and complete responses to questions centered on ICI ocular toxicities and their management. Further studies are needed to assess specific ICI agents and the accuracy and completeness of updated versions of LLMs.

Source journal

CiteScore: 5.70
Self-citation rate: 9.10%
Articles published: 554
Review time: 3-6 weeks

Journal description: RETINA® focuses exclusively on the growing specialty of vitreoretinal disorders. The Journal provides current information on diagnostic and therapeutic techniques. Its highly specialized and informative, peer-reviewed articles are easily applicable to clinical practice. In addition to regular reports from clinical and basic science investigators, RETINA® publishes special features including periodic review articles on pertinent topics, special articles dealing with surgical and other therapeutic techniques, and abstract cards. Issues are abundantly illustrated in vivid full color. Published 12 times per year, RETINA® is truly a "must have" publication for anyone connected to this field.