Appropriateness of Ophthalmology Recommendations From an Online Chat-Based Artificial Intelligence Model

Prashant D. Tailor, MD; Timothy T. Xu, MD; Blake H. Fortes, MD; Raymond Iezzi, MD; Timothy W. Olsen, MD; Matthew R. Starr, MD; Sophie J. Bakri, MD; Brittni A. Scruggs, MD, PhD; Andrew J. Barkmeier, MD; Sanjay V. Patel, MD; Keith H. Baratz, MD; Ashlie A. Bernhisel, MD; Lilly H. Wagner, MD; Andrea A. Tooley, MD; Gavin W. Roddy, MD, PhD; Arthur J. Sit, MD; Kristi Y. Wu, MD; Erick D. Bothun, MD; Sasha A. Mansukhani, MBBS; Brian G. Mohney, MD; Lauren A. Dalvin, MD
{"title":"Appropriateness of Ophthalmology Recommendations From an Online Chat-Based Artificial Intelligence Model","authors":"Prashant D. Tailor MD ,&nbsp;Timothy T. Xu MD ,&nbsp;Blake H. Fortes MD ,&nbsp;Raymond Iezzi MD ,&nbsp;Timothy W. Olsen MD ,&nbsp;Matthew R. Starr MD ,&nbsp;Sophie J. Bakri MD ,&nbsp;Brittni A. Scruggs MD, PhD ,&nbsp;Andrew J. Barkmeier MD ,&nbsp;Sanjay V. Patel MD ,&nbsp;Keith H. Baratz MD ,&nbsp;Ashlie A. Bernhisel MD ,&nbsp;Lilly H. Wagner MD ,&nbsp;Andrea A. Tooley MD ,&nbsp;Gavin W. Roddy MD, PhD ,&nbsp;Arthur J. Sit MD ,&nbsp;Kristi Y. Wu MD ,&nbsp;Erick D. Bothun MD ,&nbsp;Sasha A. Mansukhani MBBS ,&nbsp;Brian G. Mohney MD ,&nbsp;Lauren A. Dalvin MD","doi":"10.1016/j.mcpdig.2024.01.003","DOIUrl":null,"url":null,"abstract":"<div><h3>Objective</h3><p>To determine the appropriateness of ophthalmology recommendations from an online chat-based artificial intelligence model to ophthalmology questions.</p></div><div><h3>Patients and Methods</h3><p>Cross-sectional qualitative study from April 1, 2023, to April 30, 2023. A total of 192 questions were generated spanning all ophthalmic subspecialties. Each question was posed to a large language model (LLM) 3 times. The responses were graded by appropriate subspecialists as appropriate, inappropriate, or unreliable in 2 grading contexts. The first grading context was if the information was presented on a patient information site. The second was an LLM-generated draft response to patient queries sent by the electronic medical record (EMR). Appropriate was defined as accurate and specific enough to serve as a surrogate for physician-approved information. Main outcome measure was percentage of appropriate responses per subspecialty.</p></div><div><h3>Results</h3><p>For patient information site-related questions, the LLM provided an overall average of 79% appropriate responses. Variable rates of average appropriateness were observed across ophthalmic subspecialties for patient information site information ranging from 56% to 100%: cataract or refractive (92%), cornea (56%), glaucoma (72%), neuro-ophthalmology (67%), oculoplastic or orbital surgery (80%), ocular oncology (100%), pediatrics (89%), vitreoretinal diseases (86%), and uveitis (65%). For draft responses to patient questions via EMR, the LLM provided an overall average of 74% appropriate responses and varied by subspecialty: cataract or refractive (85%), cornea (54%), glaucoma (77%), neuro-ophthalmology (63%), oculoplastic or orbital surgery (62%), ocular oncology (90%), pediatrics (94%), vitreoretinal diseases (88%), and uveitis (55%). Stratifying grades across health information categories (disease and condition, risk and prevention, surgery-related, and treatment and management) showed notable but insignificant variations, with disease and condition often rated highest (72% and 69%) for appropriateness and surgery-related (55% and 51%) lowest, in both contexts.</p></div><div><h3>Conclusion</h3><p>This LLM reported mostly appropriate responses across multiple ophthalmology subspecialties in the context of both patient information sites and EMR-related responses to patient questions. Current LLM offerings require optimization and improvement before widespread clinical use.</p></div>","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. 
Digital health","volume":"2 1","pages":"Pages 119-128"},"PeriodicalIF":0.0000,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S294976122400004X/pdfft?md5=5523855f19c376cfc730f0de31cbe918&pid=1-s2.0-S294976122400004X-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Mayo Clinic Proceedings. Digital health","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S294976122400004X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Objective

To determine the appropriateness of recommendations provided by an online chat-based artificial intelligence model in response to ophthalmology questions.

Patients and Methods

Cross-sectional qualitative study conducted from April 1, 2023, to April 30, 2023. A total of 192 questions spanning all ophthalmic subspecialties were generated. Each question was posed to a large language model (LLM) 3 times. The responses were graded by the relevant subspecialists as appropriate, inappropriate, or unreliable in 2 grading contexts. The first context treated the response as information presented on a patient information site. The second treated it as an LLM-generated draft reply to a patient query sent through the electronic medical record (EMR). Appropriate was defined as accurate and specific enough to serve as a surrogate for physician-approved information. The main outcome measure was the percentage of appropriate responses per subspecialty.
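For readers who want to replicate this kind of evaluation, a minimal sketch of the repeated-prompting workflow is given below. It is an assumption-laden illustration only: the study does not describe programmatic access to the model, the function and data-structure names are hypothetical, and the grading was performed by subspecialists rather than by code.

```python
# Illustrative sketch only. query_model is a hypothetical placeholder for the
# online chat-based model; grading is done offline by subspecialists.

from dataclasses import dataclass

CONTEXTS = ("patient_information_site", "emr_draft_response")
REPEATS = 3  # each question was posed to the model 3 times


@dataclass
class GradedResponse:
    question: str
    subspecialty: str
    context: str
    attempt: int
    response: str
    grade: str = ""  # later set to "appropriate", "inappropriate", or "unreliable"


def query_model(question: str) -> str:
    """Hypothetical placeholder for the chat-based model interface."""
    raise NotImplementedError("replace with the actual chat interface used")


def collect_responses(questions_by_subspecialty: dict) -> list:
    """Pose each question REPEATS times and record one gradable entry per context."""
    records = []
    for subspecialty, questions in questions_by_subspecialty.items():
        for question in questions:
            for attempt in range(1, REPEATS + 1):
                text = query_model(question)
                for context in CONTEXTS:
                    records.append(GradedResponse(question, subspecialty,
                                                  context, attempt, text))
    return records
```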

Results

For patient information site-related questions, the LLM provided an overall average of 79% appropriate responses. Average appropriateness for patient information site responses varied across ophthalmic subspecialties, ranging from 56% to 100%: cataract or refractive (92%), cornea (56%), glaucoma (72%), neuro-ophthalmology (67%), oculoplastic or orbital surgery (80%), ocular oncology (100%), pediatrics (89%), vitreoretinal diseases (86%), and uveitis (65%). For draft responses to patient questions via the EMR, the LLM provided an overall average of 74% appropriate responses, which also varied by subspecialty: cataract or refractive (85%), cornea (54%), glaucoma (77%), neuro-ophthalmology (63%), oculoplastic or orbital surgery (62%), ocular oncology (90%), pediatrics (94%), vitreoretinal diseases (88%), and uveitis (55%). Stratifying grades across health information categories (disease and condition, risk and prevention, surgery-related, and treatment and management) showed notable but not statistically significant variation, with disease and condition often rated highest for appropriateness (72% and 69%) and surgery-related lowest (55% and 51%) in both contexts.
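The main outcome measure reduces to a simple proportion: for each subspecialty and grading context, the share of responses graded appropriate. The sketch below shows that aggregation, assuming graded records shaped like GradedResponse above; the percentages reported here come from the authors' grading, not from this code.

```python
# Minimal sketch of the outcome calculation over hypothetical graded records.

from collections import defaultdict


def appropriateness_by_subspecialty(records, context):
    """Percentage of responses graded 'appropriate' per subspecialty in one context."""
    totals = defaultdict(int)
    appropriate = defaultdict(int)
    for r in records:
        if r.context != context:
            continue
        totals[r.subspecialty] += 1
        if r.grade == "appropriate":
            appropriate[r.subspecialty] += 1
    return {s: 100 * appropriate[s] / totals[s] for s in totals}
```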

Conclusion

This LLM produced mostly appropriate responses across multiple ophthalmology subspecialties in the context of both patient information sites and EMR draft responses to patient questions. Current LLM offerings require optimization and improvement before widespread clinical use.
