Large language model doctor: assessing the ability of ChatGPT-4 to deliver interventional radiology procedural information to patients during the consent process.

CVIR Endovascular · Published 2024-11-29 · DOI: 10.1186/s42155-024-00477-z · Impact Factor 1.2, Q3 (Cardiac & Cardiovascular Systems)
Hayden L Hofmann, Jenanan Vairavamurthy
{"title":"Large language model doctor: assessing the ability of ChatGPT-4 to deliver interventional radiology procedural information to patients during the consent process.","authors":"Hayden L Hofmann, Jenanan Vairavamurthy","doi":"10.1186/s42155-024-00477-z","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>The study aims to evaluate how current interventional radiologists view ChatGPT in the context of informed consent for interventional radiology (IR) procedures.</p><p><strong>Methods: </strong>ChatGPT-4 was instructed to outline the risks, benefits, and alternatives for IR procedures. The outputs were reviewed by IR physicians to assess if outputs were 1) accurate, 2) comprehensive, 3) easy to understand, 4) written in a conversational tone, and 5) if they were comfortable providing the output to the patient. For each criterion, outputs were measured on a 5-point scale. Mean scores and percentage of physicians rating output as sufficient (4 or 5 on 5-point scale) were measured. A linear regression correlated mean rating with number of years in practice. Intraclass correlation coefficient (ICC) measured agreement among physicians.</p><p><strong>Results: </strong>The mean rating of the ChatGPT responses was 4.29, 3.85, 4.15, 4.24, 3.82 for accuracy, comprehensiveness, readability, conversational tone, and physician comfort level, respectively. Percentage of physicians rating outputs as sufficient was 84%, 71%, 85%, 85%, and 67% for accuracy, comprehensiveness, readability, conversational tone, and physician comfort level, respectively. There was an inverse relationship between years in training and output score (coeff = -0.03413, p = 0.0128); ICC measured 0.39 (p = 0.003).</p><p><strong>Conclusions: </strong>GPT-4 produced outputs that were accurate, understandable, and in a conversational tone. However, GPT-4 had a decreased capacity to produce a comprehensive output leading some physicians to be uncomfortable providing the output to patients. Practicing IRs should be aware of these limitations when counseling patients as ChatGPT-4 continues to develop into a clinically usable AI tool.</p>","PeriodicalId":52351,"journal":{"name":"CVIR Endovascular","volume":"7 1","pages":"83"},"PeriodicalIF":1.2000,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11607371/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"CVIR Endovascular","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s42155-024-00477-z","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"CARDIAC & CARDIOVASCULAR SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose: This study aims to evaluate how practicing interventional radiologists view ChatGPT in the context of informed consent for interventional radiology (IR) procedures.

Methods: ChatGPT-4 was instructed to outline the risks, benefits, and alternatives for IR procedures. IR physicians reviewed the outputs to assess whether they were (1) accurate, (2) comprehensive, (3) easy to understand, and (4) written in a conversational tone, and (5) whether the physicians were comfortable providing the output to the patient. Each criterion was rated on a 5-point scale. Mean scores and the percentage of physicians rating an output as sufficient (4 or 5 on the 5-point scale) were calculated. A linear regression correlated mean rating with number of years in practice, and the intraclass correlation coefficient (ICC) was used to measure agreement among physicians.
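As a rough illustration of the analysis described above, the following is a minimal Python sketch (not the authors' code) of how the summary statistics, regression, and ICC could be computed. The file name ratings.csv, the column names, and the use of pandas, SciPy, and pingouin are all assumptions made for this example.

```python
# Hypothetical long-format table: one row per physician rating of one output on one criterion.
# Assumed columns: physician, years_in_practice, output_id, criterion, score (1-5).
import pandas as pd
import pingouin as pg
from scipy import stats

df = pd.read_csv("ratings.csv")

# Mean score and percentage rated sufficient (4 or 5) for each criterion
summary = df.groupby("criterion")["score"].agg(
    mean_score="mean",
    pct_sufficient=lambda s: (s >= 4).mean() * 100,
)
print(summary)

# Linear regression: each physician's mean rating vs. years in practice
per_physician = (
    df.groupby(["physician", "years_in_practice"])["score"].mean().reset_index()
)
reg = stats.linregress(per_physician["years_in_practice"], per_physician["score"])
print(f"slope = {reg.slope:.5f}, p = {reg.pvalue:.4f}")

# Inter-rater agreement via intraclass correlation (requires each physician to rate each output)
icc = pg.intraclass_corr(
    data=df, targets="output_id", raters="physician", ratings="score"
)
print(icc[["Type", "ICC", "pval"]])
```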

Results: The mean ratings of the ChatGPT responses were 4.29, 3.85, 4.15, 4.24, and 3.82 for accuracy, comprehensiveness, readability, conversational tone, and physician comfort level, respectively. The percentages of physicians rating outputs as sufficient were 84%, 71%, 85%, 85%, and 67% for the same criteria, respectively. There was an inverse relationship between years in practice and output score (coefficient = -0.03413, p = 0.0128); the ICC was 0.39 (p = 0.003).

Conclusions: GPT-4 produced outputs that were accurate, understandable, and written in a conversational tone. However, GPT-4 was less able to produce comprehensive outputs, leading some physicians to be uncomfortable providing the output to patients. Practicing interventional radiologists should be aware of these limitations when counseling patients as ChatGPT-4 continues to develop into a clinically usable AI tool.
