Large language model doctor: assessing the ability of ChatGPT-4 to deliver interventional radiology procedural information to patients during the consent process.

CVIR Endovascular · Published 2024-11-29 · DOI: 10.1186/s42155-024-00477-z · Impact Factor 1.2, Q3 (Cardiac & Cardiovascular Systems)
Hayden L Hofmann, Jenanan Vairavamurthy
{"title":"Large language model doctor: assessing the ability of ChatGPT-4 to deliver interventional radiology procedural information to patients during the consent process.","authors":"Hayden L Hofmann, Jenanan Vairavamurthy","doi":"10.1186/s42155-024-00477-z","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>The study aims to evaluate how current interventional radiologists view ChatGPT in the context of informed consent for interventional radiology (IR) procedures.</p><p><strong>Methods: </strong>ChatGPT-4 was instructed to outline the risks, benefits, and alternatives for IR procedures. The outputs were reviewed by IR physicians to assess if outputs were 1) accurate, 2) comprehensive, 3) easy to understand, 4) written in a conversational tone, and 5) if they were comfortable providing the output to the patient. For each criterion, outputs were measured on a 5-point scale. Mean scores and percentage of physicians rating output as sufficient (4 or 5 on 5-point scale) were measured. A linear regression correlated mean rating with number of years in practice. Intraclass correlation coefficient (ICC) measured agreement among physicians.</p><p><strong>Results: </strong>The mean rating of the ChatGPT responses was 4.29, 3.85, 4.15, 4.24, 3.82 for accuracy, comprehensiveness, readability, conversational tone, and physician comfort level, respectively. Percentage of physicians rating outputs as sufficient was 84%, 71%, 85%, 85%, and 67% for accuracy, comprehensiveness, readability, conversational tone, and physician comfort level, respectively. There was an inverse relationship between years in training and output score (coeff = -0.03413, p = 0.0128); ICC measured 0.39 (p = 0.003).</p><p><strong>Conclusions: </strong>GPT-4 produced outputs that were accurate, understandable, and in a conversational tone. However, GPT-4 had a decreased capacity to produce a comprehensive output leading some physicians to be uncomfortable providing the output to patients. Practicing IRs should be aware of these limitations when counseling patients as ChatGPT-4 continues to develop into a clinically usable AI tool.</p>","PeriodicalId":52351,"journal":{"name":"CVIR Endovascular","volume":"7 1","pages":"83"},"PeriodicalIF":1.2000,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11607371/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"CVIR Endovascular","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s42155-024-00477-z","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"CARDIAC & CARDIOVASCULAR SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose: This study aims to evaluate how practicing interventional radiologists view ChatGPT in the context of informed consent for interventional radiology (IR) procedures.

Methods: ChatGPT-4 was instructed to outline the risks, benefits, and alternatives for IR procedures. IR physicians reviewed the outputs to assess whether they were (1) accurate, (2) comprehensive, (3) easy to understand, and (4) written in a conversational tone, and (5) whether the physicians were comfortable providing the output to the patient. Each criterion was rated on a 5-point scale. Mean scores and the percentage of physicians rating an output as sufficient (4 or 5 on the 5-point scale) were calculated. A linear regression correlated mean rating with number of years in practice, and the intraclass correlation coefficient (ICC) was used to measure agreement among physicians.
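As a rough illustration of the analysis described above, the following is a minimal Python sketch (not the authors' code) of how the summary statistics, regression, and ICC could be computed. The file name ratings.csv, the column names, and the use of pandas, SciPy, and pingouin are all assumptions made for this example.

```python
# Hypothetical long-format table: one row per physician rating of one output on one criterion.
# Assumed columns: physician, years_in_practice, output_id, criterion, score (1-5).
import pandas as pd
import pingouin as pg
from scipy import stats

df = pd.read_csv("ratings.csv")

# Mean score and percentage rated sufficient (4 or 5) for each criterion
summary = df.groupby("criterion")["score"].agg(
    mean_score="mean",
    pct_sufficient=lambda s: (s >= 4).mean() * 100,
)
print(summary)

# Linear regression: each physician's mean rating vs. years in practice
per_physician = (
    df.groupby(["physician", "years_in_practice"])["score"].mean().reset_index()
)
reg = stats.linregress(per_physician["years_in_practice"], per_physician["score"])
print(f"slope = {reg.slope:.5f}, p = {reg.pvalue:.4f}")

# Inter-rater agreement via intraclass correlation (requires each physician to rate each output)
icc = pg.intraclass_corr(
    data=df, targets="output_id", raters="physician", ratings="score"
)
print(icc[["Type", "ICC", "pval"]])
```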

Results: The mean ratings of the ChatGPT responses were 4.29, 3.85, 4.15, 4.24, and 3.82 for accuracy, comprehensiveness, readability, conversational tone, and physician comfort level, respectively. The percentages of physicians rating outputs as sufficient were 84%, 71%, 85%, 85%, and 67% for the same criteria, respectively. There was an inverse relationship between years in practice and output score (coefficient = -0.03413, p = 0.0128); the ICC was 0.39 (p = 0.003).

Conclusions: GPT-4 produced outputs that were accurate, understandable, and written in a conversational tone. However, GPT-4 was less able to produce comprehensive outputs, leading some physicians to be uncomfortable providing the output to patients. Practicing interventional radiologists should be aware of these limitations when counseling patients as ChatGPT-4 continues to develop into a clinically usable AI tool.
