Accuracy and Readability of ChatGPT on Potential Complications of Interventional Radiology Procedures: AI-Powered Patient Interviewing.

IF 3.8 | Zone 2 (Medicine) | Q1: RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING | Academic Radiology | Pub Date: 2024-11-16 | DOI: 10.1016/j.acra.2024.10.028
Esat Kaba, Mehmet Beyazal, Fatma Beyazal Çeliker, İbrahim Yel, Thomas J Vogl
Citations: 0

Abstract

Rationale and objectives: It is crucial to inform the patient about potential complications and obtain consent before interventional radiology procedures. In this study, we investigated the accuracy, reliability, and readability of the information provided by ChatGPT-4 about potential complications of interventional radiology procedures.

Materials and methods: The ChatGPT-4 chatbot was asked about the potential major and minor complications of 25 different interventional radiology procedures (8 non-vascular, 17 vascular). The responses were evaluated by two experienced interventional radiologists (>25 years and 10 years of experience) using a 5-point Likert scale according to Cardiovascular and Interventional Radiological Society of Europe guidelines. Agreement between the two interventional radiologists' scores was evaluated with the Wilcoxon signed-rank test, the Intraclass Correlation Coefficient (ICC), and the Pearson correlation coefficient (PCC). In addition, readability and complexity were quantitatively assessed using the Flesch-Kincaid Grade Level, the Flesch Reading Ease score, and the Simple Measure of Gobbledygook (SMOG) index.
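The three readability measures named above have standard published formulas, which can be sketched as follows. This is a minimal illustration, not the authors' tooling: the abstract does not say which software or counting rules were used, the function names are hypothetical, and the word, sentence, and syllable counts are assumed to be supplied by the caller.

```python
import math


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_index(polysyllables: int, sentences: int) -> float:
    """SMOG index from the count of words with three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from the study).
    words, sentences, syllables, polysyllables = 100, 5, 150, 12
    print(round(flesch_reading_ease(words, sentences, syllables), 2))
    print(round(flesch_kincaid_grade(words, sentences, syllables), 2))
    print(round(smog_index(polysyllables, sentences), 2))
```

The functions take counts rather than raw text because syllable counting is tool-dependent, and different counters can shift the scores slightly.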

Results: Interventional radiologist 1 (IR1) and interventional radiologist 2 (IR2) gave 104 and 109 points, respectively, out of a possible 125 points across all procedures. There was no statistically significant difference between the total scores of the two IRs (p = 0.244). The IRs demonstrated high agreement across all procedure ratings (ICC = 0.928). Both IRs gave 34 out of 40 points for the eight non-vascular procedures. The 17 vascular procedures received 70 out of 85 points from IR1 and 75 out of 85 from IR2. The agreement between the two observers' assessments was good, with PCC values of 0.908 and 0.896 for non-vascular and vascular procedures, respectively. Readability levels were low overall. The mean Flesch-Kincaid Grade Level, Flesch Reading Ease score, and SMOG index were 12.51 ± 1.14 (college level), 30.27 ± 8.38 (college level), and 14.46 ± 0.76 (college level), respectively. There was no statistically significant difference in readability between non-vascular and vascular procedures (p = 0.16).
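The point totals reported above can be cross-checked with simple arithmetic: each procedure is scored out of 5, so the maxima are 8 × 5 = 40, 17 × 5 = 85, and 25 × 5 = 125. The sketch below recomputes the totals and expresses them as percentages and mean Likert scores. The subgroup scores are taken from the Results; the function and variable names are hypothetical, and the percentages are derived here rather than reported by the authors.

```python
MAX_PER_PROCEDURE = 5  # 5-point Likert scale

# Number of procedures per group, as stated in the abstract.
PROCEDURES = {"non_vascular": 8, "vascular": 17}

# Subgroup point totals per rater, as stated in the Results.
SCORES = {
    "IR1": {"non_vascular": 34, "vascular": 70},
    "IR2": {"non_vascular": 34, "vascular": 75},
}


def summarize(scores, procedures, max_per_procedure=MAX_PER_PROCEDURE):
    """Return total, maximum, percentage, and mean score for each rater."""
    n_procedures = sum(procedures.values())
    max_total = n_procedures * max_per_procedure
    summary = {}
    for rater, by_group in scores.items():
        total = sum(by_group.values())
        summary[rater] = {
            "total": total,
            "max_total": max_total,
            "percent": 100 * total / max_total,
            "mean_per_procedure": total / n_procedures,
        }
    return summary


if __name__ == "__main__":
    for rater, stats in summarize(SCORES, PROCEDURES).items():
        print(rater, stats)
```

The subgroup scores sum to the reported totals of 104 and 109, which correspond to 83.2% and 87.2% of the 125 available points.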

Conclusion: ChatGPT-4 demonstrated remarkable performance, highlighting its potential to enhance accessibility to information about interventional radiology procedures and support the creation of educational materials for patients. Based on the findings of our study, while ChatGPT provides accurate information and shows no evidence of hallucinations, it is important to emphasize that high levels of education and health literacy are required to fully comprehend its responses.

Source journal: Academic Radiology (Medicine: Nuclear Medicine)
CiteScore: 7.60
Self-citation rate: 10.40%
Articles published: 432
Review time: 18 days
About the journal: Academic Radiology publishes original reports of clinical and laboratory investigations in diagnostic imaging, the diagnostic use of radioactive isotopes, computed tomography, positron emission tomography, magnetic resonance imaging, ultrasound, digital subtraction angiography, image-guided interventions and related techniques. It also includes brief technical reports describing original observations, techniques, and instrumental developments; state-of-the-art reports on clinical issues, new technology and other topics of current medical importance; meta-analyses; scientific studies and opinions on radiologic education; and letters to the Editor.