GPT-4 makes cardiovascular magnetic resonance reports easy to understand.

Journal of Cardiovascular Magnetic Resonance | IF 4.2 | Zone 1 (Medicine) | JCR Q1 (Cardiac & Cardiovascular Systems) | Pub Date: 2024-01-01 | Epub Date: 2024-03-07 | DOI: 10.1016/j.jocmr.2024.101035
Babak Salam, Dmitrij Kravchenko, Sebastian Nowak, Alois M Sprinkart, Leonie Weinhold, Anna Odenthal, Narine Mesropyan, Leon M Bischoff, Ulrike Attenberger, Daniel L Kuetting, Julian A Luetkens, Alexander Isaak
Citations: 0

Abstract

Generative Pre-trained Transformer 4 makes cardiovascular magnetic resonance reports easy to understand.

Background: Patients are increasingly using Generative Pre-trained Transformer 4 (GPT-4) to better understand their own radiology findings.

Purpose: To evaluate the performance of GPT-4 in transforming cardiovascular magnetic resonance (CMR) reports into text that is comprehensible to medical laypersons.

Methods: ChatGPT with GPT-4 architecture was used to generate three different explained versions of 20 various CMR reports (n = 60) using the same prompt: "Explain the radiology report in a language understandable to a medical layperson". Two cardiovascular radiologists evaluated understandability, factual correctness, completeness of relevant findings, and lack of potential harm, while 13 medical laypersons evaluated the understandability of the original and the GPT-4 reports on a Likert scale (1 "strongly disagree", 5 "strongly agree"). Readability was measured using the Automated Readability Index (ARI). Linear mixed-effects models (values given as median [interquartile range]) and intraclass correlation coefficient (ICC) were used for statistical analysis.
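
The Automated Readability Index named in the Methods can be illustrated with a minimal sketch. It uses the standard ARI formula, 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43. The tokenization rules (whitespace word splitting, counting only letters and digits as characters, splitting sentences on ".", "!" and "?") and the two sample sentences are assumptions made for illustration; the abstract does not say which tool or counting rules the authors used.

```python
import re


def automated_readability_index(text: str) -> float:
    """Return the ARI of `text` using a simple, assumed tokenization."""
    # Words: whitespace-separated tokens.
    words = text.split()
    # Sentences: non-empty chunks between ., ! or ? (a simplification that
    # would miscount decimals such as "0.62" or abbreviations).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # Characters: letters and digits only, punctuation excluded.
    chars = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


# Hypothetical sample sentences, not taken from the study.
technical = "Cardiovascular magnetic resonance demonstrates preserved biventricular function."
plain = "The heart pumps well."

print(round(automated_readability_index(technical), 1))
print(round(automated_readability_index(plain), 1))
```

Longer words and longer sentences both raise the score, which is why jargon-heavy report language tends to score higher than a plain-language explanation.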

Results: GPT-4 reports were generated on average in 52 ± 13 s. GPT-4 reports achieved a lower ARI score than the original reports (original 10 [9-12] vs GPT-4 5 [4-6]; p < 0.001) and were subjectively easier for laypersons to understand (original 1 [1] vs GPT-4 4 [4, 5]; p < 0.001). Eighteen out of 20 (90%) standard CMR reports and 2/60 (3%) GPT-generated reports had an ARI score corresponding to the 8th grade level or higher. Radiologists' ratings of the GPT-4 reports reached high levels for correctness (5 [4, 5]), completeness (5 [5]), and lack of potential harm (5 [5]), with "strong agreement" for factual correctness in 94% (113/120) and completeness of relevant findings in 81% (97/120) of reports. Test-retest agreement for layperson understandability ratings between the three simplified reports generated from the same original report was substantial (ICC: 0.62; p < 0.001). Interrater agreement between radiologists was almost perfect for lack of potential harm (ICC: 0.93; p < 0.001) and moderate to substantial for completeness (ICC: 0.76; p < 0.001) and factual correctness (ICC: 0.55; p < 0.001).
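
The counts and percentages in the Results can be cross-checked with a few lines of arithmetic. The sketch below assumes that the denominator of 120 arises from 60 GPT-4 reports each rated by two radiologists; the abstract does not state this explicitly, so treat it as an inference. The variable and function names are illustrative only.

```python
def percent(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


n_original_reports = 20   # from the Methods
n_versions = 3            # explained versions per original report
n_radiologists = 2        # from the Methods

n_gpt_reports = n_original_reports * n_versions   # 60, matches "n = 60"
n_ratings = n_gpt_reports * n_radiologists        # 120 (assumed origin of the denominator)

# (numerator, denominator) pairs as reported in the Results.
reported = {
    "original reports at 8th grade level or higher": (18, n_original_reports),
    "GPT-4 reports at 8th grade level or higher": (2, n_gpt_reports),
    "strong agreement, factual correctness": (113, n_ratings),
    "strong agreement, completeness": (97, n_ratings),
}

for label, (numerator, denominator) in reported.items():
    print(f"{label}: {numerator}/{denominator} = {percent(numerator, denominator)}%")
```

Under that assumption the recomputed values agree with the rounded percentages given in the abstract.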

Conclusion: GPT-4 can reliably transform complex CMR reports into more understandable, layperson-friendly language while largely maintaining factual correctness and completeness, and can thus help convey patient-relevant radiology information in an easy-to-understand manner.

Source journal
CiteScore: 10.90
Self-citation rate: 12.50%
Articles published: 61
Review time: 6-12 weeks
Journal description: Journal of Cardiovascular Magnetic Resonance (JCMR) publishes high-quality articles on all aspects of basic, translational and clinical research on the design, development, manufacture, and evaluation of cardiovascular magnetic resonance (CMR) methods applied to the cardiovascular system. Topical areas include, but are not limited to: New applications of magnetic resonance to improve the diagnostic strategies, risk stratification, characterization and management of diseases affecting the cardiovascular system. New methods to enhance or accelerate image acquisition and data analysis. Results of multicenter, or larger single-center studies that provide insight into the utility of CMR. Basic biological perceptions derived by CMR methods.