Appropriateness of ChatGPT in Answering Heart Failure Related Questions
Journal: Heart, Lung and Circulation, Volume 33, Issue 9, Pages 1314-1318
DOI: 10.1016/j.hlc.2024.03.005
Publication date: 2024-09-01
Open access PDF: https://www.sciencedirect.com/science/article/pii/S1443950624001653/pdfft?md5=19c34317ccf8cf7b45fd251e734d1e0c&pid=1-s2.0-S1443950624001653-main.pdf
Citations: 0
Abstract
Background
Heart failure requires complex management, and increased patient knowledge has been shown to improve outcomes. This study assessed the knowledge of Chat Generative Pre-trained Transformer (ChatGPT) and its appropriateness as a supplemental resource of information for patients with heart failure.
Method
A total of 107 frequently asked heart failure-related questions were divided into three categories: “basic knowledge” (49), “management” (41), and “other” (17). Two responses per question were generated using both GPT-3.5 and GPT-4 (i.e., two responses per question per model). The accuracy and reproducibility of responses were graded by two reviewers, board-certified in cardiology, with differences resolved by a third reviewer, board-certified in cardiology and advanced heart failure. Accuracy was graded using a four-point scale: (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, and (4) completely incorrect.
Results
GPT-4 provided correct information in 107/107 (100%) of responses. Further, GPT-4 displayed a greater proportion of comprehensive knowledge in the “basic knowledge” and “management” categories (89.8% and 82.9%, respectively). For GPT-3.5, two responses (1.9%) were graded as “some correct and some incorrect,” while no responses were graded as “completely incorrect.” With respect to comprehensive knowledge, GPT-3.5 performed best in the “management” (78.1%) and “other” (prognosis, procedures, and support; 94.1%) categories. Both models also provided highly reproducible responses, with GPT-3.5 scoring above 94% in every category and GPT-4 scoring 100% across all answers.
Conclusions
GPT-3.5 and GPT-4 answered the majority of heart failure-related questions accurately and reliably. If validated in future studies, ChatGPT may serve as a useful tool for providing accessible health-related information and education to patients living with heart failure. In its current state, however, ChatGPT requires further rigorous testing and validation to ensure patient safety and equity across all patient demographics.
Journal overview:
Heart, Lung and Circulation publishes articles integrating clinical and research activities in the fields of basic cardiovascular science, clinical cardiology and cardiac surgery, with a focus on emerging issues in cardiovascular disease. The journal promotes multidisciplinary dialogue between cardiologists, cardiothoracic surgeons, cardio-pulmonary physicians and cardiovascular scientists.