Appropriateness of ChatGPT in Answering Heart Failure Related Questions
Journal: Heart, Lung and Circulation, Volume 33, Issue 9, Pages 1314-1318
DOI: 10.1016/j.hlc.2024.03.005
Publication date: 2024-09-01
Open access PDF: https://www.sciencedirect.com/science/article/pii/S1443950624001653/pdfft?md5=19c34317ccf8cf7b45fd251e734d1e0c&pid=1-s2.0-S1443950624001653-main.pdf
Citations: 0
Abstract
Background
Heart failure requires complex management, and increased patient knowledge has been shown to improve outcomes. This study assessed the knowledge of Chat Generative Pre-trained Transformer (ChatGPT) and its appropriateness as a supplemental resource of information for patients with heart failure.
Method
A total of 107 frequently asked heart failure-related questions were divided into three categories: “basic knowledge” (49), “management” (41), and “other” (17). Two responses per question were generated using both GPT-3.5 and GPT-4 (i.e., two responses per question per model). The accuracy and reproducibility of responses were graded by two reviewers, board-certified in cardiology, with differences resolved by a third reviewer, board-certified in cardiology and advanced heart failure. Accuracy was graded using a four-point scale: (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, and (4) completely incorrect.
Results
GPT-4 provided correct information in 107/107 (100%) of responses. Further, GPT-4 displayed a greater proportion of comprehensive knowledge in the “basic knowledge” and “management” categories (89.8% and 82.9%, respectively). For GPT-3.5, two responses (1.9%) were graded as “some correct and some incorrect,” while no responses were graded as “completely incorrect.” With respect to comprehensive knowledge, GPT-3.5 performed best in the “management” (78.1%) and “other” (prognosis, procedures, and support; 94.1%) categories. Both models also provided highly reproducible responses, with GPT-3.5 scoring above 94% in every category and GPT-4 scoring 100% across all answers.
Conclusions
GPT-3.5 and GPT-4 answered the majority of heart failure-related questions accurately and reliably. If validated in future studies, ChatGPT may serve as a useful tool for providing accessible health-related information and education to patients living with heart failure. In its current state, however, ChatGPT requires further rigorous testing and validation to ensure patient safety and equity across all patient demographics.
Journal overview:
Heart, Lung and Circulation publishes articles integrating clinical and research activities in the fields of basic cardiovascular science, clinical cardiology and cardiac surgery, with a focus on emerging issues in cardiovascular disease. The journal promotes multidisciplinary dialogue between cardiologists, cardiothoracic surgeons, cardio-pulmonary physicians and cardiovascular scientists.