Evaluating the role of AI chatbots in patient education for abdominal aortic aneurysms: a comparison of ChatGPT and conventional resources
Harry Collin, Chelsea Tong, Abhishekh Srinivas, Angus Pegler, Philip Allan, Daniel Hagley
ANZ Journal of Surgery (Wiley), published 5 March 2025. DOI: 10.1111/ans.70053 (https://doi.org/10.1111/ans.70053)
Abstract
Background: Abdominal aortic aneurysms (AAA) carry significant risks, yet patient understanding is often limited and online resources are typically of low quality. ChatGPT, an artificial intelligence (AI) chatbot, presents a new frontier in patient education, but concerns remain about misinformation. This study evaluates the quality of ChatGPT-generated patient information on AAA.
Methods: Eight patient questions on AAA were sourced from Healthdirect Australia (HDA), a reputable online patient information resource funded by the Australian Government, and input into ChatGPT's free (ChatGPT-4o mini) and paid (ChatGPT-4) models. A vascular surgeon evaluated the appropriateness of each response. Readability was assessed using the Flesch-Kincaid test. The Patient Education Materials Assessment Tool (PEMAT) measured understandability and actionability; responses scoring ≥75% on both were considered high-quality.
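For context, the Flesch-Kincaid grade level used in the readability assessment is a standard formula based on average sentence length and average syllables per word. The sketch below is a minimal illustration of how such a score can be computed; the regex-based syllable counter is a rough approximation introduced here for illustration and is not part of the study's methodology.

    import re

    def count_syllables(word: str) -> int:
        # Rough heuristic: count groups of consecutive vowels.
        # Real readability tools use dictionaries or more detailed rules.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_kincaid_grade(text: str) -> float:
        # Standard Flesch-Kincaid grade-level formula:
        # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text) or ["text"]  # guard against empty input
        syllables = sum(count_syllables(w) for w in words)
        return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

    # A grade of roughly 13 or higher corresponds to college-level material,
    # while 10-12 corresponds to the 10th- to 12th-grade range reported for HDA.
    sample = "An abdominal aortic aneurysm is a dilation of the aorta that can rupture."
    print(round(flesch_kincaid_grade(sample), 1))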
Results: All responses were deemed clinically appropriate. Mean response length was longer for ChatGPT than for HDA. Readability was at a college level for ChatGPT responses, while HDA material was at a 10th- to 12th-grade level. One response (generated by the paid ChatGPT model) was high-quality, with a PEMAT actionability score of ≥75%. Actionability scores were otherwise low across all sources, with ChatGPT responses more likely to contain identifiable actions, although these were often not clearly presented. ChatGPT responses were marginally more understandable than those from HDA.
Conclusions: ChatGPT-generated information on AAA was appropriate and understandable, outperforming HDA in both respects. However, AI responses were written at a more advanced reading level and lacked actionable instructions. AI chatbots show promise as supplemental tools for AAA patient education, but further refinement is needed to enhance their effectiveness in supporting informed decision-making.
Journal overview:
ANZ Journal of Surgery is published by Wiley on behalf of the Royal Australasian College of Surgeons to provide a medium for the publication of peer-reviewed original contributions related to clinical practice and/or research in all fields of surgery and related disciplines. It also provides a programme of continuing education for surgeons. All articles are peer-reviewed by at least two researchers with expertise in the field of the submitted paper.