Evaluating the role of AI chatbots in patient education for abdominal aortic aneurysms: a comparison of ChatGPT and conventional resources
Harry Collin, Chelsea Tong, Abhishekh Srinivas, Angus Pegler, Philip Allan, Daniel Hagley
ANZ Journal of Surgery (Wiley), published 5 March 2025. DOI: 10.1111/ans.70053 (https://doi.org/10.1111/ans.70053)
Abstract
Background: Abdominal aortic aneurysms (AAA) carry significant risks, yet patient understanding is often limited and online resources are typically of low quality. ChatGPT, an artificial intelligence (AI) chatbot, presents a new frontier in patient education, but concerns remain about misinformation. This study evaluates the quality of ChatGPT-generated patient information on AAA.
Methods: Eight patient questions on AAA were sourced from Healthdirect Australia (HDA), a reputable online patient information resource funded by the Australian Government, and input into ChatGPT's free (ChatGPT-4o mini) and paid (ChatGPT-4) models. A vascular surgeon evaluated the appropriateness of each response. Readability was assessed using the Flesch-Kincaid test. The Patient Education Materials Assessment Tool (PEMAT) measured understandability and actionability; responses scoring ≥75% on both were considered high-quality.
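For context, the Flesch-Kincaid grade level used in the readability assessment is a standard formula based on average sentence length and average syllables per word. The sketch below is a minimal illustration of how such a score can be computed; the regex-based syllable counter is a rough approximation introduced here for illustration and is not part of the study's methodology.

    import re

    def count_syllables(word: str) -> int:
        # Rough heuristic: count groups of consecutive vowels.
        # Real readability tools use dictionaries or more detailed rules.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_kincaid_grade(text: str) -> float:
        # Standard Flesch-Kincaid grade-level formula:
        # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text) or ["text"]  # guard against empty input
        syllables = sum(count_syllables(w) for w in words)
        return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

    # A grade of roughly 13 or higher corresponds to college-level material,
    # while 10-12 corresponds to the 10th- to 12th-grade range reported for HDA.
    sample = "An abdominal aortic aneurysm is a dilation of the aorta that can rupture."
    print(round(flesch_kincaid_grade(sample), 1))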
Results: All responses were deemed clinically appropriate. Mean response length was longer for ChatGPT than for HDA. Readability was at a college level for ChatGPT responses, while HDA material was at a 10th- to 12th-grade level. One response (generated by the paid ChatGPT model) was high-quality, with a PEMAT actionability score of ≥75%. Actionability scores were otherwise low across all sources, with ChatGPT responses more likely to contain identifiable actions, although these were often not clearly presented. ChatGPT responses were marginally more understandable than those from HDA.
Conclusions: ChatGPT-generated information on AAA was appropriate and understandable, outperforming HDA in both respects. However, AI responses were written at a more advanced reading level and lacked actionable instructions. AI chatbots show promise as supplemental tools for AAA patient education, but further refinement is needed to enhance their effectiveness in supporting informed decision-making.
Journal overview:
ANZ Journal of Surgery is published by Wiley on behalf of the Royal Australasian College of Surgeons to provide a medium for the publication of peer-reviewed original contributions related to clinical practice and/or research in all fields of surgery and related disciplines. It also provides a programme of continuing education for surgeons. All articles are peer-reviewed by at least two researchers with expertise in the field of the submitted paper.