Testing ChatGPT's Ability to Provide Patient and Physician Information on Aortic Aneurysm

Daniel J. Bertges MD, Adam W. Beck MD, Marc Schermerhorn MD, Mark K. Eskandari MD, Jens Eldrup-Jorgensen MD, Sean Liebscher MD, Robyn Guinto MD, Mead Ferris MD, Andy Stanley MD, Georg Steinthorsson MD, Matthew Alef MD, Salvatore T. Scali MD

Journal of Surgical Research, Volume 307, Pages 129-138 (March 2025). DOI: 10.1016/j.jss.2025.01.015
Abstract
Introduction
Our objective was to test the ability of ChatGPT 4.0 to provide accurate information for patients and physicians about abdominal aortic aneurysms (AAA) and to assess its alignment with Society for Vascular Surgery (SVS) clinical practice guidelines (CPG) for AAA care.
Material and methods
Fifteen patient-level questions, 37 questions selected to reflect 28 SVS CPGs, and 4 questions regarding AAA rupture risk were posed to ChatGPT 4.0. Single responses were recorded and graded for accuracy and quality by ten board-certified vascular surgeons and two vascular surgery fellows using a 5-point Likert scale (1 = very poor, 2 = poor, 3 = fair, 4 = good, 5 = excellent).
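For readers who wish to reproduce this style of query, the sketch below shows one way a single-response prompt could be scripted. This is an illustration only: the study does not state whether questions were posed through the ChatGPT web interface or the API, and the `openai` Python package, the model identifier, and the example question are all assumptions.

```python
# Minimal sketch of posing one question to a GPT-4-class model and
# recording its single response. Assumes the `openai` Python package;
# the study itself does not specify how questions were submitted.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_single_response(question: str, model: str = "gpt-4") -> str:
    """Pose one question and return the model's single reply."""
    completion = client.chat.completions.create(
        model=model,  # assumed model identifier, not taken from the study
        messages=[{"role": "user", "content": question}],
        n=1,  # one response per question, matching the single-response design
    )
    return completion.choices[0].message.content


if __name__ == "__main__":
    # Hypothetical example question; not one of the study's 56 questions.
    q = "At what abdominal aortic aneurysm diameter is elective repair recommended?"
    print(ask_single_response(q))
```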
Results
The mean of the means (MoM) accuracy rating across all 15 patient-level questions was 4.4 (SD 0.4, quartile range [QR] 4.2-4.7). ChatGPT 4.0 demonstrated good alignment with SVS practice guidelines (MoM 4.2, SD 0.4, QR 3.9-4.5). The accuracy of responses was consistent across guideline categories: screening or surveillance (4.2), indications for surgery (4.5), preoperative risk assessment (4.5), perioperative coronary revascularization (4.1), and perioperative management (4.2). The generative artificial intelligence bot demonstrated only fair performance in answering questions about annual AAA rupture risk (MoM 3.4, SD 1.2, QR 2.3-4.3).
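To make the summary statistics concrete, the sketch below shows one plausible computation of the MoM, SD, and QR from per-question rater scores. The ratings are invented for illustration, and interpreting "quartile range" as the 25th-75th percentile span of the per-question means is an assumption, as the abstract does not define the term.

```python
# Illustrative computation of mean-of-means (MoM), SD, and quartile
# range (QR) from per-question Likert ratings. The scores below are
# invented; QR is interpreted here as the 25th-75th percentile span
# of the per-question mean ratings, which is an assumption.
import numpy as np

# rows = questions, columns = the 12 graders (10 surgeons + 2 fellows)
ratings = np.array([
    [5, 4, 5, 4, 4, 5, 4, 5, 4, 4, 5, 4],
    [4, 4, 3, 5, 4, 4, 5, 4, 4, 3, 4, 4],
    [5, 5, 4, 4, 5, 4, 4, 5, 5, 4, 4, 5],
])

question_means = ratings.mean(axis=1)   # mean rating per question
mom = question_means.mean()             # mean of the per-question means
sd = question_means.std(ddof=1)         # sample SD across question means
q1, q3 = np.percentile(question_means, [25, 75])

print(f"MoM: {mom:.1f}, SD: {sd:.1f}, QR: {q1:.1f}-{q3:.1f}")
```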
Conclusions
ChatGPT 4.0 provided accurate responses to a variety of patient-level questions regarding AAA. Responses were well aligned with current SVS CPGs except for inaccuracies in the risk of AAA rupture at varying diameters. The emergence of generative artificial intelligence bots presents an opportunity to study their applications in patient education and to determine their ability to augment the vascular specialist's knowledge base.
Journal overview
The Journal of Surgical Research: Clinical and Laboratory Investigation publishes original articles concerned with clinical and laboratory investigations relevant to surgical practice and teaching. The journal emphasizes reports of clinical investigations or fundamental research bearing directly on surgical management that will be of general interest to a broad range of surgeons and surgical researchers. The articles presented need not have been the products of surgeons or of surgical laboratories.
The Journal of Surgical Research also features review articles and special articles relating to educational, research, or social issues of interest to the academic surgical community.