Yihua Zhan, Xutao Chen, Feihong Ye, Zhikai Wu, Muhammad Usman, Zhihan Yuan, Han Wu, Jian Huang, Hao Yu
Evaluating AI Chatbot Responses to Postkidney Transplant Inquiries
Transplantation Proceedings, vol. 57, no. 2, pp. 394-405. Journal article, published 2025-01-14.
DOI: 10.1016/j.transproceed.2024.12.028
URL: https://www.sciencedirect.com/science/article/pii/S0041134524006821
Journal impact factor: 0.8. JCR: Q4 (Immunology). Open access: no.
Citations: 0
Abstract
This study evaluated the capability of three AI chatbots (ChatGPT 4.0, Claude 3.0, and Gemini Pro), as well as Google, in responding to common postkidney transplantation inquiries. We compiled a list of frequently asked postkidney transplant questions using Google and Bing. Response quality was rated on a 5-point Likert scale, while understandability and actionability were measured with the Patient Education Materials Assessment Tool (PEMAT). Readability was assessed using the Flesch Reading Ease and Flesch-Kincaid Grade Level metrics. Statistical analysis used non-parametric tests, specifically the Kruskal-Wallis test, in SPSS. We gathered 127 questions, which were addressed by the chatbots and Google. The responses showed high quality (median Likert score: 4 [4, 5]) and good understandability (median PEMAT understandability score: 72.7% [62.5%, 77.8%]) but poor actionability (median PEMAT actionability score: 20% [0%, 20%]). The responses were difficult to read (median Flesch Reading Ease score: 22.1 [8.7, 34.8]), with a Flesch-Kincaid Grade Level corresponding to undergraduate-level text (median score: 14.7 [12.3, 16.7]). Among the chatbots, Claude 3.0 provided the most reliable responses, though they required a higher reading level. ChatGPT 4.0 offered the most comprehensible responses. Moreover, Google did not outperform the chatbots in any of the scoring metrics.
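For readers unfamiliar with the two readability metrics named in the abstract, the sketch below shows how they are conventionally computed from sentence, word, and syllable counts. The two formulas are the standard published Flesch Reading Ease and Flesch-Kincaid Grade Level definitions. Everything else is an assumption for illustration: the abstract does not say which tool the authors used, and the function names and the vowel-group syllable counter here are hypothetical and only approximate.

```python
import re


def count_syllables(word: str) -> int:
    """Crude syllable estimate (an assumption, not the authors' method):
    count runs of vowels and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) for a plain-text response.
    Sentence splitting is naive: abbreviations and decimals will miscount."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(sentences: int, words: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the study's question set.
    sample = "Take your pills every day. Drink enough water!"
    stats = text_stats(sample)
    print("sentences, words, syllables:", stats)
    print("Flesch Reading Ease:", round(flesch_reading_ease(*stats), 1))
    print("Flesch-Kincaid Grade:", round(flesch_kincaid_grade(*stats), 1))
```

Both formulas depend only on average sentence length and average syllables per word, so long sentences and polysyllabic medical terms push Reading Ease down and Grade Level up together, which is consistent with the pairing of a low ease score and a high grade level reported above.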
About the journal:
Transplantation Proceedings publishes several different categories of manuscripts, all of which undergo extensive peer review by recognized authorities in the field prior to their acceptance for publication.
The first type of manuscript consists of sets of papers providing an in-depth account of the current state of the art in various rapidly developing areas of world transplantation biology and medicine. These manuscripts emanate from congresses of the affiliated transplantation societies, from symposia sponsored by the societies, and from special conferences and workshops covering related topics.
Transplantation Proceedings also publishes several special sections, including Clinical Transplantation Proceedings, which comprises rapid original contributions on preclinical and clinical experiences. These manuscripts undergo review by members of the Editorial Board.
Original basic or clinical science articles, clinical trials, and case studies can be submitted to the journal's open access companion title, Transplantation Reports.