Evaluation of the reliability and readability of answers given by chatbots to frequently asked questions about endophthalmitis: A cross-sectional study on chatbots.
{"title":"Evaluation of the reliability and readability of answers given by chatbots to frequently asked questions about endophthalmitis: A cross-sectional study on chatbots.","authors":"Suleyman Demir","doi":"10.1177/14604582241304679","DOIUrl":null,"url":null,"abstract":"<p><p><b>Objective:</b> This study aimed to investigate the accuracy, reliability, and readability of A-Eye Consult, ChatGPT-4.0, Google Gemini and Copilot AI large language models (LLMs) in responding to patient questions about endophthalmitis. <b>Methods:</b> The LLMs' responses to 25 questions about endophthalmitis, frequently asked by patients, were evaluated by two ophthalmologists using a five-point Likert scale, with scores ranging from 1-5. The DISCERN scale assessed the reliability of the LLMs' responses, whereas the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) indices assessed readability and text complexity, respectively. <b>Results:</b> A-Eye Consult and ChatGPT-4.0 outperformed Google Gemini and Copilot in providing comprehensive and precise responses. The Likert score significantly differed across all four LLMs (<i>p</i> < .001), with A-Eye Consult scoring significantly higher than Google Gemini and Copilot (<i>p</i> < .001). <b>Conclusions:</b> A-Eye Consult and ChatGPT-4.0 responses, while more complex than those of other LLMs, provided more reliable and accurate information.</p>","PeriodicalId":55069,"journal":{"name":"Health Informatics Journal","volume":"30 4","pages":"14604582241304679"},"PeriodicalIF":2.2000,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Health Informatics Journal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/14604582241304679","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0
Abstract
Objective: This study aimed to investigate the accuracy, reliability, and readability of the A-Eye Consult, ChatGPT-4.0, Google Gemini, and Copilot AI large language models (LLMs) in responding to patient questions about endophthalmitis. Methods: The LLMs' responses to 25 questions about endophthalmitis frequently asked by patients were evaluated by two ophthalmologists using a five-point Likert scale (scores ranging from 1 to 5). The DISCERN scale assessed the reliability of the LLMs' responses, whereas the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) indices assessed readability and text complexity, respectively. Results: A-Eye Consult and ChatGPT-4.0 outperformed Google Gemini and Copilot in providing comprehensive and precise responses. Likert scores differed significantly across the four LLMs (p < .001), with A-Eye Consult scoring significantly higher than Google Gemini and Copilot (p < .001). Conclusions: A-Eye Consult and ChatGPT-4.0 responses, while more complex than those of the other LLMs, provided more reliable and accurate information.
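For context, both readability indices used in the study are closed-form functions of average sentence length (words per sentence) and average word length (syllables per word): FRE = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words), and FKGL = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The Python sketch below illustrates how such scores can be computed; the vowel-group syllable heuristic and the sample text are illustrative assumptions, not the dictionary-based counters used by validated readability tools.

```python
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count vowel groups, drop a typical silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FRE, FKGL) for a block of text using the standard formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fkgl

if __name__ == "__main__":
    # Hypothetical patient-facing sentence, for illustration only.
    sample = ("Endophthalmitis is a severe inflammation inside the eye. "
              "It usually follows surgery, injections, or an injury.")
    fre, fkgl = readability(sample)
    print(f"FRE: {fre:.1f}  FKGL: {fkgl:.1f}")
```

Higher FRE values indicate easier text, while FKGL approximates the U.S. school grade level required to understand it, which is why the study reports them as complementary measures.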
Journal description:
Health Informatics Journal is an international peer-reviewed journal. All papers submitted to Health Informatics Journal are subject to peer review by members of a carefully appointed editorial board. The journal operates a conventional single-blind reviewing policy in which the reviewer’s name is always concealed from the submitting author.