The reliability of freely accessible, baseline, general-purpose large language model generated patient information for frequently asked questions on liver disease: a preliminary cross-sectional study.
Madunil A Niriella, Pathum Premaratna, Mananjala Senanayake, Senerath Kodisinghe, Uditha Dassanayake, Anuradha Dassanayake, Dileepa S Ediriweera, H Janaka de Silva
Expert Review of Gastroenterology & Hepatology. Published 2025-02-22. DOI: 10.1080/17474124.2025.2471874 (https://doi.org/10.1080/17474124.2025.2471874). Impact factor: 3.8; JCR Q2 (Gastroenterology & Hepatology).
Abstract
Background: We assessed the use of large language models (LLMs) like ChatGPT-3.5 and Gemini against human experts as sources of patient information.
Research design and methods: We compared the accuracy, completeness and quality of freely accessible, baseline, general-purpose LLM-generated responses to 20 frequently asked questions (FAQs) on liver disease, with those from two gastroenterologists, using the Kruskal-Wallis test. Three independent gastroenterologists blindly rated each response.
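For readers unfamiliar with the Kruskal-Wallis test named above, the following is a minimal pure-Python sketch of the H statistic with the standard tie correction. It is not the authors' analysis code. The function name `kruskal_wallis`, the three score lists, and the 1-5 rating scale are hypothetical and chosen only for illustration; the abstract does not state the scoring scale or the software used.

```python
import math
from collections import defaultdict


def kruskal_wallis(*groups):
    """Return (H, df, p) for the Kruskal-Wallis test with tie correction.

    p uses the chi-square approximation and is computed in closed form only
    when df == 2 (three groups), where the chi-square survival function is
    exp(-H / 2). For any other df, p is returned as None.
    """
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)

    # Record the 1-based positions of each distinct value in the pooled sort,
    # then give tied values the average of the positions they occupy.
    positions = defaultdict(list)
    for index, value in enumerate(pooled, start=1):
        positions[value].append(index)
    rank = {value: sum(idx) / len(idx) for value, idx in positions.items()}

    # Uncorrected H from the per-group rank sums.
    rank_sum_term = sum(
        sum(rank[value] for value in group) ** 2 / len(group) for group in groups
    )
    h = 12.0 / (n * (n + 1)) * rank_sum_term - 3.0 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_term = sum(len(idx) ** 3 - len(idx) for idx in positions.values())
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    h /= correction

    df = len(groups) - 1
    p = math.exp(-h / 2.0) if df == 2 else None
    return h, df, p


# Hypothetical ratings (NOT study data): one list per response source.
chatgpt_scores = [4, 5, 4, 3, 5, 4]
gemini_scores = [4, 4, 5, 3, 4, 5]
expert_scores = [5, 4, 4, 4, 5, 3]

h_stat, dof, p_value = kruskal_wallis(chatgpt_scores, gemini_scores, expert_scores)
print(f"H({dof}) = {h_stat:.3f}, p = {p_value:.3f}")
```

Because the test works on ranks rather than raw values, it suits ordinal rating scores, where the assumptions of a one-way ANOVA would be harder to justify.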
Results: The expert and AI-generated responses displayed high mean scores across all domains, with no statistically significant difference between the groups for accuracy [H(2) = 0.421, p = 0.811], completeness [H(2) = 3.146, p = 0.207], or quality [H(2) = 3.350, p = 0.187]. We also found no statistically significant difference in rank totals among the three raters (R1, R2, R3) for accuracy [H(2) = 5.559, p = 0.062], completeness [H(2) = 0.104, p = 0.949], or quality [H(2) = 0.420, p = 0.810].
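The reported p-values can be checked against the H statistics directly. With 2 degrees of freedom, the chi-square survival function reduces to exp(-H/2), so no statistics library is needed. The sketch below assumes the authors used the usual chi-square approximation rather than an exact test, which the abstract does not state. The dictionary `reported` and the helper `p_from_h_df2` are illustrative names; the numbers are copied from the Results paragraph.

```python
import math

# (comparison, domain) -> (H statistic, reported p), copied from the Results.
reported = {
    ("groups", "accuracy"): (0.421, 0.811),
    ("groups", "completeness"): (3.146, 0.207),
    ("groups", "quality"): (3.350, 0.187),
    ("raters", "accuracy"): (5.559, 0.062),
    ("raters", "completeness"): (0.104, 0.949),
    ("raters", "quality"): (0.420, 0.810),
}


def p_from_h_df2(h):
    """Chi-square survival function for df = 2: P(X > h) = exp(-h / 2)."""
    return math.exp(-h / 2.0)


for (comparison, domain), (h, p_reported) in reported.items():
    p_computed = p_from_h_df2(h)
    print(
        f"{comparison:7s} {domain:13s} H = {h:.3f}  "
        f"computed p = {p_computed:.4f}  reported p = {p_reported:.3f}"
    )
```

Under that assumption, each computed value agrees with the reported one to within about 0.001, which is consistent with rounding of H to three decimal places.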
Conclusion: Our findings outline the potential of freely accessible, baseline, general-purpose LLMs in providing reliable answers to FAQs on liver disease.
About the journal:
The enormous health and economic burden of gastrointestinal disease worldwide warrants a sharp focus on the etiology, epidemiology, prevention, diagnosis, treatment and development of new therapies. By the end of the last century we had seen enormous advances, both in technologies to visualize disease and in curative therapies in areas such as gastric ulcer, with the advent first of the H2-antagonists and then the proton pump inhibitors - clear examples of how advances in medicine can massively benefit the patient. Nevertheless, specialists face ongoing challenges from a wide array of diseases of diverse etiology.