Evaluating large language models as patient education tools for inflammatory bowel disease: A comparative study.

World Journal of Gastroenterology | IF 5.4 | Zone 3 (Medicine) | Q1, Gastroenterology & Hepatology | Pub Date: 2025-02-14 | DOI: 10.3748/wjg.v31.i6.102090

Yan Zhang, Xiao-Han Wan, Qing-Zhou Kong, Han Liu, Jun Liu, Jing Guo, Xiao-Yun Yang, Xiu-Li Zuo, Yan-Qing Li

World Journal of Gastroenterology, 31(6): 102090. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11752706/pdf/

Citations: 0

Abstract

Background: Inflammatory bowel disease (IBD) is a global health burden that affects millions of individuals worldwide, necessitating extensive patient education. Large language models (LLMs) hold promise for addressing patient information needs. However, the use of LLMs to deliver accurate and comprehensible IBD-related medical information has yet to be thoroughly investigated.

Aim: To assess the utility of three LLMs (ChatGPT-4.0, Claude-3-Opus, and Gemini-1.5-Pro) as a reference point for patients with IBD.

Methods: In this comparative study, two gastroenterology experts generated 15 IBD-related questions that reflected common patient concerns. These questions were used to evaluate the performance of the three LLMs. The answers provided by each model were independently assessed by three IBD-related medical experts using a Likert scale focusing on accuracy, comprehensibility, and correlation. In parallel, three patients were invited to evaluate the comprehensibility of the models' answers. Finally, a readability assessment was performed.

Results: Overall, each of the LLMs achieved satisfactory levels of accuracy, comprehensibility, and completeness when answering IBD-related questions, although their performance varied. All of the investigated models demonstrated strengths in providing basic disease information, such as the definition of IBD, its common symptoms, and diagnostic methods. Nevertheless, when dealing with more complex medical advice, such as medication side effects, dietary adjustments, and complication risks, the quality of answers was inconsistent between the LLMs. Notably, Claude-3-Opus generated answers with better readability than the other two models.

Conclusion: LLMs have potential as educational tools for patients with IBD; however, there are discrepancies between the models. Further optimization and the development of specialized models are necessary to ensure the accuracy and safety of the information provided.




Source journal

World Journal of Gastroenterology (Medicine: Gastroenterology & Hepatology)

CiteScore: 7.80

Self-citation rate: 4.70%

Articles published: 464

Review time: 2.4 months

Journal description: The primary aims of the WJG are to improve diagnostic, therapeutic and preventive modalities and the skills of clinicians and to guide clinical practice in gastroenterology and hepatology.