Exploring the performance of large language models on hepatitis B infection-related questions: A comparative study

World Journal of Gastroenterology | IF 4.3 | Zone 3 (Medicine) | JCR Q1 (Gastroenterology & Hepatology) | Pub Date: 2025-01-21 | DOI: 10.3748/wjg.v31.i3.101092
Yu Li, Chen-Kai Huang, Yi Hu, Xiao-Dong Zhou, Cong He, Jia-Wei Zhong
Volume 31, Issue 3, Article 101092. Publication type: Journal Article. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11684168/pdf/
Citations: 0

Abstract


Background: Patients with hepatitis B virus (HBV) infection require chronic and personalized care to improve outcomes. Large language models (LLMs) can potentially provide medical information for patients.

Aim: To examine the performance of three LLMs, ChatGPT-3.5, ChatGPT-4.0, and Google Gemini, in answering HBV-related questions.

Methods: LLMs' responses to HBV-related questions were independently graded by two medical professionals using a four-point accuracy scale, and disagreements were resolved by a third reviewer. Each question was run three times using three LLMs. Readability was assessed via the Gunning Fog index and Flesch-Kincaid grade level.
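The two readability measures named in the Methods follow standard published formulas, which can be sketched as below. This is a minimal illustration only: the abstract does not say which tool the authors used, and the function names `count_syllables` and `readability`, the vowel-group syllable heuristic, and the simple sentence splitter are all assumptions made for this example. Dedicated readability tools use more careful syllable and sentence detection, so their scores will differ somewhat.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch-Kincaid grade level and Gunning Fog index for English text."""
    # Naive sentence split on ., ! and ?; abbreviations such as "e.g." will be over-split.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    # Gunning Fog treats words of three or more syllables as "complex".
    complex_share = sum(1 for s in syllables if s >= 3) / len(words)

    return {
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_share),
    }


if __name__ == "__main__":
    sample = "Chronic hepatitis B infection requires individualized antiviral management."
    print(readability(sample))
```

Both scores are expressed as approximate United States school grade levels, which is why they can be compared directly against an eighth-grade target.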

Results: Overall, all three LLM chatbots achieved high mean accuracy scores on subjective questions (ChatGPT-3.5: 3.50; ChatGPT-4.0: 3.69; Google Gemini: 3.53, out of a maximum score of 4). On objective questions, ChatGPT-4.0 achieved an accuracy rate of 80.8%, compared with 62.9% for ChatGPT-3.5 and 73.1% for Google Gemini. Across the six domains, ChatGPT-4.0 performed better on diagnosis, whereas Google Gemini performed especially well on clinical manifestations. Notably, in the readability analysis, the mean Gunning Fog index and Flesch-Kincaid grade level scores of all three LLM chatbots were significantly higher than the standard eighth-grade level, well above the reading level of the general population.

Conclusion: Our results highlight the potential of LLMs, especially ChatGPT-4.0, for delivering responses to HBV-related questions. LLMs may be an adjunctive informational tool for patients and physicians to improve outcomes. Nevertheless, current LLMs should not replace personalized treatment recommendations from physicians in the management of HBV infection.

Source journal
World Journal of Gastroenterology (Medicine: Gastroenterology & Hepatology)
CiteScore: 7.80
Self-citation rate: 4.70%
Articles published: 464
Review time: 2.4 months
Journal description: The primary aims of the WJG are to improve diagnostic, therapeutic and preventive modalities and the skills of clinicians and to guide clinical practice in gastroenterology and hepatology.