A Future of Self-Directed Patient Internet Research: Large Language Model-Based Tools Versus Standard Search Engines.

Impact Factor: 3.0 | CAS Region 2 (Medicine) | JCR Q3 (Engineering, Biomedical) | Annals of Biomedical Engineering | Publication date: 2025-03-03 | DOI: 10.1007/s10439-025-03701-6 | Citations: 0
Arya Rao, Andrew Mu, Elizabeth Enichen, Dhruva Gupta, Nathan Hall, Erica Koranteng, William Marks, Michael J Senter-Zapata, David C Whitehead, Benjamin A White, Sanjay Saini, Adam B Landman, Marc D Succi
{"title":"A Future of Self-Directed Patient Internet Research: Large Language Model-Based Tools Versus Standard Search Engines.","authors":"Arya Rao, Andrew Mu, Elizabeth Enichen, Dhruva Gupta, Nathan Hall, Erica Koranteng, William Marks, Michael J Senter-Zapata, David C Whitehead, Benjamin A White, Sanjay Saini, Adam B Landman, Marc D Succi","doi":"10.1007/s10439-025-03701-6","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>As generalist large language models (LLMs) become more commonplace, patients will inevitably increasingly turn to these tools instead of traditional search engines. Here, we evaluate publicly available LLM-based chatbots as tools for patient education through physician review of responses provided by Google, Bard, GPT-3.5 and GPT-4 to commonly searched queries about prevalent chronic health conditions in the United States.</p><p><strong>Methods: </strong>Five distinct commonly Google-searched queries were selected for (i) hypertension, (ii) hyperlipidemia, (iii) diabetes, (iv) anxiety, and (v) mood disorders and prompted into each model of interest. Responses were assessed by board-certified physicians for accuracy, comprehensiveness, and overall quality on a five-point Likert scale. The Flesch-Kincaid Grade Levels were calculated to assess readability.</p><p><strong>Results: </strong>GPT-3.5 (4.40 ± 0.48, 4.29 ± 0.43) and GPT-4 (4.35 ± 0.30, 4.24 ± 0.28) received higher ratings in comprehensiveness and quality than Bard (3.79 ± 0.36, 3.87 ± 0.32) and Google (1.87 ± 0.42, 2.11 ± 0.47), all p < 0.05. However, Bard (9.45 ± 1.35) and Google responses (9.92 ± 5.31) had a lower average Flesch-Kincaid Grade Level compared to GPT-3.5 (14.69 ± 1.57) and GPT-4 (12.88 ± 2.02), indicating greater readability.</p><p><strong>Conclusion: </strong>This study suggests that publicly available LLM-based tools may provide patients with more accurate responses to queries on chronic health conditions than answers provided by Google search. These results provide support for the use of these tools in place of traditional search engines for health-related queries.</p>","PeriodicalId":7986,"journal":{"name":"Annals of Biomedical Engineering","volume":" ","pages":""},"PeriodicalIF":3.0000,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Biomedical Engineering","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s10439-025-03701-6","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose: As generalist large language models (LLMs) become more commonplace, patients will increasingly turn to these tools in place of traditional search engines. Here, we evaluate publicly available LLM-based chatbots as tools for patient education through physician review of responses provided by Google, Bard, GPT-3.5, and GPT-4 to commonly searched queries about prevalent chronic health conditions in the United States.

Methods: Five distinct, commonly Google-searched queries were selected for each of (i) hypertension, (ii) hyperlipidemia, (iii) diabetes, (iv) anxiety, and (v) mood disorders and submitted as prompts to each model of interest. Responses were assessed by board-certified physicians for accuracy, comprehensiveness, and overall quality on a five-point Likert scale. The Flesch-Kincaid Grade Level of each response was calculated to assess readability.
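The abstract names the readability metric but does not describe the computation; the Flesch-Kincaid Grade Level follows a fixed formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The Python sketch below is a minimal, hypothetical illustration of scoring a single model response; the syllable counter is a crude heuristic and is not necessarily the tool the authors used.

```python
import re

def count_syllables(word: str) -> int:
    """Crude vowel-group heuristic; real readability tools use better rules or dictionaries."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

# Score a short, hypothetical model response to a patient query about hypertension.
response = ("High blood pressure often has no symptoms. "
            "Lifestyle changes and medication can lower it.")
print(round(flesch_kincaid_grade(response), 2))
```

Because the score maps onto U.S. school grade levels, a response scoring around 9-10 is readable for a high-school audience, while scores above roughly 12-13 correspond to college-level text.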

Results: GPT-3.5 (4.40 ± 0.48, 4.29 ± 0.43) and GPT-4 (4.35 ± 0.30, 4.24 ± 0.28) received higher comprehensiveness and quality ratings than Bard (3.79 ± 0.36, 3.87 ± 0.32) and Google (1.87 ± 0.42, 2.11 ± 0.47), all p < 0.05. However, Bard (9.45 ± 1.35) and Google (9.92 ± 5.31) responses had a lower average Flesch-Kincaid Grade Level than GPT-3.5 (14.69 ± 1.57) and GPT-4 (12.88 ± 2.02) responses, indicating greater readability.
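The abstract reports p < 0.05 for these comparisons but does not name the statistical test. As a non-authoritative illustration, ordinal Likert ratings are commonly compared with a nonparametric test such as the Mann-Whitney U; the sketch below uses invented ratings (not the study's data) to show how such a comparison could be run with SciPy.

```python
from scipy.stats import mannwhitneyu

# Hypothetical physician quality ratings on a 1-5 Likert scale -- illustrative
# values only, not the data reported in the paper.
gpt4_ratings = [5, 4, 4, 5, 4, 4, 5, 4, 4, 5]
google_ratings = [2, 2, 3, 1, 2, 2, 3, 2, 1, 2]

# Two-sided Mann-Whitney U test comparing the two rating distributions.
stat, p = mannwhitneyu(gpt4_ratings, google_ratings, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")
```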

Conclusion: This study suggests that publicly available LLM-based tools may provide patients with more accurate responses to queries on chronic health conditions than answers provided by Google search. These results provide support for the use of these tools in place of traditional search engines for health-related queries.

Source Journal

Annals of Biomedical Engineering (Engineering, Biomedical)
CiteScore: 7.50
Self-citation rate: 15.80%
Articles published per year: 212
Typical review time: 3 months

Journal description: Annals of Biomedical Engineering is an official journal of the Biomedical Engineering Society, publishing original articles in the major fields of bioengineering and biomedical engineering. The Annals is an interdisciplinary and international journal with the aim of highlighting integrated approaches to the solutions of biological and biomedical problems.