Large language models' performances regarding common patient questions about osteoarthritis: A comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Perplexity.

IF 9.7 | Zone 1 (Medicine) | Q1 Hospitality, Leisure, Sport & Tourism | Journal of Sport and Health Science | Pub Date: 2024-11-28 | DOI: 10.1016/j.jshs.2024.101016
Mingde Cao, Qianwen Wang, Xueyou Zhang, Zuru Lang, Jihong Qiu, Patrick Shu-Hang Yung, Michael Tim-Yun Ong
Citation count: 0

Abstract

Background: Large Language Models (LLMs) have gained much attention and, in part, have replaced common search engines as a popular channel for obtaining information due to their contextually relevant responses. Osteoarthritis (OA) is a common topic in musculoskeletal disorders, and patients often seek information about it online. Our study evaluated the ability of 3 LLMs (ChatGPT-3.5, ChatGPT-4.0, and Perplexity) to accurately answer common OA-related queries.

Methods: We defined 6 themes (pathogenesis, risk factors, clinical presentation, diagnosis, treatment and prevention, and prognosis) based on a generalization of 25 frequently asked questions about OA. Three consultant-level orthopedic specialists independently rated the LLMs' replies on a 4-point accuracy scale. The final ratings for each response were determined using a majority consensus approach. Responses classified as "satisfactory" were evaluated for comprehensiveness on a 5-point scale.
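The majority consensus step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' procedure: the numeric coding of the 4-point scale (1 to 4) and the function name `majority_rating` are assumptions, and the abstract does not say how a three-way disagreement was resolved, so the sketch simply returns `None` in that case.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_rating(ratings: Sequence[int]) -> Optional[int]:
    """Return the rating chosen by more than half of the raters.

    Returns None when no rating has a strict majority (for example,
    three raters giving three different scores). How such cases were
    handled in the study is not stated in the abstract.
    """
    if not ratings:
        return None
    # most_common(1) yields the single most frequent rating and its count.
    value, count = Counter(ratings).most_common(1)[0]
    return value if count > len(ratings) / 2 else None


if __name__ == "__main__":
    # Hypothetical ratings from three specialists for one response.
    print(majority_rating([4, 4, 3]))
    print(majority_rating([1, 2, 3]))
```

With three raters, a strict majority means at least two identical scores, so only a full three-way split is left unresolved.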

Results: ChatGPT-4.0 demonstrated superior accuracy, with 64% of responses rated as "excellent", compared to 40% for ChatGPT-3.5 and 28% for Perplexity (Pearson's chi-squared test with Fisher's exact test, all p < 0.001). All 3 LLM-chatbots had high mean comprehensiveness ratings (Perplexity = 3.88; ChatGPT-4.0 = 4.56; ChatGPT-3.5 = 3.96, out of a maximum score of 5). The LLM-chatbots performed reliably across domains, except for "treatment and prevention". However, ChatGPT-4.0 still outperformed ChatGPT-3.5 and Perplexity, garnering 53.8% "excellent" ratings (Pearson's chi-squared test with Fisher's exact test, all p < 0.001).
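To make the comparison of "excellent" proportions concrete, here is a minimal sketch of a Pearson chi-squared test of independence in plain Python. It rests on assumptions that are not in the abstract: that each model answered all 25 questions (so 64%, 40%, and 28% correspond to 16, 10, and 7 responses) and that ratings are collapsed to "excellent" versus "not excellent". It is not the authors' analysis, which presumably used the full 4-level ratings, so it should not be expected to reproduce the reported p-values.

```python
import math
from typing import List, Sequence, Tuple


def chi_squared_independence(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals: List[int] = [sum(row) for row in table]
    col_totals: List[int] = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_two_dof(stat: float) -> float:
    """Upper-tail p-value; valid only for 2 degrees of freedom,
    where the chi-squared survival function is exp(-x / 2)."""
    return math.exp(-stat / 2)


# Rows: ChatGPT-4.0, ChatGPT-3.5, Perplexity.
# Columns: "excellent", "not excellent" (assumed 25 responses per model).
ASSUMED_COUNTS = [[16, 9], [10, 15], [7, 18]]

if __name__ == "__main__":
    stat, dof = chi_squared_independence(ASSUMED_COUNTS)
    print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value_two_dof(stat):.4f}")
```

Because several cells in a table this small are modest, an exact test (as the abstract mentions) is the more cautious choice in practice; the closed-form p-value here is used only to keep the sketch dependency-free.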

Conclusion: Our findings underscore the potential of LLMs, specifically ChatGPT-4.0 and Perplexity, to deliver accurate and thorough responses to OA-related queries. Targeted correction of specific misconceptions to improve the accuracy of LLMs remains crucial.

Source journal metrics: CiteScore 18.30; self-citation rate 1.70%; articles published: 101; review time: 22 weeks
Journal description: The Journal of Sport and Health Science (JSHS) is an international, multidisciplinary journal that aims to advance the fields of sport, exercise, physical activity, and health sciences. Published by Elsevier B.V. on behalf of Shanghai University of Sport, JSHS is dedicated to promoting original and impactful research, as well as topical reviews, editorials, opinions, and commentary papers. With a focus on physical and mental health, injury and disease prevention, traditional Chinese exercise, and human performance, JSHS offers a platform for scholars and researchers to share their findings and contribute to the advancement of these fields. Our journal is peer-reviewed, ensuring that all published works meet the highest academic standards. Supported by a carefully selected international editorial board, JSHS upholds impeccable integrity and provides an efficient publication platform. We invite submissions from scholars and researchers worldwide, and we are committed to disseminating insightful and influential research in the field of sport and health science.
Latest articles from this journal
A primer on global molecular responses to exercise in skeletal muscle: Omics in focus.
The effect of muscle warm-up on voluntary and evoked force-time parameters: A systematic review and meta-analysis with meta-regression.
Erratum to "Biomechanics associated with tibial stress fracture in runners: A systematic review and meta-analysis" [J Sport Health Sci 12 (2023) 333-342].
Do compression garments enhance running performance? An updated systematic review and meta-analysis.
Exercised gut microbiota improves vascular and metabolic abnormalities in sedentary diabetic mice through gut‒vascular connection.