Effectiveness of a large language model for clinical information retrieval regarding shoulder arthroplasty

Journal of Experimental Orthopaedics (IF 2.0, Q2 Orthopedics) · Published 2024-12-17 · DOI: 10.1002/jeo2.70114
Jacob F. Oeding, Amy Z. Lu, Michael Mazzucco, Michael C. Fu, David M. Dines, Russell F. Warren, Lawrence V. Gulotta, Joshua S. Dines, Kyle N. Kunze

Abstract

Purpose

To determine the scope and accuracy of medical information provided by ChatGPT-4 in response to clinical queries concerning total shoulder arthroplasty (TSA), and to compare these results to those of the Google search engine.

Methods

A query replicating a typical patient search for 'total shoulder replacement' was performed using both Google Web Search (the most frequently used search engine worldwide) and ChatGPT-4. The top 10 frequently asked questions (FAQs), their answers, and the associated sources were extracted. The search was then repeated independently to identify the top 10 FAQs requiring numerical responses, so that the concordance of answers could be compared between Google and ChatGPT-4. Two blinded orthopaedic shoulder surgeons graded the clinical relevance and accuracy of the provided information.

Results

Among FAQs with numeric responses, 8 out of 10 (80%) had identical answers or substantial overlap between ChatGPT-4 and Google. Accuracy of information was not significantly different (p = 0.32). Google sources comprised 40% medical practices, 30% academic, 20% single-surgeon practices, and 10% social media, whereas ChatGPT-4 drew on 100% academic sources, a statistically significant difference (p = 0.001). Only 3 out of 10 (30%) FAQs with open-ended answers were identical between ChatGPT-4 and Google. The clinical relevance of FAQs was not significantly different (p = 0.18). Google sources for open-ended questions comprised academic (60%), social media (20%), medical practice (10%), and single-surgeon practice (10%) sources, whereas 100% of ChatGPT-4 sources were academic, a statistically significant difference (p = 0.0025).
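The abstract does not state which statistical test produced the reported p-values. As an illustration only, the source-category percentages for the numeric-response FAQs can be reconstructed into counts (n = 10 sources per engine) and the two distributions compared with a Monte Carlo permutation test on the Pearson chi-square statistic — a minimal sketch of one plausible approach, not the authors' actual analysis:

```python
import random
from collections import Counter

# Counts reconstructed from the reported percentages for the 10
# numeric-response FAQs; the test choice below is an assumption.
google = (["medical practice"] * 4 + ["academic"] * 3
          + ["single-surgeon"] * 2 + ["social media"] * 1)
chatgpt = ["academic"] * 10

def chi2_stat(a, b):
    """Pearson chi-square statistic for two samples of category labels."""
    cats = set(a) | set(b)
    ca, cb = Counter(a), Counter(b)
    n_a, n_b, n = len(a), len(b), len(a) + len(b)
    stat = 0.0
    for c in cats:
        total = ca[c] + cb[c]
        for obs, size in ((ca[c], n_a), (cb[c], n_b)):
            exp = total * size / n  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat

def permutation_p(a, b, n_iter=20_000, seed=0):
    """Monte Carlo permutation p-value: how often does randomly
    reassigning sources to the two engines give a chi-square statistic
    at least as extreme as the observed one?"""
    rng = random.Random(seed)
    observed = chi2_stat(a, b)
    pooled = a + b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if chi2_stat(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)  # add-one to avoid p = 0

p = permutation_p(google, chatgpt)
print(f"permutation p-value: {p:.4f}")
```

With these reconstructed counts the permutation p-value comes out well below 0.05, consistent in direction with the paper's reported significance, though the exact value depends on the (unknown) test the authors used.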

Conclusion

ChatGPT-4 provided trustworthy academic sources for medical information retrieval concerning TSA, while sources used by Google were heterogeneous. Accuracy and clinical relevance of information were not significantly different between ChatGPT-4 and Google.

Level of Evidence

Level IV cross-sectional.

