Do ChatGPT and Gemini Provide Appropriate Recommendations for Pediatric Orthopaedic Conditions?

IF 1.4 | Zone 3 (Medicine) | JCR Q3 (Orthopedics) | Journal of Pediatric Orthopaedics | Pub Date: 2025-01-01 | Epub Date: 2024-08-22 | DOI: 10.1097/BPO.0000000000002797
Sean Pirkle, JaeWon Yang, Todd J Blumberg
Pages: e66-e71. Publication type: Journal Article.
Citations: 0

Abstract

Background: Artificial intelligence (AI), and in particular large language models (LLMs) such as Chat Generative Pre-Trained Transformer (ChatGPT) and Gemini, has given patients additional resources for researching the management of healthcare conditions, both for their own edification and to advocate for their children's care. The accuracy of these models and the sources from which they draw conclusions, however, remain largely unstudied in pediatric orthopaedics. This research aimed to assess the reliability of machine learning tools in providing appropriate recommendations for the care of common pediatric orthopaedic conditions.

Methods: ChatGPT and Gemini were queried using plain language generated from the American Academy of Orthopaedic Surgeons (AAOS) Clinical Practice Guidelines (CPGs) listed on the Pediatric Orthopaedic Society of North America (POSNA) web page. Two independent reviewers assessed the accuracy of the responses, and chi-square analyses were used to compare the 2 LLMs. Inter-rater reliability was calculated via Cohen's kappa coefficient. When a response cited research studies, the authors attempted to verify their legitimacy by searching the PubMed and Google Scholar databases.
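As an illustration of the inter-rater reliability step described in the Methods, the following is a minimal sketch of Cohen's kappa for two reviewers. The rating labels and the example ratings are hypothetical and are not data from the study; the function name `cohens_kappa` is likewise an assumption made for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of six responses (NOT the study's data).
reviewer_1 = ["agree", "agree", "neutral", "disagree", "agree", "neutral"]
reviewer_2 = ["agree", "agree", "neutral", "agree", "agree", "disagree"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which is why it is preferred over simple agreement when two reviewers grade the same set of responses.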

Results: ChatGPT and Gemini performed similarly, agreeing with the AAOS CPGs at rates of 67% and 69%, respectively. No significant differences were observed in performance between the 2 LLMs. ChatGPT did not reference specific studies in any response, whereas Gemini referenced a total of 16 research papers in 6 of 24 responses. Twelve of the 16 referenced studies contained errors: 7 could not be identified, and 5 contained discrepancies regarding publication year, journal, or proper attribution of authorship.
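The comparison between the 2 LLMs can be sketched as a chi-square test on a 2x2 table of "agrees with guideline" versus "does not agree" counts. The abstract reports only percentages, so the counts below are hypothetical placeholders rather than the study's data. The sketch also assumes a plain Pearson chi-square without continuity correction, which the abstract does not specify; `chi_square_2x2` is a name invented for this example.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table is ((a, b), (c, d)): one row per model, one column per outcome.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square survival function reduces to
    # erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts (NOT from the study): (agrees, does not agree) per model.
counts = ((20, 10), (21, 9))
statistic, p_value = chi_square_2x2(counts)
print(f"chi-square = {statistic:.4f}, p = {p_value:.3f}")
```

With agreement rates this close, the statistic is small and the p-value is large, which is the pattern a "no significant difference" finding corresponds to.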

Conclusion: The LLMs investigated were frequently aligned with the AAOS CPGs; however, the rate of neutral statements or disagreement with consensus recommendations was substantial, and the sources cited frequently contained errors. These findings suggest there remains room for growth and transparency in the development of the models that power AI, and that these models may not yet represent the best source of up-to-date healthcare information for patients or providers.

Source journal
CiteScore: 3.30
Self-citation rate: 17.60%
Articles published: 512
Review time: 6 months
About the journal: Journal of Pediatric Orthopaedics is a leading journal that focuses specifically on traumatic injuries to give you hands-on coverage of a fast-growing field. You'll get articles that cover everything from the nature of injury to the effects of new drug therapies, and from recommendations for more effective surgical approaches to the latest laboratory findings.