Performance of ChatGPT Compared to Clinical Practice Guidelines in Making Informed Decisions for Lumbosacral Radicular Pain: A Cross-sectional Study

IF 6.0 | CAS Tier 1 (Medicine) | JCR Q1 (Orthopedics) | Journal of Orthopaedic & Sports Physical Therapy | Pub Date: 2024-03-01 | DOI: 10.2519/jospt.2024.12151
Silvia Gianola, Silvia Bargeri, Greta Castellini, Chad Cook, Alvisa Palese, Paolo Pillastrini, Silvia Salvalaggio, Andrea Turolla, Giacomo Rossettini
{"title":"在对腰骶椎痛做出知情决定时,ChatGPT 与临床实践指南的性能比较:一项横断面研究。","authors":"Silvia Gianola, Silvia Bargeri, Greta Castellini, Chad Cook, Alvisa Palese, Paolo Pillastrini, Silvia Salvalaggio, Andrea Turolla, Giacomo Rossettini","doi":"10.2519/jospt.2024.12151","DOIUrl":null,"url":null,"abstract":"<p><p><b>OBJECTIVE:</b> To compare the accuracy of an artificial intelligence chatbot to clinical practice guidelines (CPGs) recommendations for providing answers to complex clinical questions on lumbosacral radicular pain. <b>DESIGN:</b> Cross-sectional study. <b>METHODS:</b> We extracted recommendations from recent CPGs for diagnosing and treating lumbosacral radicular pain. Relative clinical questions were developed and queried to OpenAI's ChatGPT (GPT-3.5). We compared ChatGPT answers to CPGs recommendations by assessing the (1) internal consistency of ChatGPT answers by measuring the percentage of text wording similarity when a clinical question was posed 3 times, (2) reliability between 2 independent reviewers in grading ChatGPT answers, and (3) accuracy of ChatGPT answers compared to CPGs recommendations. Reliability was estimated using Fleiss' kappa (κ) coefficients, and accuracy by interobserver agreement as the frequency of the agreements among all judgments. <b>RESULTS:</b> We tested 9 clinical questions. The internal consistency of text ChatGPT answers was unacceptable across all 3 trials in all clinical questions (mean percentage of 49%, standard deviation of 15). Intrareliability (reviewer 1: κ = 0.90, standard error [SE] = 0.09; reviewer 2: κ = 0.90, SE = 0.10) and interreliability (κ = 0.85, SE = 0.15) between the 2 reviewers was \"almost perfect.\" Accuracy between ChatGPT answers and CPGs recommendations was slight, demonstrating agreement in 33% of recommendations. <b>CONCLUSION:</b> ChatGPT performed poorly in internal consistency and accuracy of the indications generated compared to clinical practice guideline recommendations for lumbosacral radicular pain. <i>J Orthop Sports Phys Ther 2024;54(3):1-7. Epub 29 January 2024. doi:10.2519/jospt.2024.12151</i>.</p>","PeriodicalId":50099,"journal":{"name":"Journal of Orthopaedic & Sports Physical Therapy","volume":" ","pages":"222-228"},"PeriodicalIF":6.0000,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Performance of ChatGPT Compared to Clinical Practice Guidelines in Making Informed Decisions for Lumbosacral Radicular Pain: A Cross-sectional Study.\",\"authors\":\"Silvia Gianola, Silvia Bargeri, Greta Castellini, Chad Cook, Alvisa Palese, Paolo Pillastrini, Silvia Salvalaggio, Andrea Turolla, Giacomo Rossettini\",\"doi\":\"10.2519/jospt.2024.12151\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p><b>OBJECTIVE:</b> To compare the accuracy of an artificial intelligence chatbot to clinical practice guidelines (CPGs) recommendations for providing answers to complex clinical questions on lumbosacral radicular pain. <b>DESIGN:</b> Cross-sectional study. <b>METHODS:</b> We extracted recommendations from recent CPGs for diagnosing and treating lumbosacral radicular pain. Relative clinical questions were developed and queried to OpenAI's ChatGPT (GPT-3.5). 
We compared ChatGPT answers to CPGs recommendations by assessing the (1) internal consistency of ChatGPT answers by measuring the percentage of text wording similarity when a clinical question was posed 3 times, (2) reliability between 2 independent reviewers in grading ChatGPT answers, and (3) accuracy of ChatGPT answers compared to CPGs recommendations. Reliability was estimated using Fleiss' kappa (κ) coefficients, and accuracy by interobserver agreement as the frequency of the agreements among all judgments. <b>RESULTS:</b> We tested 9 clinical questions. The internal consistency of text ChatGPT answers was unacceptable across all 3 trials in all clinical questions (mean percentage of 49%, standard deviation of 15). Intrareliability (reviewer 1: κ = 0.90, standard error [SE] = 0.09; reviewer 2: κ = 0.90, SE = 0.10) and interreliability (κ = 0.85, SE = 0.15) between the 2 reviewers was \\\"almost perfect.\\\" Accuracy between ChatGPT answers and CPGs recommendations was slight, demonstrating agreement in 33% of recommendations. <b>CONCLUSION:</b> ChatGPT performed poorly in internal consistency and accuracy of the indications generated compared to clinical practice guideline recommendations for lumbosacral radicular pain. <i>J Orthop Sports Phys Ther 2024;54(3):1-7. Epub 29 January 2024. doi:10.2519/jospt.2024.12151</i>.</p>\",\"PeriodicalId\":50099,\"journal\":{\"name\":\"Journal of Orthopaedic & Sports Physical Therapy\",\"volume\":\" \",\"pages\":\"222-228\"},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2024-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Orthopaedic & Sports Physical Therapy\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.2519/jospt.2024.12151\",\"RegionNum\":1,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ORTHOPEDICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Orthopaedic & Sports Physical Therapy","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2519/jospt.2024.12151","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Citations: 0

Abstract

OBJECTIVE: To compare the accuracy of an artificial intelligence chatbot to clinical practice guideline (CPG) recommendations for providing answers to complex clinical questions on lumbosacral radicular pain.

DESIGN: Cross-sectional study.

METHODS: We extracted recommendations from recent CPGs for diagnosing and treating lumbosacral radicular pain. Related clinical questions were developed and posed to OpenAI's ChatGPT (GPT-3.5). We compared ChatGPT answers to CPG recommendations by assessing (1) the internal consistency of ChatGPT answers, measured as the percentage of text wording similarity when a clinical question was posed 3 times; (2) the reliability between 2 independent reviewers in grading ChatGPT answers; and (3) the accuracy of ChatGPT answers compared to CPG recommendations. Reliability was estimated using Fleiss' kappa (κ) coefficients, and accuracy by interobserver agreement, defined as the frequency of agreement among all judgments.

RESULTS: We tested 9 clinical questions. The internal consistency of ChatGPT's text answers was unacceptable across all 3 trials for all clinical questions (mean, 49%; standard deviation, 15). Intrarater reliability (reviewer 1: κ = 0.90, standard error [SE] = 0.09; reviewer 2: κ = 0.90, SE = 0.10) and interrater reliability between the 2 reviewers (κ = 0.85, SE = 0.15) were "almost perfect." Agreement between ChatGPT answers and CPG recommendations was slight, with agreement in 33% of recommendations.

CONCLUSION: ChatGPT performed poorly in the internal consistency and accuracy of the indications it generated compared with clinical practice guideline recommendations for lumbosacral radicular pain.

J Orthop Sports Phys Ther 2024;54(3):1-7. Epub 29 January 2024. doi:10.2519/jospt.2024.12151.
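The abstract leans on two quantitative tools: a text-wording-similarity percentage for internal consistency and Fleiss' kappa for reviewer reliability. The sketch below illustrates both on made-up data. The paper does not specify which similarity algorithm it used, so the difflib-based measure, the sample answers, and the grading table here are illustrative assumptions, not the authors' actual method or data.

```python
# Minimal sketch of the two metrics described in the abstract.
# ASSUMPTIONS: difflib's SequenceMatcher stands in for the paper's
# unspecified wording-similarity measure; all data below are invented.
from difflib import SequenceMatcher
from itertools import combinations

def wording_similarity(answers):
    """Mean pairwise wording similarity (0-100%) across repeated answers."""
    ratios = [SequenceMatcher(None, a, b).ratio()
              for a, b in combinations(answers, 2)]
    return 100 * sum(ratios) / len(ratios)

def fleiss_kappa(table):
    """Fleiss' kappa; table[i][j] = raters assigning item i to category j
    (every item rated by the same number of raters)."""
    N = len(table)             # items (here: graded ChatGPT answers)
    n = sum(table[0])          # raters per item (here: 2 reviewers)
    k = len(table[0])          # rating categories
    p_j = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    P_bar = sum(P_i) / N               # mean observed agreement
    P_e = sum(p * p for p in p_j)      # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical: one clinical question posed 3 times to the chatbot.
answers = [
    "Imaging is not routinely recommended for lumbosacral radicular pain.",
    "Routine imaging is not advised; reserve it for cases with red flags.",
    "MRI should be ordered for every patient at the first visit.",
]
print(f"internal consistency: {wording_similarity(answers):.0f}%")

# Hypothetical: 9 answers graded by 2 reviewers as accurate/inaccurate.
grades = [[2, 0], [0, 2], [2, 0], [1, 1], [0, 2],
          [2, 0], [0, 2], [2, 0], [1, 1]]
print(f"interrater Fleiss' kappa: {fleiss_kappa(grades):.2f}")
```

On this reading, the study's headline figures map directly onto these metrics: a mean similarity of 49% across the 3 repetitions, interrater κ = 0.85 between the 2 reviewers, and 33% agreement between ChatGPT answers and the CPG recommendations.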

Source Journal: Journal of Orthopaedic & Sports Physical Therapy
CiteScore: 8.00
Self-citation rate: 4.90%
Articles per year: 101
Review turnaround: 6-12 weeks
About the Journal: The Journal of Orthopaedic & Sports Physical Therapy® (JOSPT®) publishes scientifically rigorous, clinically relevant content for physical therapists and others in the health care community to advance musculoskeletal and sports-related practice globally. To this end, JOSPT features the latest evidence-based research and clinical cases in musculoskeletal health, injury, and rehabilitation, including physical therapy, orthopaedics, sports medicine, and biomechanics. With an impact factor of 3.090, JOSPT is among the highest-ranked physical therapy journals in Clarivate Analytics's Journal Citation Reports, Science Edition (2017). JOSPT stands eighth of 65 journals in the category of rehabilitation, twelfth of 77 journals in orthopedics, and fourteenth of 81 journals in sport sciences. JOSPT's 5-year impact factor is 4.061.
Latest Articles in This Journal
Concussion Incidence by Type of Sport: Differences by Sex, Age Groups, Type of Session, and Level of Play: An Overview of Systematic Reviews With Meta-analysis.
Differential Effects of Quadriceps and Hip Muscle Exercises for Patellofemoral Pain: A Secondary Effect Modifier Analysis of a Randomized Trial.
Improvements in Forward Bending Are Related to Improvements in Pain and Disability During Cognitive Functional Therapy for People With Chronic Low Back Pain.
The Influence of "Labels" for Neck Pain on Recovery Expectations Following a Motor Vehicle Crash: An Online-Randomized Vignette-Based Experiment.
Encouraging New Moms to Move More-Are We Missing the Mark? A Systematic Review With Meta-Analysis of the Effect of Exercise Interventions on Postpartum Physical Activity Levels and Cardiorespiratory Fitness.