"Dr. AI Will See You Now": How Do ChatGPT-4 Treatment Recommendations Align With Orthopaedic Clinical Practice Guidelines?

IF 4.4 | Zone 2, Medicine | Q1 Orthopedics | Clinical Orthopaedics and Related Research® | Pub Date: 2024-12-01 | Epub Date: 2024-09-06 | DOI: 10.1097/CORR.0000000000003234

Tanios Dagher, Emma P Dwyer, Hayden P Baker, Senthooran Kalidoss, Jason A Strelzow

Pages: 2098-2106 | Publication type: Journal Article | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11556953/pdf/

Citation count: 0

Abstract

Background: Artificial intelligence (AI) is engineered to emulate tasks that have historically required human interaction and intellect, including learning, pattern recognition, decision-making, and problem-solving. Although AI models like ChatGPT-4 have demonstrated satisfactory performance on medical licensing exams, suggesting a potential for supporting medical diagnostics and decision-making, no study of which we are aware has evaluated the ability of these tools to make treatment recommendations when given clinical vignettes and representative medical imaging of common orthopaedic conditions. As AI continues to advance, a thorough understanding of its strengths and limitations is necessary to inform safe and helpful integration into medical practice.

Questions/purposes: (1) What is the concordance of ChatGPT-4-generated treatment recommendations for common orthopaedic conditions with both the American Academy of Orthopaedic Surgeons (AAOS) clinical practice guidelines (CPGs) and an orthopaedic attending physician's treatment plan? (2) In what specific areas do the ChatGPT-4-generated treatment recommendations diverge from the AAOS CPGs?

Methods: Ten common orthopaedic conditions with associated AAOS CPGs were identified: carpal tunnel syndrome, distal radius fracture, glenohumeral joint osteoarthritis, rotator cuff injury, clavicle fracture, hip fracture, hip osteoarthritis, knee osteoarthritis, ACL injury, and acute Achilles rupture. For each condition, the medical records of 10 deidentified patients managed at our facility were used to construct clinical vignettes that each had an isolated, single diagnosis with adequate clarity. The vignettes also encompassed a range of diagnostic severity so that adherence to the AAOS treatment guidelines could be evaluated more thoroughly. These clinical vignettes were presented alongside representative radiographic imaging. The model was prompted to provide a single treatment plan recommendation. Each treatment plan was compared with established AAOS CPGs and with the treatment plan documented by the attending orthopaedic surgeon treating the specific patient. Vignettes where ChatGPT-4 recommendations diverged from CPGs were reviewed, and the patterns of error were identified and summarized.

Results: ChatGPT-4 provided treatment recommendations in accordance with the AAOS CPGs in 90% (90 of 100) of clinical vignettes. Concordance between ChatGPT-4-generated plans and the plan recommended by the treating orthopaedic attending physician was 78% (78 of 100). One hundred percent (30 of 30) of ChatGPT-4 recommendations for fracture vignettes and hip and knee arthritis vignettes matched CPG recommendations, whereas the model struggled most with recommendations for carpal tunnel syndrome (3 of 10 instances demonstrated discordance). ChatGPT-4 recommendations diverged from AAOS CPGs for three carpal tunnel syndrome vignettes; two vignettes each for ACL injury, rotator cuff injury, and glenohumeral joint osteoarthritis; and one acute Achilles rupture vignette. In these situations, ChatGPT-4 most often struggled to correctly interpret injury severity and progression, incorporate patient factors (such as lifestyle or comorbidities) into decision-making, and recognize a contraindication to surgery.
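The concordance arithmetic in the Results can be checked with a short tally. The sketch below is illustrative only and is not the authors' analysis code: the dictionary and function names are hypothetical, the "two" discordant vignettes are read as two per condition for ACL injury, rotator cuff injury, and glenohumeral joint osteoarthritis, and the zero counts for the fracture and hip and knee osteoarthritis conditions are inferred from the reported total of 10 discordant vignettes rather than stated per condition.

```python
# Illustrative tally of CPG concordance from the counts reported in the Results.
# Names are hypothetical; zero counts are inferred from the 90-of-100 total.

VIGNETTES_PER_CONDITION = 10  # stated in the Methods

# Number of vignettes per condition where ChatGPT-4 diverged from the AAOS CPGs.
discordant_with_cpg = {
    "carpal tunnel syndrome": 3,
    "distal radius fracture": 0,
    "glenohumeral joint osteoarthritis": 2,  # assumes "two" means two per condition
    "rotator cuff injury": 2,
    "clavicle fracture": 0,
    "hip fracture": 0,
    "hip osteoarthritis": 0,
    "knee osteoarthritis": 0,
    "ACL injury": 2,
    "acute Achilles rupture": 1,
}


def concordance_by_condition(discordant, per_condition):
    """Return the fraction of concordant vignettes for each condition."""
    return {
        condition: (per_condition - count) / per_condition
        for condition, count in discordant.items()
    }


def overall_concordance(discordant, per_condition):
    """Return the fraction of concordant vignettes across all conditions."""
    total = per_condition * len(discordant)
    return (total - sum(discordant.values())) / total


if __name__ == "__main__":
    rates = concordance_by_condition(discordant_with_cpg, VIGNETTES_PER_CONDITION)
    for condition, rate in rates.items():
        print(f"{condition}: {rate:.0%}")
    overall = overall_concordance(discordant_with_cpg, VIGNETTES_PER_CONDITION)
    print(f"overall: {overall:.0%}")
```

Under these assumptions the per-condition counts sum to 10 discordant vignettes, which matches the reported overall concordance of 90 of 100.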

Conclusion: ChatGPT-4 can generate accurate treatment plans aligned with CPGs but can also make mistakes when it is required to integrate multiple patient factors into decision-making and understand disease severity and progression. Physicians must critically assess the full clinical picture when using AI tools to support their decision-making.

Clinical relevance: ChatGPT-4 may be used as an on-demand diagnostic companion, but patient-centered decision-making should remain in the hands of the physician.


Source journal: Clinical Orthopaedics and Related Research® (Medicine: Surgery)

CiteScore: 7.00

Self-citation rate: 11.90%

Articles published: 722

Review time: 2.5 months

Journal description: Clinical Orthopaedics and Related Research® is a leading peer-reviewed journal devoted to the dissemination of new and important orthopaedic knowledge. CORR® brings readers the latest clinical and basic research, along with columns, commentaries, and interviews with authors.