Can ChatGPT-4 Diagnose and Treat Like an Orthopaedic Surgeon? Testing Clinical Decision Making and Diagnostic Ability in Soft-Tissue Pathologies of the Foot and Ankle.

Journal of the American Academy of Orthopaedic Surgeons | Impact factor: 2.6 | Zone 2 (Medicine) | JCR Q1 (Orthopedics) | Pub Date: 2024-10-15 | DOI: 10.5435/JAAOS-D-24-00595

Hayden Hartman, Maritza Diane Essis, Wei Shao Tung, Irvin Oh, Sean Peden, Arianna L Gianakos


Citations: 0

Abstract

Introduction: ChatGPT-4, a chatbot able to carry on human-like conversation, has attracted attention after demonstrating the aptitude to pass professional licensure examinations. The purpose of this study was to explore the diagnostic and decision-making capacities of ChatGPT-4 in clinical management, specifically assessing its accuracy in identifying and treating soft-tissue foot and ankle pathologies.

Methods: This study presented eight soft-tissue-related foot and ankle cases to ChatGPT-4, with each case assessed by three fellowship-trained foot and ankle orthopaedic surgeons. The evaluation system comprised five criteria graded on a Likert scale, giving a sum score from 5 (lowest) to 25 (highest possible).

Results: The average sum score across all cases was 22.0. The Morton neuroma case received the highest score (24.7), and the peroneal tendon tear case received the lowest score (16.3). Subgroup analyses of each of the five criteria showed no notable differences in surgeon grading. Criteria 3 (provide alternative treatments) and 4 (provide comprehensive information) were graded markedly lower than criteria 1 (diagnose), 2 (treat), and 5 (provide accurate information) (for both criteria 3 and 4: P = 0.007; P = 0.032; P < 0.0001). Criterion 5 was graded markedly higher than criteria 2, 3, and 4 (P = 0.02; P < 0.0001; P < 0.0001).

Conclusion: This study demonstrates that ChatGPT-4 effectively diagnosed and provided reliable treatment options for most of the soft-tissue foot and ankle cases presented, with consistent grading among surgeon evaluators. Individual criterion assessment revealed that ChatGPT-4 was most effective in diagnosing and suggesting appropriate treatment, but limitations were seen in the chatbot's ability to provide comprehensive information and alternative treatment options. In addition, the chatbot did not suggest fabricated treatment options, a common concern in prior literature. This resource could be useful for clinicians seeking reliable patient education materials without the fear of inconsistencies, although comprehensive information beyond treatment may be limited.
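The scoring arithmetic described in the Methods can be illustrated with a minimal sketch. It assumes that each of the five criteria is rated from 1 to 5 (which would give the stated 5 to 25 range) and that a case's score is the mean of the three surgeons' sum scores; the abstract does not spell out either detail. The function names and the ratings below are hypothetical and are not study data.

```python
from statistics import mean

# Criterion labels as listed in the Results; the 1-5 range per criterion is an assumption.
CRITERIA = (
    "diagnose",
    "treat",
    "provide alternative treatments",
    "provide comprehensive information",
    "provide accurate information",
)


def sum_score(ratings):
    """Add up one surgeon's five criterion ratings (assumed 1-5 each)."""
    if len(ratings) != len(CRITERIA):
        raise ValueError("expected one rating per criterion")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("each rating is assumed to lie between 1 and 5")
    return sum(ratings)


def case_score(ratings_by_surgeon):
    """Average the sum scores of all surgeons who graded a case (assumed aggregation)."""
    return mean(sum_score(r) for r in ratings_by_surgeon)


# Hypothetical ratings for one case from three surgeons (not taken from the study).
example_case = [
    [5, 5, 4, 4, 5],  # surgeon A, sum 23
    [5, 4, 4, 5, 5],  # surgeon B, sum 23
    [5, 5, 5, 4, 5],  # surgeon C, sum 24
]

print(round(case_score(example_case), 1))  # 23.3
```

Under these assumptions, a case score with one decimal place such as 24.7 or 16.3 is what an average of three whole-number sums would look like.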




Source journal: Journal of the American Academy of Orthopaedic Surgeons (Medicine: Orthopedics)

CiteScore: 6.10

Self-citation rate: 6.20%

Articles published: 529

Review time: 4-8 weeks

About the journal: The Journal of the American Academy of Orthopaedic Surgeons was established in the fall of 1993 by the Academy in response to its membership’s demand for a clinical review journal. Two issues were published the first year, followed by six issues yearly from 1994 through 2004. In September 2005, JAAOS began publishing monthly issues. Each issue includes richly illustrated peer-reviewed articles focused on clinical diagnosis and management. Special features in each issue provide commentary on developments in pharmacotherapeutics, materials and techniques, and computer applications.