ChatGPT Achieves Only Fair Agreement with ACFAS Expert Panelist Clinical Consensus Statements.

Dominick J Casciato, Joshua Calhoun
{"title":"ChatGPT Achieves Only Fair Agreement with ACFAS Expert Panelist Clinical Consensus Statements.","authors":"Dominick J Casciato, Joshua Calhoun","doi":"10.1177/19386400251319567","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>As artificial intelligence (AI) becomes increasingly integrated into medicine and surgery, its applications are expanding rapidly-from aiding clinical documentation to providing patient information. However, its role in medical decision-making remains uncertain. This study evaluates an AI language model's alignment with clinical consensus statements in foot and ankle surgery.</p><p><strong>Methods: </strong>Clinical consensus statements from the American College of Foot and Ankle Surgeons (ACFAS; 2015-2022) were collected and rated by ChatGPT-o1 as being inappropriate, neither appropriate nor inappropriate, and appropriate. Ten repetitions of the statements were entered into ChatGPT-o1 in a random order, and the model was prompted to assign a corresponding rating. The AI-generated scores were compared to the expert panel's ratings, and intra-rater analysis was performed.</p><p><strong>Results: </strong>The analysis of 9 clinical consensus documents and 129 statements revealed an overall Cohen's kappa of 0.29 (95% CI: 0.12, 0.46), indicating fair alignment between expert panelists and ChatGPT. Overall, ankle arthritis and heel pain showed the highest concordance at 100%, while flatfoot exhibited the lowest agreement at 25%, reflecting variability between ChatGPT and expert panelists. Among the ChatGPT ratings, Cohen's kappa values ranged from 0.41 to 0.92, highlighting variability in internal reliability across topics.</p><p><strong>Conclusion: </strong>ChatGPT achieved overall fair agreement and demonstrated variable consistency when repetitively rating ACFAS expert panel clinical practice guidelines representing a variety of topics. These data reflect the need for further study of the causes, impacts, and solutions for this disparity between intelligence and human intelligence.</p><p><strong>Level of evidence: </strong>Level IV: Retrospective cohort study.</p>","PeriodicalId":73046,"journal":{"name":"Foot & ankle specialist","volume":" ","pages":"19386400251319567"},"PeriodicalIF":2.1000,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Foot & ankle specialist","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/19386400251319567","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Introduction: As artificial intelligence (AI) becomes increasingly integrated into medicine and surgery, its applications are expanding rapidly, from aiding clinical documentation to providing patient information. However, its role in medical decision-making remains uncertain. This study evaluates an AI language model's alignment with clinical consensus statements in foot and ankle surgery.

Methods: Clinical consensus statements from the American College of Foot and Ankle Surgeons (ACFAS; 2015-2022) were collected and rated by ChatGPT-o1 as inappropriate, neither appropriate nor inappropriate, or appropriate. The full set of statements was entered into ChatGPT-o1 ten times, each time in a random order, and the model was prompted to assign a rating to each statement. The AI-generated ratings were compared with the expert panel's ratings, and an intra-rater reliability analysis was performed.
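
The abstract does not give the exact prompt wording, model identifier, or API parameters; a minimal sketch of this rating protocol, assuming the official `openai` Python client and an illustrative prompt, might look like the following.

```python
import random
from openai import OpenAI  # official OpenAI Python client (an assumption here)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The three rating categories described in the abstract.
RATINGS = ["inappropriate", "neither appropriate nor inappropriate", "appropriate"]

def rate_statements(statements: list[str], repetitions: int = 10) -> list[dict]:
    """Submit every statement `repetitions` times, shuffling the order on
    each pass, and record the rating the model assigns to each statement."""
    results = []
    for rep in range(repetitions):
        # A fresh random order per repetition, per the study design.
        shuffled = random.sample(statements, k=len(statements))
        for statement in shuffled:
            response = client.chat.completions.create(
                model="o1",  # assumption: the exact model identifier is not given
                messages=[{
                    "role": "user",
                    "content": (
                        "Rate the following clinical consensus statement as "
                        f"exactly one of: {', '.join(RATINGS)}.\n\n"
                        f"Statement: {statement}"
                    ),
                }],
            )
            results.append({
                "repetition": rep,
                "statement": statement,
                "rating": response.choices[0].message.content.strip(),
            })
    return results
```

Shuffling the statement order on every pass guards against order effects influencing the model's ratings across the ten repetitions.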

Results: The analysis of 9 clinical consensus documents comprising 129 statements revealed an overall Cohen's kappa of 0.29 (95% CI: 0.12, 0.46), indicating fair agreement between the expert panelists and ChatGPT. Ankle arthritis and heel pain showed the highest concordance at 100%, while flatfoot showed the lowest at 25%, reflecting topic-level variability between ChatGPT and the expert panelists. Within ChatGPT's own repeated ratings, intra-rater Cohen's kappa values ranged from 0.41 to 0.92, indicating variable internal reliability across topics.
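
For context, Cohen's kappa discounts raw percent agreement by the agreement expected from chance alone, and values of 0.21 to 0.40 are conventionally labeled "fair" on the Landis and Koch scale, which is why a kappa of 0.29 is reported as fair agreement. A small worked example with hypothetical rating vectors (illustrative values only, not the study's data), using scikit-learn, follows.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings on the study's 3-point scale
# (0 = inappropriate, 1 = neither, 2 = appropriate);
# illustrative values only, not the study's data.
panel   = [2, 2, 1, 0, 2, 2, 1, 2, 0, 2]
chatgpt = [2, 1, 1, 0, 2, 2, 2, 2, 1, 2]

# Raw percent agreement: the fraction of statements rated identically.
raw_agreement = sum(p == c for p, c in zip(panel, chatgpt)) / len(panel)

# Cohen's kappa corrects that figure for chance agreement.
kappa = cohen_kappa_score(panel, chatgpt)

print(f"raw agreement: {raw_agreement:.2f}")  # 0.70 for these vectors
print(f"Cohen's kappa: {kappa:.2f}")          # ~0.46 for these vectors
```

Applying the same function to pairs of ChatGPT's own repeated runs would give the intra-rater kappa values; the reported 0.41 to 0.92 range spans "moderate" to "almost perfect" on the same Landis and Koch scale.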

Conclusion: ChatGPT achieved only fair overall agreement and demonstrated variable consistency when repeatedly rating ACFAS expert panel clinical consensus statements representing a variety of topics. These data reflect the need for further study of the causes, impacts, and solutions for this disparity between artificial intelligence and human intelligence.

Level of evidence: Level IV, retrospective cohort study.
