Responses From ChatGPT-4 Show Limited Correlation With Expert Consensus Statement on Anterior Shoulder Instability

Alexander Artamonov, M.D., Ira Bachar-Avnieli, M.D., Eyal Klang, M.D., Omri Lubovsky, M.D., Ehud Atoun, M.D., Alexander Bermant, M.D., Philip J. Rosinsky, M.D.
Arthroscopy Sports Medicine and Rehabilitation. Published 2024-06-01. DOI: 10.1016/j.asmr.2024.100923
https://www.sciencedirect.com/science/article/pii/S2666061X24000415

Abstract

Purpose

To compare the similarity of answers provided by Generative Pretrained Transformer-4 (GPT-4) with those of a consensus statement on diagnosis, nonoperative management, and Bankart repair in anterior shoulder instability (ASI).

Methods

An expert consensus statement on ASI published by Hurley et al. in 2022 was reviewed, and the questions posed to the expert panel were extracted. GPT-4, the subscription version of ChatGPT, was queried using the same set of questions. Answers provided by GPT-4 were compared with those of the expert panel and subjectively rated for similarity by 2 experienced shoulder surgeons. GPT-4 was then used to rate the similarity of its own responses to the consensus statement, classifying them as low, medium, or high. The similarity ratings assigned by the shoulder surgeons and by GPT-4 were then compared, and interobserver reliability was calculated using weighted κ scores.
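
The weighted κ statistic mentioned above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the abstract does not say whether linear or quadratic weights were used, so both are offered as an assumption, and the function name `weighted_kappa` and the sample ratings are hypothetical.

```python
import math

# Ordinal similarity scale used in the abstract.
LEVELS = ["low", "medium", "high"]


def weighted_kappa(rater_a, rater_b, levels=LEVELS, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    weights: "linear" or "quadratic" (assumption: the abstract names neither).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    k = len(levels)
    index = {level: i for i, level in enumerate(levels)}
    n = len(rater_a)

    # Observed joint proportions: observed[i][j] is the share of items
    # rated levels[i] by rater A and levels[j] by rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the most distant pair of categories.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    exp = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)

    if exp == 0:
        # Both raters used a single identical category: kappa is undefined.
        return math.nan
    return 1 - obs / exp


# Hypothetical ratings for five questions (not data from the study).
surgeons = ["high", "medium", "low", "medium", "low"]
gpt4 = ["high", "high", "medium", "medium", "low"]
kappa = weighted_kappa(surgeons, gpt4)
```

Unlike raw percent agreement, the weighted form penalizes a low-versus-high disagreement more than a low-versus-medium one and corrects for the agreement expected by chance.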

Results

The degree of similarity between the responses of GPT-4 and the ASI consensus statement, as rated by the shoulder surgeons, was high in 25.8%, medium in 45.2%, and low in 29.0% of questions. GPT-4 assessed similarity as high in 48.3%, medium in 41.9%, and low in 9.7% of questions. Surgeons and GPT-4 agreed on the classification of 18 questions (58.1%) and disagreed on 13 questions (41.9%).
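
The percentages above can be checked against the question counts with a short calculation. This is only a sketch: the total of 31 questions is inferred from 18 + 13, the per-category counts are back-calculated from the reported percentages rather than stated in the abstract, and the helper names `pct` and `implied_count` are hypothetical.

```python
# Total number of questions, inferred from the agreement figures (18 + 13).
TOTAL_QUESTIONS = 18 + 13


def pct(count, total=TOTAL_QUESTIONS):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


def implied_count(percent, total=TOTAL_QUESTIONS):
    """Whole number of questions closest to a reported percentage."""
    return round(percent * total / 100)


# Percentages as reported in the abstract.
surgeon_pct = {"high": 25.8, "medium": 45.2, "low": 29.0}
gpt4_pct = {"high": 48.3, "medium": 41.9, "low": 9.7}

# Back-calculated counts (an inference, not figures from the abstract).
surgeon_counts = {level: implied_count(p) for level, p in surgeon_pct.items()}
gpt4_counts = {level: implied_count(p) for level, p in gpt4_pct.items()}

agreement_pct = pct(18)
disagreement_pct = pct(13)
```

Under these assumptions the implied counts for each rater sum to 31, and 18 of 31 and 13 of 31 reproduce the reported 58.1% and 41.9%. The implied GPT-4 "high" count of 15 would round to 48.4% rather than the reported 48.3%, a difference that could come from truncation instead of rounding.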

Conclusions

The responses generated by artificial intelligence exhibit limited correlation with an expert statement on the diagnosis and treatment of ASI.

Clinical Relevance

As the use of artificial intelligence becomes more prevalent, it is important to understand how closely the information it generates resembles content produced by human authors.
