Utility of ChatGPT as a preparation tool for the Orthopaedic In-Training Examination

Journal of Experimental Orthopaedics (IF 2.0, Q2 Orthopedics) · Pub Date: 2025-01-02 · DOI: 10.1002/jeo2.70135
Dhruv Mendiratta, Isabel Herzog, Rohan Singh, Ashok Para, Tej Joshi, Michael Vosbikian, Neil Kaushal

Abstract

Purpose

Chat Generative Pre-Trained Transformer (ChatGPT) may have value as a novel educational resource. Opinions differ on the best preparation resource for the Orthopaedic In-Training Examination (OITE), as its content changes from year to year. This study assesses ChatGPT's performance on the OITE to evaluate its potential as a study resource for residents.

Methods

Questions for the OITE data set were sourced from the American Academy of Orthopaedic Surgeons (AAOS) website. All questions from the 2022 OITE, including those with images, were included in the analysis. The questions were formatted in the same manner as presented on the AAOS website, with the question, narrative text and answer choices separated by a line. Each question was evaluated in a new chat session to minimize confounding variables. Answers from ChatGPT were characterized by whether they contained logical reasoning, internal information (from the question stem) or external information. Incorrect responses were further categorized as logical, informational or explicit fallacies.
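The per-question protocol described above can be sketched in code. This is a minimal illustration, not the authors' actual tooling: the `Question` type, `format_question` and the `ask` callable (standing in for a fresh chat session per question) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Question:
    stem: str
    narrative: str
    choices: List[str]
    answer: str  # letter of the correct choice

def format_question(q: Question) -> str:
    """Reproduce the AAOS-style layout: question, narrative text and
    answer choices, each separated by a line break."""
    return "\n".join([q.stem, q.narrative, "\n".join(q.choices)])

def evaluate(questions: List[Question], ask: Callable[[str], str]) -> float:
    """Pose each question in isolation (each `ask` call models a new chat
    session, minimizing carry-over between questions) and return the
    overall success rate."""
    correct = sum(
        1 for q in questions if ask(format_question(q)).strip() == q.answer
    )
    return correct / len(questions)
```

In a real run, `ask` would wrap a call to the model in a brand-new conversation, and each response would additionally be hand-coded for logical, internal and external information.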

Results

ChatGPT yielded an overall success rate of 48.3% on the 2022 AAOS OITE. It demonstrated the ability to apply logic and stepwise thinking in 67.6% of questions, effectively utilized internal information from the question stem in 68.1% of questions, and incorporated external information in 68.1% of questions. The utilization of logical reasoning (p < 0.001), internal information (p = 0.004) and external information (p = 0.009) was greater among correct responses than among incorrect responses. Informational fallacy was the most common shortcoming of ChatGPT's responses. There was no difference in correct responses based on whether or not an image was present (p = 0.320).
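Comparisons like these (rate of logical-reasoning use among correct versus incorrect responses) are typically made with a chi-square test on a 2x2 contingency table. The sketch below shows that calculation with purely hypothetical counts; the study's underlying counts are not given in the abstract, so these numbers are invented for illustration only.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Hypothetical counts, NOT the study's data:
# rows = correct / incorrect responses; cols = used logic / did not
stat = chi_square_2x2(110, 20, 72, 68)
significant = stat > 3.841  # chi-square critical value for p < 0.05, df = 1
```

With one degree of freedom, a statistic above 3.841 corresponds to p < 0.05, which is how a difference in proportions like the ones reported above would be declared significant.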

Conclusions

ChatGPT exhibits logical, informational and explicit fallacies that, at this time, may spread misinformation and hinder resident education.

Level of Evidence

Level V.

Source Journal

Journal of Experimental Orthopaedics (Medicine-Orthopedics and Sports Medicine). CiteScore: 3.20; self-citation rate: 5.60%; articles per year: 114; review time: 13 weeks.