Utility of ChatGPT as a preparation tool for the Orthopaedic In-Training Examination

Journal of Experimental Orthopaedics (IF 2.0, Q2 Orthopedics) · Pub Date: 2025-01-02 · DOI: 10.1002/jeo2.70135
Dhruv Mendiratta, Isabel Herzog, Rohan Singh, Ashok Para, Tej Joshi, Michael Vosbikian, Neil Kaushal

Abstract

Purpose

Chat Generative Pre-Trained Transformer (ChatGPT) may have value as a novel educational resource. Opinions differ on the best preparation resource for the Orthopaedic In-Training Examination (OITE), as its content changes from year to year. This study assesses ChatGPT's performance on the OITE to evaluate its potential as a study resource for residents.

Methods

Questions for the OITE data set were sourced from the American Academy of Orthopaedic Surgeons (AAOS) website. All questions from the 2022 OITE, including those with images, were included in the analysis. The questions were formatted in the same manner as presented on the AAOS website, with the question, narrative text and answer choices separated by a line. Each question was evaluated in a new chat session to minimize confounding variables. Answers from ChatGPT were characterized by whether they contained logical reasoning, internal information (from the question stem) or external information. Incorrect responses were further categorized as logical, informational or explicit fallacies.
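The per-question protocol described above can be sketched in code. This is a minimal illustration, not the authors' actual tooling: the `Question` type, `format_question` and the `ask` callable (standing in for a fresh chat session per question) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Question:
    stem: str
    narrative: str
    choices: List[str]
    answer: str  # letter of the correct choice

def format_question(q: Question) -> str:
    """Reproduce the AAOS-style layout: question, narrative text and
    answer choices, each separated by a line break."""
    return "\n".join([q.stem, q.narrative, "\n".join(q.choices)])

def evaluate(questions: List[Question], ask: Callable[[str], str]) -> float:
    """Pose each question in isolation (each `ask` call models a new chat
    session, minimizing carry-over between questions) and return the
    overall success rate."""
    correct = sum(
        1 for q in questions if ask(format_question(q)).strip() == q.answer
    )
    return correct / len(questions)
```

In a real run, `ask` would wrap a call to the model in a brand-new conversation, and each response would additionally be hand-coded for logical, internal and external information.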

Results

ChatGPT yielded an overall success rate of 48.3% on the 2022 AAOS OITE. It demonstrated the ability to apply logic and stepwise thinking in 67.6% of questions, effectively utilized internal information from the question stem in 68.1% of questions, and incorporated external information in 68.1% of questions. The utilization of logical reasoning (p < 0.001), internal information (p = 0.004) and external information (p = 0.009) was greater among correct responses than among incorrect responses. Informational fallacy was the most common shortcoming of ChatGPT's responses. There was no difference in correct responses based on whether or not an image was present (p = 0.320).
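Comparisons like these (rate of logical-reasoning use among correct versus incorrect responses) are typically made with a chi-square test on a 2x2 contingency table. The sketch below shows that calculation with purely hypothetical counts; the study's underlying counts are not given in the abstract, so these numbers are invented for illustration only.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Hypothetical counts, NOT the study's data:
# rows = correct / incorrect responses; cols = used logic / did not
stat = chi_square_2x2(110, 20, 72, 68)
significant = stat > 3.841  # chi-square critical value for p < 0.05, df = 1
```

With one degree of freedom, a statistic above 3.841 corresponds to p < 0.05, which is how a difference in proportions like the ones reported above would be declared significant.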

Conclusions

ChatGPT exhibits logical, informational and explicit fallacies that, at this time, may spread misinformation and hinder resident education.

Level of Evidence

Level V.

Source Journal

Journal of Experimental Orthopaedics (Medicine-Orthopedics and Sports Medicine). CiteScore: 3.20; self-citation rate: 5.60%; articles per year: 114; review time: 13 weeks.