Dhruv Mendiratta, Isabel Herzog, Rohan Singh, Ashok Para, Tej Joshi, Michael Vosbikian, Neil Kaushal
{"title":"ChatGPT作为骨科实习考试准备工具的应用。","authors":"Dhruv Mendiratta, Isabel Herzog, Rohan Singh, Ashok Para, Tej Joshi, Michael Vosbikian, Neil Kaushal","doi":"10.1002/jeo2.70135","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Purpose</h3>\n \n <p>Chat Generative Pre-Trained Transformer (ChatGPT) may have implications as a novel educational resource. There are differences in opinion on the best resource for the Orthopaedic In-Training Exam (OITE) as information changes from year to year. This study assesses ChatGPT's performance on the OITE for use as a potential study resource for residents.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>Questions for the OITE data set were sourced from the American Academy of Orthopaedic Surgeons (AAOS) website. All questions from the 2022 OITE were included. All questions, including those with images, were included in the analysis. The questions were formatted in the same manner as presented on the AAOS website, with the question, narrative text and answer choices separated by a line. Each question was evaluated in a new chat session to minimize confounding variables. Answers from ChatGPT were characterized by whether they contained logical, internal or external information. Incorrect responses were further categorized into logical, informational or explicit fallacies.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>ChatGPT yielded an overall success rate of 48.3% based on the 2022 AAOS OITE. ChatGPT demonstrated the ability to apply logic and stepwise thinking in 67.6% of the questions. ChatGPT effectively utilized internal information from the question stem in 68.1% of the questions. ChatGPT also demonstrated the ability to incorporate external information in 68.1% of the questions. The utilization of logical reasoning (<i>p</i> < 0.001), internal information (<i>p</i> = 0.004) and external information (p = 0.009) was greater among correct responses than incorrect responses. Informational fallacy was the most common shortcoming of ChatGPT's responses. There was no difference in correct responses based on whether or not an image was present (<i>p</i> = 0.320).</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>ChatGPT demonstrates logical, informational and explicit fallacies which, at this time, may lead to misinformation and hinder resident education.</p>\n </section>\n \n <section>\n \n <h3> Level of Evidence</h3>\n \n <p>Level V.</p>\n </section>\n </div>","PeriodicalId":36909,"journal":{"name":"Journal of Experimental Orthopaedics","volume":"12 1","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11693985/pdf/","citationCount":"0","resultStr":"{\"title\":\"Utility of ChatGPT as a preparation tool for the Orthopaedic In-Training Examination\",\"authors\":\"Dhruv Mendiratta, Isabel Herzog, Rohan Singh, Ashok Para, Tej Joshi, Michael Vosbikian, Neil Kaushal\",\"doi\":\"10.1002/jeo2.70135\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Purpose</h3>\\n \\n <p>Chat Generative Pre-Trained Transformer (ChatGPT) may have implications as a novel educational resource. There are differences in opinion on the best resource for the Orthopaedic In-Training Exam (OITE) as information changes from year to year. 
This study assesses ChatGPT's performance on the OITE for use as a potential study resource for residents.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Methods</h3>\\n \\n <p>Questions for the OITE data set were sourced from the American Academy of Orthopaedic Surgeons (AAOS) website. All questions from the 2022 OITE were included. All questions, including those with images, were included in the analysis. The questions were formatted in the same manner as presented on the AAOS website, with the question, narrative text and answer choices separated by a line. Each question was evaluated in a new chat session to minimize confounding variables. Answers from ChatGPT were characterized by whether they contained logical, internal or external information. Incorrect responses were further categorized into logical, informational or explicit fallacies.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>ChatGPT yielded an overall success rate of 48.3% based on the 2022 AAOS OITE. ChatGPT demonstrated the ability to apply logic and stepwise thinking in 67.6% of the questions. ChatGPT effectively utilized internal information from the question stem in 68.1% of the questions. ChatGPT also demonstrated the ability to incorporate external information in 68.1% of the questions. The utilization of logical reasoning (<i>p</i> < 0.001), internal information (<i>p</i> = 0.004) and external information (p = 0.009) was greater among correct responses than incorrect responses. Informational fallacy was the most common shortcoming of ChatGPT's responses. There was no difference in correct responses based on whether or not an image was present (<i>p</i> = 0.320).</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusions</h3>\\n \\n <p>ChatGPT demonstrates logical, informational and explicit fallacies which, at this time, may lead to misinformation and hinder resident education.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Level of Evidence</h3>\\n \\n <p>Level V.</p>\\n </section>\\n </div>\",\"PeriodicalId\":36909,\"journal\":{\"name\":\"Journal of Experimental Orthopaedics\",\"volume\":\"12 1\",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-01-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11693985/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Experimental Orthopaedics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/jeo2.70135\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ORTHOPEDICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Experimental Orthopaedics","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/jeo2.70135","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Utility of ChatGPT as a preparation tool for the Orthopaedic In-Training Examination
Purpose
Chat Generative Pre-Trained Transformer (ChatGPT) may have implications as a novel educational resource. Opinions differ on the best resource for preparing for the Orthopaedic In-Training Examination (OITE), as the examination content changes from year to year. This study assesses ChatGPT's performance on the OITE to evaluate its potential as a study resource for residents.
Methods
Questions for the OITE data set were sourced from the American Academy of Orthopaedic Surgeons (AAOS) website. All questions from the 2022 OITE, including those with images, were included in the analysis. The questions were formatted in the same manner as presented on the AAOS website, with the question, narrative text and answer choices each separated by a line. Each question was evaluated in a new chat session to minimize confounding variables. ChatGPT's answers were characterized by whether they contained logical, internal or external information. Incorrect responses were further categorized as logical, informational or explicit fallacies.
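For readers who want to reproduce this "one question per fresh chat session" protocol at scale, a minimal sketch follows. It assumes programmatic access through the OpenAI Python SDK rather than the chat interface the authors describe, and the model name is an assumption; the abstract does not specify which ChatGPT version was evaluated.

```python
# Minimal sketch (not the authors' method): submit one OITE item per fresh
# conversation via the OpenAI Python SDK, mirroring the "new chat session per
# question" design described above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_oite_question(question: str, narrative: str, choices: list[str]) -> str:
    """Format the item as on the AAOS website (question, narrative text and
    answer choices separated by a line) and send it with no prior context."""
    prompt = "\n".join([question, narrative, *choices])
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model version
        messages=[{"role": "user", "content": prompt}],  # empty history = new session
    )
    return response.choices[0].message.content
```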
Results
ChatGPT yielded an overall success rate of 48.3% on the 2022 AAOS OITE. ChatGPT demonstrated the ability to apply logic and stepwise thinking in 67.6% of questions, effectively utilized internal information from the question stem in 68.1% of questions, and incorporated external information in 68.1% of questions. The utilization of logical reasoning (p < 0.001), internal information (p = 0.004) and external information (p = 0.009) was greater among correct responses than among incorrect responses. Informational fallacy was the most common shortcoming of ChatGPT's responses. The rate of correct responses did not differ based on whether an image was present (p = 0.320).
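The abstract does not name the statistical test behind these p-values; a chi-square test on a 2x2 contingency table is one common choice for this kind of comparison. The sketch below uses clearly hypothetical counts purely to illustrate how the comparison of logical-reasoning use between correct and incorrect responses could be run; they are not the study's data.

```python
# Illustrative only: the counts are hypothetical placeholders, not the study's
# data, and the abstract does not state which test the authors used.
from scipy.stats import chi2_contingency

# Rows: correct responses, incorrect responses
# Columns: used logical reasoning, did not use logical reasoning
table = [[80, 20],
         [60, 47]]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```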
Conclusions
ChatGPT demonstrates logical, informational and explicit fallacies that, at this time, may lead to misinformation and hinder resident education.

Level of Evidence

Level V.